-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BENCH/REF: parametrize CSV benchmarks on engine #38442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Atm not sure what's going on with the benchmark failure - the failure is in |
#38476 should fix the unrelated benchmark failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good small question
asv_bench/benchmarks/io/csv.py
Outdated
@@ -146,10 +146,10 @@ def time_read_csv(self, bad_date_value): | |||
class ReadCSVSkipRows(BaseIO): | |||
|
|||
fname = "__test__.csv" | |||
params = [None, 10000] | |||
param_names = ["skiprows"] | |||
params = ([None, 10000], ["c"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be c and python or just c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On current master we only have the default (I think that's c
)
We have c
and python
in all the other benchmarks so it seems reasonable to add that here. If there's a reason we don't want that I'll revert
Thanks @mroeschke!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Is there some logic in which classes are parameterized and which not?
|
||
|
||
class ReadCSVCategorical(BaseIO): | ||
fname = "__test__.csv" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also parametrize this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Not that I know of beyond (AFAIK) that the python engine supports all these whereas the c/pyarrow engines only support a proper subset. I'll do a pass through the entire file and parametrize where both c and python engines are supported. |
CI green |
thanks @arw2019 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
The diff in basic PR implementing the
pyarrow
-based CSV engine (#38370) is quite big. A part of that PR is a small refactor of the CSV I/O benchmarks such that they takeengine
as a parameter. Most of that refactor is not dependent on having the pyarrow engine so I'm spinning it off here to de-bloat #38370.