Speed up checking for NaN for floats #25946
Here's what
I'm running a more thorough benchmark now. |
Codecov Report
@@ Coverage Diff @@
## master #25946 +/- ##
==========================================
- Coverage 91.82% 91.81% -0.01%
==========================================
Files 175 175
Lines 52581 52581
==========================================
- Hits 48280 48276 -4
- Misses 4301 4305 +4
Continue to review full report at Codecov.
|
54533f3 to c7c4c94
Can you run the benchmarks for missing-value checking and see what the change is? |
Sure, I am already running those, but they take a huge amount of time when you set the warmup and sampling times high enough (and with the default settings the results are too flaky to be believable). |
right, though just the missing ones shouldn't be that huge |
Can you point out these "missing" benchmark names? |
you can pass a regex to select a subset |
Could you please recommend what benchmark names might be relevant? I didn't study the whole list of them yet... |
So running this
I'm sure this is not benchmarking the worst case, as I think it should speed up parsing a date column where most fields are empty, but only after #25754 is merged so that |
thanks @vnlitvin |
git diff upstream/master -u -- "*.py" | flake8 --diff
This doesn't give much of a speedup because it is a simple change, but for certain inputs, such as empty datetime fields in a CSV, it does help (because empty fields are parsed as float NaNs).
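To illustrate why empty fields matter here, below is a minimal pure-Python sketch, not the code from this PR. It assumes a hypothetical `parse_field` helper standing in for the CSV reader and a hypothetical `is_nan_self_compare` helper. The helper relies on the IEEE 754 rule that NaN is the only float that compares unequal to itself, which is one cheap way to test a float for NaN.

```python
import math


def is_nan_self_compare(value: float) -> bool:
    # Hypothetical helper: under IEEE 754, NaN is the only float
    # that is not equal to itself, so one comparison is enough.
    return value != value


def parse_field(text: str) -> float:
    # Hypothetical stand-in for a CSV reader: an empty field becomes float NaN.
    return float("nan") if text == "" else float(text)


# A column where most fields are empty, as in the case described above.
fields = ["1.5", "", "", "2.0", ""]
parsed = [parse_field(f) for f in fields]

# Count the missing values using the self-comparison check.
nan_count = sum(is_nan_self_compare(v) for v in parsed)
print(nan_count)  # 3

# The cheap check agrees with math.isnan on every parsed value.
assert all(is_nan_self_compare(v) == math.isnan(v) for v in parsed)
```

When most fields in a column are empty, nearly every value goes through the NaN check, so even a small per-value saving can add up.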