PERF: 10x speedup in Series/DataFrame construction for lists of ints #24647
nice. ping on green.
@@ -2011,7 +2011,8 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
             floats[i] = <float64_t>val
             complexes[i] = <double complex>val
             if not seen.null_:
-                seen.saw_int(int(val))
+                val = int(val)
+                seen.saw_int(val)
can saw_int's signature be tightened up?
It can, but I didn't see any significant difference when changing it to `cdef inline saw_int(self, int val)`, so I took that out to keep this minimal. That call is only used in this file, and only a few times at that, so it'd be fine to tighten it up.
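The thread refers to `seen.saw_int` without showing its body. As a rough mental model only, here is a pure-Python sketch of a range-tracking helper of that kind. The `Seen` class, its flag names, the constants, and `scan` are illustrative assumptions, not the actual Cython code in pandas. The point is that each call compares `val` against integer bounds, so the static type of `val` at the call site determines whether those comparisons run as generic object comparisons or as faster typed ones.

```python
# Hypothetical stand-in for the Cython helper discussed above; not pandas' code.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


class Seen:
    """Tracks which integer ranges have been observed while scanning values."""

    def __init__(self):
        self.int_ = False   # at least one integer was seen
        self.sint_ = False  # a negative value was seen (needs a signed dtype)
        self.uint_ = False  # a value above INT64_MAX was seen (needs uint64)

    def saw_int(self, val):
        # Each call performs several comparisons against the bounds above.
        self.int_ = True
        self.sint_ = self.sint_ or (INT64_MIN <= val < 0)
        self.uint_ = self.uint_ or (INT64_MAX < val <= UINT64_MAX)


def scan(values):
    """Mirror the cast-then-call pattern from the diff on a list of values."""
    seen = Seen()
    for val in values:
        val = int(val)      # cast once, after the type check
        seen.saw_int(val)   # then pass the already-converted value
    return seen
```

In pure Python the cast makes no measurable difference; the sketch only shows what work a helper like this does per element.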
actually there are a few more cases where the same fix could be applied
@jreback Just to clarify - the speedup comes from preventing an object-to-object comparison for
Codecov Report
@@ Coverage Diff @@
## master #24647 +/- ##
==========================================
- Coverage 92.37% 92.37% -0.01%
==========================================
Files 166 166
Lines 52377 52380 +3
==========================================
+ Hits 48385 48387 +2
- Misses 3992 3993 +1
Continue to review full report at Codecov.
maybe this was the one I was looking at. ok, ping on green.
@jreback ping
This PR is a minor tweak to the `int64`/`uint64` overflow fix added in #18624.

Simply casting to an `int` after doing a typecheck is sufficient for the compiler to generate a 10x speedup.

This is how `maybe_convert_numeric()` already handles `int`s, so this just brings `maybe_convert_objects()` back into alignment.

I believe this would yield a similar speedup for `DataFrame`s, but we don't have any benchmarks that explicitly test this. However, the `get_dummies()` benchmark involves expanding to a `DataFrame` and gets a speedup of similar magnitude (not visible in the results, as it previously timed out after 30s).

git diff upstream/master -u -- "*.py" | flake8 --diff
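
Since the description notes that no benchmark explicitly covers `DataFrame` construction, a quick local check could look like the following sketch. The helper name `time_construction` and its parameters are made up for illustration; it is not part of pandas or of its benchmark suite, and the timings it reports depend on the machine and the build.

```python
import timeit


def time_construction(constructor, n=100_000, repeat=3):
    """Return the best wall-clock time in seconds for constructor(list_of_ints).

    `constructor` is any callable that accepts a list, e.g. pd.Series.
    """
    data = list(range(n))  # plain Python ints, the case this PR targets
    timings = timeit.repeat(lambda: constructor(data), number=1, repeat=repeat)
    return min(timings)    # the minimum is the least noisy estimate
```

With pandas imported as `pd`, calling `time_construction(pd.Series)` and `time_construction(pd.DataFrame)` on builds before and after this change should show whether the speedup carries over to both constructors.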