-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: fix getitem unique_check / initialization issue #14933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ssanderson no I think was missing a The original reason for this (the lazy construction), should be preserved. (I wish we actually had a test for this :<) The The |
@@ -269,7 +261,7 @@ cdef class IndexEngine: | |||
cdef inline _ensure_mapping_populated(self): | |||
# need to reset if we have previously | |||
# set the initialized from monotonic checks | |||
if self.unique_check: | |||
if self.need_unique_check or self.mapping is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I understand correctly, previously we were dispatching on self.unique_check
here to determine whether to unset the self.initialized
flag here before determining whether to create a populate a hash table. The reason we need to compare against self.unique_check
is that, if that flag is set, we might have set self.initialized = 1 even though we don't have a hash table, because we set self.initialized = 1
in the case that we were able to successfully determine uniqueness from the monotonic check.
Did I get that right? If so, would a simpler fix be to just drop the initialized
flag entirely and use self.mapping is None
everywhere it's currently used? I think that could be done if we also update the is_unique
property to gate on the need_unique_check
flag instead of the initialized
flag, which seems generally clearer anyway.
Yeah, I realized we'd actually flipped the polarity on a second read. |
Some day I will find the time to finish my branch that was working on this code :/. Maybe I'll have a free night or two over the holidays. |
Current coverage is 84.64% (diff: 100%)@@ master #14933 diff @@
==========================================
Files 144 144
Lines 51025 51025
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 43191 43191
Misses 7834 7834
Partials 0 0
|
@ssanderson I simplified |
if not self.initialized: | ||
# populate the mappings | ||
|
||
if self.need_unique_check or not self.initialized: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to check against need_unique_check
here, or is self.initialized
sufficient now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, only need a single routine now.
|
||
self.mapping = self._make_hash_table(len(values)) | ||
self.mapping.map_locations(values) | ||
if not self.initialized: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever expect to initialize
to be called if we're already initialized? If not, should we error and/or log a warning if we hit this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes it idempotent. and actually simpler logic.
👍 from me if tests are passing. |
closes pandas-dev#14930 Author: Jeff Reback <[email protected]> Closes pandas-dev#14933 from jreback/perf and squashes the following commits: dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue (cherry picked from commit 07c83ee)
closes pandas-dev#14930 Author: Jeff Reback <[email protected]> Closes pandas-dev#14933 from jreback/perf and squashes the following commits: dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue
closes #14930