-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Improve ref-tracking for group keys #51442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Improve ref-tracking for group keys #51442
Conversation
This would make Still - I'm +0 here; checking equality gives some odd behaviors as well. |
Yes those 2 behave different, but this pr establishes the same behavior as in the non-cow mode for this case. So that's actually a plus imo |
pandas/core/groupby/grouper.py
Outdated
except (AttributeError, KeyError, IndexError, InvalidIndexError): | ||
return False | ||
if isinstance(gpr, Series) and isinstance(obj_gpr_column, Series): | ||
ref = weakref.ref(obj_gpr_column._mgr.blocks[0]) | ||
return ref in gpr._mgr.blocks[0].refs.referenced_blocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you create multiple times a weakref to the same object, this is always the same (i.e. identical) weakref object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's how it is documented (https://docs.python.org/3/library/weakref.html#weakref.ref). Did some basic checks with out blocks and it works as I would have expected
Yes, I think this is a good solution for now (essentially the same as what was suggested in #50730). And it indeed aligns with the behaviour we have for non-CoW. Longer term, I agree with @rhshadrach that the current semantics are not really great. We should probably at some point move to not doing such check at all, at which point this discussion about identity vs equality also just doesn't matter anymore? |
Yep this sounds sensible, but I think this needs a deprecation cycle? I’ll move the check to the BlockManager then |
Implemented it on the manager level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two small comments to clean-up
pandas/core/groupby/grouper.py
Outdated
@@ -947,9 +947,19 @@ def is_in_obj(gpr) -> bool: | |||
# For the CoW case, we need an equality check as the identity check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/groupby/grouper.py
Outdated
@@ -947,9 +947,19 @@ def is_in_obj(gpr) -> bool: | |||
# For the CoW case, we need an equality check as the identity check | |||
# no longer works (each Series from column access is a new object) | |||
try: | |||
return gpr.equals(obj[gpr.name]) | |||
obj_gpr_column = obj[gpr.name] | |||
except (AttributeError, KeyError, IndexError, InvalidIndexError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the AttributeError now, I think (.name
is already checked above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, thx
…cking # Conflicts: # pandas/core/internals/managers.py
Thanks! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.cc @jorisvandenbossche
Follow-up on #49450
The equals check causes failures in Dask. I think we are better off moving to reference checking anyway. If you are ok with the general logic here, I'd refactor this check down to the Manager level.