-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Add missing __finalize__ calls in indexers/iterators #46101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add missing __finalize__ calls in indexers/iterators #46101
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This broke existing tests
Should be good now. I had to un-xfail some more tests that were previously marked as not implemented. |
@phofl All tests are passing now. Can you review again, please? |
looks pretty good @theOehrly if you can add a release note (I think we have a few already that are about finalize so if you can group with those). |
cc @phofl if you can review when you have a chance |
@jreback That one failure seems unrelated? At least I'm failing to make sense of it, and it did pass previously. |
Looks unrelated, yes |
doc/source/whatsnew/v1.4.2.rst
Outdated
@@ -23,6 +23,7 @@ Fixed regressions | |||
|
|||
Bug fixes | |||
~~~~~~~~~ | |||
- Fixed metadata propagation in :meth:`NDFrame.xs`, :meth:`DataFrame.iterrows`, :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` (:issue:`28283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of loc and iloc:
Is this true for all cases? This has lots of different branches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is*, and sorry for the delayed response. I wanted to validate this again, more extensively. I've made some slight changes to the code, for testing purposes only, so that every single call to loc and iloc adds attrs to the original object and validates that they still exist on the returned object. After that, I ran the entire test suite with these additional checks built in. Assuming that this gives 100% coverage of all branches of loc and iloc, there were only three failures, all of which were related to _get_item_cache.
Basically, if attrs/_metadata/flags change after an object was cached, the next time a sliced object is returned using _get_item_cache, it will have the old attrs/_metadata/flags. I'm not sure if we'd want to add a finalize call to _get_item_cache though? If attrs/_metadata/flags are considered constants for each object, then this might not be a problem. This only came up because my extended testing modified attrs between multiple loc/iloc calls, but was never caused by a test itself.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
you can merge master and update to any comments |
I was waiting for a response to my comments on the review above. I guess there still are open questions that should be addressed. |
can look but pls merge master |
e605849
to
e2645bc
Compare
@jreback main is merged and tests are fixed again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to run the performance benchmarks for indexing before/after for this change.
doc/source/whatsnew/v1.4.3.rst
Outdated
@@ -23,7 +23,7 @@ Fixed regressions | |||
|
|||
Bug fixes | |||
~~~~~~~~~ | |||
- | |||
- Fixed metadata propagation in :meth:`NDFrame.xs`, :meth:`DataFrame.iterrows`, :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` (:issue:`28283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls move to 1.5
@jreback Results of the performance benchmark for indexing (only tests that got worse):
Or the full output here: https://pastebin.com/raw/3sVMufVc |
@jreback any more feedback on this? |
thanks @theOehrly this is very nice! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This PR intends to add missing
__finalize__
calls to indexers and locators (this section of the docs: https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#indexing-iteration). I went over all the methods listed in that section and in my opinion, everything should be covered after this PR.This is generally in reference to #28283, although the methods covered here are not listed there. #19850 is also somewhat relevant.
The missing calls are only related to cases where a Series object was returned from indexing/iterating over a DataFrame. Though, I was somewhat generous with adding new test cases. Some newly added tests cases would have already passed before adding these calls, but I thought that it was still worth adding them.