BUG: Preserve Series/DataFrame subclasses through groupby operations #33884
Merged: jreback merged 38 commits into pandas-dev:master from JBGreisman:groupby-preserve-subclass on May 13, 2020
Commits
7da0703 Add tests to confirm groupby ops preserve subclasses (JBGreisman)
f6e1a89 Update SeriesGroupBy constructor calls to preserve subclassed Series (JBGreisman)
ee6de43 Fix concat() method to preserve subclassed DataFrames (JBGreisman)
3f9f4c4 Add GH28330 comment to concat.py (JBGreisman)
0112826 Preserve subclassing in DataFrame.idxmin() and DataFrame.idxmax() calls (JBGreisman)
422e702 GH28330 Fix GroupBy.size() to preserve subclassed types (JBGreisman)
53ac397 GH28330 Fix GroupBy.ngroup() to preserve subclassed types (JBGreisman)
2bc2520 GH28330 Fix GroupBy.cumcount() to preserve subclassed types (JBGreisman)
8d9a885 GH28330 Fix constructor calls to preserve subclasses through groupby() (JBGreisman)
e4d7fa8 Fix typo -- Series.constructor() to Series._constructor() (JBGreisman)
1dbe986 Remove DeprecationWarning due to empty Series construction (JBGreisman)
c998422 BUG: GH28330 Preserve subclassing with groupby operations (JBGreisman)
abdb861 BUG: GH28330 Preserve subclassing with groupby operations (JBGreisman)
d36ad6d Merge remote-tracking branch 'upstream/master' into groupby-preserve-… (JBGreisman)
5b83062 Fix formatting of .py files with black (JBGreisman)
b6ea731 Removed trailing whitespace in doc/source/whatsnew/v1.1.0.rst (JBGreisman)
0cdf0ea Update DataFrameGroupBy._cython_agg_blocks() to pass mypy (JBGreisman)
6e48e07 Merge remote-tracking branch 'upstream/master' into groupby-preserve-… (JBGreisman)
a70c21a Remove unused import of typing.cast from pandas/core/groupby/generic.py (JBGreisman)
c03d459 Move tests to test_groupby_subclass.py (JBGreisman)
9e42c79 Add tests for DataFrame.idxmin() and DataFrame.idxmax() with subclasses (JBGreisman)
9fbc645 Add test to confirm concat() preserves subclassed types (JBGreisman)
5750d72 Update whatsnew entry bugfix (JBGreisman)
7f4c5a7 Merge remote-tracking branch 'upstream/master' into groupby-preserve-… (JBGreisman)
8eee73c Revert unnecessary changes in GroupBy() (JBGreisman)
4b304c1 Fix test to expect Series from GroupBy.ngroup() and GroupBy.cumcount() (JBGreisman)
b3e039a Fix formatting of groupby.py (JBGreisman)
b1118de Merge remote-tracking branch 'upstream/master' into groupby-preserve-… (JBGreisman)
a490e38 Avoid DeprecationWarning by checking for instance of Series (JBGreisman)
5bcf9fa Merge remote-tracking branch 'upstream/master' into groupby-preserve-… (JBGreisman)
0244b36 Remove unnecessary constructor call in DataFrameGroupBy._cython_agg_b… (JBGreisman)
dcd4692 Fix mypy static typing issue in DataFrameGroupBy._cython_agg_blocks() (JBGreisman)
a92c51b Ensure consistent return types for GroupBy.size(), ngroup(), and cumc… (JBGreisman)
cf3b978 Revert DataFrameGroupBy._cython_agg_blocks() back to origin/master (JBGreisman)
f08cf59 Add GroupBy._constructor() to facilitate preserving subclassed types (JBGreisman)
d2a7de2 Change DataFrameGroupBy._transform_fast() to use _constructor property (JBGreisman)
37ea97f Restructure GroupBy._constructor() to remove else statement (JBGreisman)
f1570da Rename GroupBy._constructor property to GroupBy._obj_1d_constructor (JBGreisman)
hmm this should be FrameOrSeries I think.
why is there an else here?
I had thought this should be a Series because if self.obj is a DataFrame, it returns self.obj._constructor_sliced. Do you think I should name this property _series_constructor or something comparable to make that behavior more apparent?

The else was there because mypy was complaining about a missing return statement. I can restructure this with an assertion to avoid an else statement and keep mypy from complaining:

Please let me know if you have a better way to structure this.
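The assertion-based restructuring described in the comment above could look roughly like the following. This is only a sketch of the pattern, not the code from the PR: SeriesLike, FrameLike, GroupByLike and MySeries are hypothetical stand-ins for the pandas classes, chosen so the example runs without pandas.

```python
from typing import Type, Union


class SeriesLike:
    """Hypothetical stand-in for a Series-like 1D container."""

    @property
    def _constructor(self) -> Type["SeriesLike"]:
        # Return the runtime class so subclasses are preserved.
        return type(self)


class FrameLike:
    """Hypothetical stand-in for a DataFrame-like 2D container."""

    @property
    def _constructor_sliced(self) -> Type[SeriesLike]:
        # The 1D type produced when slicing this 2D container.
        return SeriesLike


class GroupByLike:
    """Hypothetical stand-in for a groupby object wrapping ``obj``."""

    def __init__(self, obj: Union[SeriesLike, FrameLike]) -> None:
        self.obj = obj

    @property
    def _constructor(self) -> Type[SeriesLike]:
        if isinstance(self.obj, FrameLike):
            return self.obj._constructor_sliced
        # The assertion documents the only remaining case; every path
        # now ends in a return, so no trailing else branch is needed.
        assert isinstance(self.obj, SeriesLike)
        return self.obj._constructor


class MySeries(SeriesLike):
    """Subclass used to check that the subclass type survives."""
```

With these stand-ins, GroupByLike(MySeries())._constructor returns MySeries rather than the base class, which is the behavior the property is meant to provide.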
yes pls use an assertion
maybe @simonjayhawkins or @WillAyd can help with the annotation itself
thanks -- I changed _constructor to use the assertion as above. I think it would also make sense to change the name of the property to _series_constructor in order to clarify the return type, but I'll hold off for additional comments.
I think the annotation is correct given the way it is implemented, though I am not sure about the implementation. Why do we need to dispatch to _constructor_sliced for DataFrames? Seems slightly unnatural to have to force that.
This ends up being used in GroupBy.ngroup(), GroupBy.cumcount(), and GroupBy.size(). Prior to these changes, these methods returned Series, regardless of whether they were called from a SeriesGroupBy or DataFrameGroupBy object. I updated this to still return a Series type while preserving the subclasses -- to avoid things reverting back to pd.Series if they were called from a subclassed DataFrame/Series.

As such, the motivation for dispatching to _constructor_sliced for DataFrames was to avoid changing the return type of these GroupBy methods from their prior behavior. Do you think it would make sense to make a larger change here that alters these functions to have different return types if called from SeriesGroupBy vs. DataFrameGroupBy?
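To make the behavior described above concrete, here is a small sketch of how it can be observed from user code. It assumes pandas 1.1 or later (the release this change targets per the whatsnew commit); MySeries and MyFrame are hypothetical subclasses defined only for illustration.

```python
import numpy as np
import pandas as pd


class MySeries(pd.Series):
    """Hypothetical Series subclass, for illustration only."""

    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        return MyFrame


class MyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, for illustration only."""

    @property
    def _constructor(self):
        return MyFrame

    @property
    def _constructor_sliced(self):
        # 1D results derived from MyFrame should be MySeries.
        return MySeries


frame = MyFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
series = MySeries([1, 2, 3], name="val")

# size(), ngroup() and cumcount() return a 1D result for both kinds of
# groupby object; with this change the result is expected to be the
# subclassed Series type rather than plain pd.Series.
frame_sizes = frame.groupby("key").size()
frame_counts = frame.groupby("key").cumcount()
series_sizes = series.groupby(np.array([0, 0, 1])).size()

print(type(frame_sizes).__name__, type(series_sizes).__name__)
```

For the DataFrame case the 1D type comes from _constructor_sliced, and for the Series case it comes from _constructor, which is why a single property on the groupby object has to dispatch on the type of the wrapped object.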
Hmm that's unfortunate... Can you rename this property to _obj_constructor instead? I find the current name a little confusing.
Or maybe even _obj_1d_constructor to be even more explicit. This is definitely for special cases so want to signal as such.
yeah -- I think _obj_1d_constructor is most clear. I'll make the changes.