-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: reorg groupby to multiple modules #21820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if any comments. This has no user facing changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a couple of open groupby doc PRs (that we have been merging the last days). Merging this will make updating those harder (but merging those PRs first also give a possibly painful merge here)
I would personally leave frame and series stuff together with NDFrameGroupby, as those are quite related (that still makes that the file is splitted, so still an improvement)
pandas/core/panel.py
Outdated
@@ -911,7 +911,7 @@ def groupby(self, function, axis='major'): | |||
------- | |||
grouped : PanelGroupBy | |||
""" | |||
from pandas.core.groupby.groupby import PanelGroupBy | |||
from pandas.core.groupby.panel import PanelGroupBy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this importing just from pandas.core.groupby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/resample.py
Outdated
) | ||
from pandas.core.groupby.series import SeriesGroupBy | ||
from pandas.core.groupby.panel import PanelGroupBy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I would import only from pandas.core.groupby
, and expose there everything that is needed elsewhere in pandas. That way we don't always have to change this if the groupby module is internally refactored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And doing that makes it also clear what is used elsewhere in pandas, as this is what is in the groupby/__init__.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this not following the general pattern. It is trivial to update this if imports are changed and IMHO it makes the structure much more clear here.
pandas/core/groupby/ops.py
Outdated
@@ -0,0 +1,1528 @@ | |||
""" | |||
perform the groupby operations. These are not exposed | |||
to the top-level interace in groupby.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grouper is exposed and used in resample code? So that doesn't seem fully true?
pandas/core/groupby/ops.py
Outdated
@@ -0,0 +1,1528 @@ | |||
""" | |||
perform the groupby operations. These are not exposed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand a bit on "perform the groupby operations". That's not directly clear to me from reading that. (it is the "split" (finding the groups) part of "split-apply-combine ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems logical
pandas/core/groupby/frame.py
Outdated
from pandas.core.groupby.groupby import NDFrameGroupBy | ||
from pandas.core.groupby.series import SeriesGroupBy | ||
from pandas._libs.lib import count_level_2d | ||
from pandas.plotting._core import boxplot_frame_groupby |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since stuff is being moved en-masse, this is a good time to clean up imports. After the standard __future__
-->stdlib-->3rd party blocks, I advocate blocks in order of dependencies. So imports from _libs
, then util
and compat
, then core
(I like to put core.dtypes
before other core
), and last plotting
. YMMV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Codecov Report
@@ Coverage Diff @@
## master #21820 +/- ##
==========================================
+ Coverage 91.9% 91.91% +0.01%
==========================================
Files 160 164 +4
Lines 49892 49956 +64
==========================================
+ Hits 45852 45916 +64
Misses 4040 4040
Continue to review full report at Codecov.
|
better to do file splitting sooner rather than later |
ok this is looking pretty good now. split logically. if any more pass. want to merge this soonish to avoid conflicts. |
pandas/core/groupby/ops.py
Outdated
@@ -0,0 +1,916 @@ | |||
""" | |||
Provide classe to perform the groupby aggregate operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo classe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if any other comments @pandas-dev/pandas-core pretty straightforward split. |
merging tomorrow if any comments. |
makes the impl of groupby a bit easier to grok.
cc @WillAyd @TomAugspurger @jorisvandenbossche