Skip to content

CLN: reorg groupby to multiple modules #21820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 11, 2018
Merged

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Jul 8, 2018

makes the impl of groupby a bit easier to grok.

cc @WillAyd @TomAugspurger @jorisvandenbossche

@jreback jreback added Refactor Internal refactoring of code Groupby labels Jul 8, 2018
@jreback jreback added this to the 0.24.0 milestone Jul 8, 2018
@jreback
Copy link
Contributor Author

jreback commented Jul 8, 2018

if any comments. This has no user facing changes.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a couple of open groupby doc PRs (that we have been merging the last days). Merging this will make updating those harder (but merging those PRs first also give a possibly painful merge here)

I would personally leave frame and series stuff together with NDFrameGroupby, as those are quite related (that still makes that the file is splitted, so still an improvement)

@@ -911,7 +911,7 @@ def groupby(self, function, axis='major'):
-------
grouped : PanelGroupBy
"""
from pandas.core.groupby.groupby import PanelGroupBy
from pandas.core.groupby.panel import PanelGroupBy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this importing just from pandas.core.groupby

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

)
from pandas.core.groupby.series import SeriesGroupBy
from pandas.core.groupby.panel import PanelGroupBy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I would import only from pandas.core.groupby, and expose there everything that is needed elsewhere in pandas. That way we don't always have to change this if the groupby module is internally refactored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And doing that makes it also clear what is used elsewhere in pandas, as this is what is in the groupby/__init__.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this not following the general pattern. It is trivial to update this if imports are changed and IMHO it makes the structure much more clear here.

@@ -0,0 +1,1528 @@
"""
perform the groupby operations. These are not exposed
to the top-level interace in groupby.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouper is exposed and used in resample code? So that doesn't seem fully true?

@@ -0,0 +1,1528 @@
"""
perform the groupby operations. These are not exposed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand a bit on "perform the groupby operations". That's not directly clear to me from reading that. (it is the "split" (finding the groups) part of "split-apply-combine ?)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems logical

from pandas.core.groupby.groupby import NDFrameGroupBy
from pandas.core.groupby.series import SeriesGroupBy
from pandas._libs.lib import count_level_2d
from pandas.plotting._core import boxplot_frame_groupby
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since stuff is being moved en-masse, this is a good time to clean up imports. After the standard __future__-->stdlib-->3rd party blocks, I advocate blocks in order of dependencies. So imports from _libs, then util and compat, then core (I like to put core.dtypes before other core), and last plotting. YMMV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@codecov
Copy link

codecov bot commented Jul 9, 2018

Codecov Report

Merging #21820 into master will increase coverage by 0.01%.
The diff coverage is 91.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21820      +/-   ##
==========================================
+ Coverage    91.9%   91.91%   +0.01%     
==========================================
  Files         160      164       +4     
  Lines       49892    49956      +64     
==========================================
+ Hits        45852    45916      +64     
  Misses       4040     4040
Flag Coverage Δ
#multiple 90.29% <91.68%> (+0.01%) ⬆️
#single 42.16% <18.22%> (+0.06%) ⬆️
Impacted Files Coverage Δ
pandas/core/base.py 96.69% <ø> (-0.15%) ⬇️
pandas/core/groupby/groupby.py 96.13% <ø> (+3.44%) ⬆️
pandas/core/resample.py 96.09% <100%> (+0.02%) ⬆️
pandas/core/reshape/pivot.py 97.03% <100%> (ø) ⬆️
pandas/core/groupby/__init__.py 100% <100%> (ø) ⬆️
pandas/core/api.py 100% <100%> (ø) ⬆️
pandas/core/window.py 96.27% <100%> (ø) ⬆️
pandas/core/panel.py 97.44% <100%> (ø) ⬆️
pandas/core/groupby/generic.py 86.79% <86.79%> (ø)
pandas/core/groupby/base.py 91.11% <91.11%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5380fcd...497d18b. Read the comment docs.

@jreback
Copy link
Contributor Author

jreback commented Jul 9, 2018

There are still a couple of open groupby doc PRs (that we have been merging the last days). Merging this will make updating those harder (but merging those PRs first also give a possibly painful merge here)

better to do file splitting sooner rather than later

@jreback
Copy link
Contributor Author

jreback commented Jul 9, 2018

ok this is looking pretty good now. split logically. if any more pass. want to merge this soonish to avoid conflicts.

@@ -0,0 +1,916 @@
"""
Provide classe to perform the groupby aggregate operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo classe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback
Copy link
Contributor Author

jreback commented Jul 9, 2018

if any other comments

@pandas-dev/pandas-core pretty straightforward split.

@jreback
Copy link
Contributor Author

jreback commented Jul 11, 2018

merging tomorrow if any comments.

@jreback jreback merged commit 31fb39a into pandas-dev:master Jul 11, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants