-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: ngroups and len(groups) do not equal when grouping with a list of Grouper and column label (GH26326) #26374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…f Grouper and column label (GH26326)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Most comments are stylistic though I think the implementation needs to be more generalizable as well
pandas/core/groupby/ops.py
Outdated
to_groupby = zip(*(ping.grouper for ping in self.groupings)) | ||
to_groupby = zip(*(ping.grouper if not isinstance(ping.grouper, | ||
self.__class__) | ||
else ping.grouper.groupings[0].grouper for ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this only work if the grouper was the first item? Would need something more generalizable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering the same. Can Groupers be nested to more than one level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would fail if your test was .groupby(['beta', pd.Grouper(level='alpha')])
instead of .groupby([pd.Grouper(level='alpha'), 'beta'])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is with the construction of the Groupings
. The [0]
indexes the sole BaseGrouper
created inside the grouping due to the _get_grouper
method. The order will be maintained. I have added another assertion to check this and it passed.
pandas/tests/groupby/test_groupby.py
Outdated
|
||
|
||
def test_groupby_groups_in_BaseGrouper(): | ||
# https://github.com/pandas-dev/pandas/issues/26326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# https://github.com/pandas-dev/pandas/issues/26326 | |
# GH 26326 |
pandas/tests/groupby/test_groupby.py
Outdated
|
||
def test_groupby_groups_in_BaseGrouper(): | ||
# https://github.com/pandas-dev/pandas/issues/26326 | ||
m_index = pd.MultiIndex.from_product([['A', 'B'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use mi
instead of m_index
pandas/tests/groupby/test_groupby.py
Outdated
# https://github.com/pandas-dev/pandas/issues/26326 | ||
m_index = pd.MultiIndex.from_product([['A', 'B'], | ||
['C', 'D']], names=['alpha', 'beta']) | ||
df_sample = pd.DataFrame({'foo': [1, 2, 1, 2], 'bar': [1, 2, 3, 4]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df
instead of df_sample
pandas/tests/groupby/test_groupby.py
Outdated
['C', 'D']], names=['alpha', 'beta']) | ||
df_sample = pd.DataFrame({'foo': [1, 2, 1, 2], 'bar': [1, 2, 3, 4]}, | ||
index=m_index) | ||
dfGBY_BaseGrouper = df_sample.groupby([pd.Grouper(level='alpha'), 'beta']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just grp1
and grp2
for better readability here and line below
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -406,7 +406,7 @@ Groupby/Resample/Rolling | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.idxmax` and :meth:`pandas.core.groupby.GroupBy.idxmin` with datetime column would return incorrect dtype (:issue:`25444`, :issue:`15306`) | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.cumsum`, :meth:`pandas.core.groupby.GroupBy.cumprod`, :meth:`pandas.core.groupby.GroupBy.cummin` and :meth:`pandas.core.groupby.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`) | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`) | |||
|
|||
- Bug in :meth:`pandas.core.groupby.ops.BaseGrouper.groups` in which a :class:`BaseGrouper` object with another :class:`BaseGrouper` as part of its :class:`Groupings` would return incorrect set of groups (:issue:`26326`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you construct a message that is more user facing? These items aren't part of the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User facing as in more user friendly?
Codecov Report
@@ Coverage Diff @@
## master #26374 +/- ##
==========================================
- Coverage 91.68% 41.19% -50.5%
==========================================
Files 174 174
Lines 50700 50700
==========================================
- Hits 46486 20887 -25599
- Misses 4214 29813 +25599
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26374 +/- ##
==========================================
+ Coverage 91.73% 91.74% +<.01%
==========================================
Files 174 174
Lines 50754 50752 -2
==========================================
+ Hits 46560 46561 +1
+ Misses 4194 4191 -3
Continue to review full report at Codecov.
|
Added the changes you requested in the last commit |
It would appear that this method would not work when subclasses of |
Hello @shantanu-gontia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-05-19 22:20:05 UTC |
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -408,7 +408,7 @@ Groupby/Resample/Rolling | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.idxmax` and :meth:`pandas.core.groupby.GroupBy.idxmin` with datetime column would return incorrect dtype (:issue:`25444`, :issue:`15306`) | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.cumsum`, :meth:`pandas.core.groupby.GroupBy.cumprod`, :meth:`pandas.core.groupby.GroupBy.cummin` and :meth:`pandas.core.groupby.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`) | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`) | |||
|
|||
- Bug in :meth:`pandas.core.groupby.ops.BaseGrouper.groups` in which creating a :class:`GroupBy` object with a key of type :class:`Grouper` would result in producing incorrect groups (:issue:`26326`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not user facing, can you reword a bit
pandas/core/groupby/ops.py
Outdated
@@ -258,6 +267,8 @@ def groups(self): | |||
if len(self.groupings) == 1: | |||
return self.groupings[0].groups | |||
else: | |||
def is_basegrouper(self, obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not needed
pandas/tests/groupby/test_groupby.py
Outdated
['C', 'D']], names=['alpha', 'beta']) | ||
df = pd.DataFrame({'foo': [1, 2, 1, 2], 'bar': [1, 2, 3, 4]}, | ||
index=mi) | ||
grp1 = df.groupby([pd.Grouper(level='alpha'), 'beta']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you call this
result =
and the nbggrp1 -> expected
then do then consecutively, e.g. like
result = df.groupby(....)
expected = ....
assert result.groups == expected.groups
result = df......
expected = ....
assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, looks cleaner that way.
|
||
|
||
def test_groupby_groups_in_BaseGrouper(): | ||
# GH 26326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give a 1-line description of what this is testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will do so in the next commit
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -444,9 +444,9 @@ Groupby/Resample/Rolling | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.cumsum`, :meth:`pandas.core.groupby.GroupBy.cumprod`, :meth:`pandas.core.groupby.GroupBy.cummin` and :meth:`pandas.core.groupby.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`) | |||
- Bug in :meth:`pandas.core.groupby.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`) | |||
- Bug in :meth:`pandas.core.groupby.SeriesGroupBy.transform` where transforming an empty group would raise error (:issue:`26208`) | |||
- Bug in :meth:`pandas.core.frame.DataFrame.groupby` where passing a :class:`pandas.core.groupby.grouper.Grouper` would return incorrect groups (:issue:`26326`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect groups -> incorrect groups when using .groups
accessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do so in the next commit.
thanks @shantanu-gontia |
git diff upstream/master -u -- "*.py" | flake8 --diff