BUG: ngroups and len(groups) do not equal when grouping with a list of Grouper and column label (GH26326) #26374

shantanu-gontia · 2019-05-13T16:54:35Z

closes BUG: ngroups and len(groups) do not equal when grouping with a list of Grouper and column label #26326
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

WillAyd

Thanks for the PR! Most comments are stylistic, though I think the implementation needs to be more generalizable as well.

WillAyd · 2019-05-13T17:22:48Z

pandas/core/groupby/ops.py

-            to_groupby = zip(*(ping.grouper for ping in self.groupings))
+            to_groupby = zip(*(ping.grouper if not isinstance(ping.grouper,
+                                                              self.__class__)
+                             else ping.grouper.groupings[0].grouper for ping


Wouldn't this only work if the grouper was the first item? Would need something more generalizable

I was wondering the same. Can Groupers be nested to more than one level?

I think this would fail if your test was .groupby(['beta', pd.Grouper(level='alpha')]) instead of .groupby([pd.Grouper(level='alpha'), 'beta'])

The problem is with the construction of the Groupings. The [0] indexes the sole BaseGrouper created inside the grouping due to the _get_grouper method. The order will be maintained. I have added another assertion to check this and it passed.
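
To make the ordering question concrete, here is a minimal sketch (not part of the PR) that rebuilds the frame from the test and compares ngroups with len(groups) for both key orders. It assumes a pandas release that already contains this fix, where the two counts should agree; on an affected release they may differ. The names orderings and counts are illustrative only.

```python
import pandas as pd

# Same frame as in the PR's test: a 2x2 MultiIndex with levels alpha and beta
mi = pd.MultiIndex.from_product([["A", "B"], ["C", "D"]],
                                names=["alpha", "beta"])
df = pd.DataFrame({"foo": [1, 2, 1, 2], "bar": [1, 2, 3, 4]}, index=mi)

# Both key orders raised in the review (illustrative name)
orderings = [
    [pd.Grouper(level="alpha"), "beta"],
    ["beta", pd.Grouper(level="alpha")],
]

counts = []
for keys in orderings:
    gb = df.groupby(keys)
    # ngroups is the number of groups; len(groups) counts the keys of the
    # mapping returned by the .groups accessor
    counts.append((gb.ngroups, len(gb.groups)))

print(counts)
```

Checking both orders in one loop keeps the comparison symmetric, which is the property the reviewer was asking about.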

WillAyd · 2019-05-13T17:24:01Z

pandas/tests/groupby/test_groupby.py

+
+
+def test_groupby_groups_in_BaseGrouper():
+    # https://github.com/pandas-dev/pandas/issues/26326


Suggested change

-    # https://github.com/pandas-dev/pandas/issues/26326
+    # GH 26326

WillAyd · 2019-05-13T17:24:14Z

pandas/tests/groupby/test_groupby.py

+
+def test_groupby_groups_in_BaseGrouper():
+    # https://github.com/pandas-dev/pandas/issues/26326
+    m_index = pd.MultiIndex.from_product([['A', 'B'],


Use mi instead of m_index

WillAyd · 2019-05-13T17:24:26Z

pandas/tests/groupby/test_groupby.py

+    # https://github.com/pandas-dev/pandas/issues/26326
+    m_index = pd.MultiIndex.from_product([['A', 'B'],
+                                         ['C', 'D']], names=['alpha', 'beta'])
+    df_sample = pd.DataFrame({'foo': [1, 2, 1, 2], 'bar': [1, 2, 3, 4]},


df instead of df_sample

WillAyd · 2019-05-13T17:25:32Z

pandas/tests/groupby/test_groupby.py

+                                         ['C', 'D']], names=['alpha', 'beta'])
+    df_sample = pd.DataFrame({'foo': [1, 2, 1, 2], 'bar': [1, 2, 3, 4]},
+                             index=m_index)
+    dfGBY_BaseGrouper = df_sample.groupby([pd.Grouper(level='alpha'), 'beta'])


Maybe just grp1 and grp2 for better readability here and on the line below

WillAyd · 2019-05-13T17:26:14Z

doc/source/whatsnew/v0.25.0.rst

@@ -406,7 +406,7 @@ Groupby/Resample/Rolling
 - Bug in :meth:`pandas.core.groupby.GroupBy.idxmax` and :meth:`pandas.core.groupby.GroupBy.idxmin` with datetime column would return incorrect dtype (:issue:`25444`, :issue:`15306`)
 - Bug in :meth:`pandas.core.groupby.GroupBy.cumsum`, :meth:`pandas.core.groupby.GroupBy.cumprod`, :meth:`pandas.core.groupby.GroupBy.cummin` and :meth:`pandas.core.groupby.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`)
 - Bug in :meth:`pandas.core.groupby.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`)
-
+- Bug in :meth:`pandas.core.groupby.ops.BaseGrouper.groups` in which a :class:`BaseGrouper` object with another :class:`BaseGrouper` as part of its :class:`Groupings` would return incorrect set of groups (:issue:`26326`)


Can you construct a message that is more user facing? These items aren't part of the API

User facing as in more user friendly?

codecov · 2019-05-13T17:47:23Z

Codecov Report

Merging #26374 into master will decrease coverage by 50.49%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master   #26374      +/-   ##
==========================================
- Coverage   91.68%   41.19%   -50.5%     
==========================================
  Files         174      174              
  Lines       50700    50700              
==========================================
- Hits        46486    20887   -25599     
- Misses       4214    29813   +25599

Flag	Coverage Δ
#multiple	`?`
#single	`41.19% <0%> (-0.15%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby/ops.py	`19.47% <0%> (-76.49%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.37%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.16%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.1%)`	⬇️
pandas/core/tools/numeric.py	`10.44% <0%> (-89.56%)`	⬇️
... and 130 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e26aa00...0dba40d. Read the comment docs.

codecov · 2019-05-13T17:47:30Z

Codecov Report

Merging #26374 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26374      +/-   ##
==========================================
+ Coverage   91.73%   91.74%   +<.01%     
==========================================
  Files         174      174              
  Lines       50754    50752       -2     
==========================================
+ Hits        46560    46561       +1     
+ Misses       4194     4191       -3

Flag	Coverage Δ
#multiple	`90.25% <100%> (+0.01%)`	⬆️
#single	`41.69% <40%> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby/grouper.py	`98.53% <100%> (ø)`	⬆️
pandas/core/groupby/ops.py	`96% <100%> (+0.03%)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97.02% <0%> (-0.12%)`	⬇️
pandas/core/base.py	`97.97% <0%> (-0.02%)`	⬇️
pandas/core/series.py	`93.67% <0%> (ø)`	⬆️
pandas/core/internals/blocks.py	`94.08% <0%> (ø)`	⬆️
pandas/util/testing.py	`90.7% <0%> (+0.1%)`	⬆️
pandas/core/dtypes/dtypes.py	`97.34% <0%> (+0.69%)`	⬆️
pandas/core/computation/expr.py	`97.52% <0%> (+0.82%)`	⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80bddaf...ea023de. Read the comment docs.

shantanu-gontia · 2019-05-13T18:17:23Z

Added the changes you requested in the last commit

shantanu-gontia · 2019-05-13T20:25:22Z

It would appear that this method would not work when subclasses of BaseGrouper are involved. Will patch with a different method in the next commit
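
The thread does not show the replacement check. As a minimal sketch of why an isinstance test also matches subclasses, and how an exact type comparison differs, consider the stand-in classes below. FakeBaseGrouper, FakeBinGrouper, and is_exact_base are hypothetical names for illustration and are not the pandas classes or the code from this PR.

```python
class FakeBaseGrouper:
    """Hypothetical stand-in for a base grouper class (not pandas code)."""


class FakeBinGrouper(FakeBaseGrouper):
    """Hypothetical stand-in for a subclass of the base grouper."""


def is_exact_base(obj):
    # Exact type comparison: instances of subclasses are excluded
    return type(obj) is FakeBaseGrouper


base, binned = FakeBaseGrouper(), FakeBinGrouper()

checks = {
    # isinstance is True for the base class and for any subclass
    "isinstance": (isinstance(base, FakeBaseGrouper),
                   isinstance(binned, FakeBaseGrouper)),
    # the exact check is True only for the base class itself
    "exact": (is_exact_base(base), is_exact_base(binned)),
}
print(checks)
```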

pep8speaks · 2019-05-14T12:24:43Z

Hello @shantanu-gontia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-19 22:20:05 UTC

jreback · 2019-05-16T11:58:14Z

doc/source/whatsnew/v0.25.0.rst

@@ -408,7 +408,7 @@ Groupby/Resample/Rolling
 - Bug in :meth:`pandas.core.groupby.GroupBy.idxmax` and :meth:`pandas.core.groupby.GroupBy.idxmin` with datetime column would return incorrect dtype (:issue:`25444`, :issue:`15306`)
 - Bug in :meth:`pandas.core.groupby.GroupBy.cumsum`, :meth:`pandas.core.groupby.GroupBy.cumprod`, :meth:`pandas.core.groupby.GroupBy.cummin` and :meth:`pandas.core.groupby.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`)
 - Bug in :meth:`pandas.core.groupby.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`)
-
+- Bug in :meth:`pandas.core.groupby.ops.BaseGrouper.groups` in which creating a :class:`GroupBy` object with a key of type :class:`Grouper` would result in producing incorrect groups (:issue:`26326`)


this is not user facing, can you reword a bit

jreback · 2019-05-19T17:16:05Z

pandas/core/groupby/ops.py

@@ -258,6 +267,8 @@ def groups(self):
        if len(self.groupings) == 1:
            return self.groupings[0].groups
        else:
+            def is_basegrouper(self, obj):


this is not needed

jreback · 2019-05-19T17:17:57Z

pandas/tests/groupby/test_groupby.py

+                                     ['C', 'D']], names=['alpha', 'beta'])
+    df = pd.DataFrame({'foo': [1, 2, 1, 2], 'bar': [1, 2, 3, 4]},
+                      index=mi)
+    grp1 = df.groupby([pd.Grouper(level='alpha'), 'beta'])


can you call this result (result = ...) and the nbggrp1 -> expected

then do them consecutively, e.g. like

    result = df.groupby(....)
    expected = ....
    assert result.groups == expected.groups

    result = df......
    expected = ....
    assert

sure, looks cleaner that way.
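
A sketch of how the test might read in the suggested result/expected style follows. Building expected from plain level names is an assumption about what the comparison baseline should be, and groups_as_lists is a hypothetical helper added here so that the comparison does not rely on how Index objects evaluate under dict equality. It assumes a pandas release that includes this fix.

```python
import pandas as pd


def groups_as_lists(gb):
    # Hypothetical helper: convert the .groups mapping to plain lists so
    # the comparison uses ordinary list equality
    return {key: list(index) for key, index in gb.groups.items()}


mi = pd.MultiIndex.from_product([["A", "B"], ["C", "D"]],
                                names=["alpha", "beta"])
df = pd.DataFrame({"foo": [1, 2, 1, 2], "bar": [1, 2, 3, 4]}, index=mi)

# Grouper first, then a level name
result = df.groupby([pd.Grouper(level="alpha"), "beta"])
expected = df.groupby(["alpha", "beta"])
assert groups_as_lists(result) == groups_as_lists(expected)

# Level name first, then the Grouper
result = df.groupby(["beta", pd.Grouper(level="alpha")])
expected = df.groupby(["beta", "alpha"])
assert groups_as_lists(result) == groups_as_lists(expected)
```

Reusing result and expected for each case keeps every assertion next to the objects it compares.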

jreback · 2019-05-19T17:18:18Z

pandas/tests/groupby/test_groupby.py

+
+
+def test_groupby_groups_in_BaseGrouper():
+    # GH 26326


can you give a 1-line description of what this is testing

sure. will do so in the next commit

jreback · 2019-05-19T17:19:40Z

doc/source/whatsnew/v0.25.0.rst

@@ -444,9 +444,9 @@ Groupby/Resample/Rolling
 - Bug in :meth:`pandas.core.groupby.GroupBy.cumsum`, :meth:`pandas.core.groupby.GroupBy.cumprod`, :meth:`pandas.core.groupby.GroupBy.cummin` and :meth:`pandas.core.groupby.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`)
 - Bug in :meth:`pandas.core.groupby.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`)
 - Bug in :meth:`pandas.core.groupby.SeriesGroupBy.transform` where transforming an empty group would raise error (:issue:`26208`)
+- Bug in :meth:`pandas.core.frame.DataFrame.groupby` where passing a :class:`pandas.core.groupby.grouper.Grouper` would return incorrect groups (:issue:`26326`)


incorrect groups -> incorrect groups when using .groups accessor

will do so in the next commit.

jreback · 2019-05-20T00:23:13Z

thanks @shantanu-gontia

shantanu-gontia added 2 commits May 13, 2019 22:14

BUG: ngroups and len(groups) do not equal when grouping with a list of Grouper and column label (GH26326)

47a1057

Sorted Imports in test_groupby.py

cad3d6b

WillAyd requested changes May 13, 2019

WillAyd added the Groupby label May 13, 2019

Updated test and edited bugfix message

0dba40d

Merge remote-tracking branch 'upstream/master' into bug26326

bb3f6de

shantanu-gontia added 2 commits May 14, 2019 16:54

Merge remote-tracking branch 'upstream/master' into bug26326

dc76a2d

Updated method to not check for subclasses

648ddfa

shantanu-gontia added 2 commits May 14, 2019 20:44

Merge remote-tracking branch 'upstream/master' into bug26326

3aaa15c

fixed under indent on line 264

f8a3220

jreback requested changes May 16, 2019

jreback requested changes May 16, 2019

shantanu-gontia added 8 commits May 16, 2019 21:07

Updated release file

ca6eff6

Merge branch 'master' into bug26326

c3d5de2

patched solution with modification of BaseGrouper

f8f3d8f

updated whatsnew file with better wording

b8e4b34

fixed whatsnew release note message

bfdea4c

added issue number to message

49615db

fixed linting

050fb89

Merge branch 'master' into bug26326

bf48c41

jreback requested changes May 19, 2019

jreback added this to the 0.25.0 milestone May 19, 2019

jreback reviewed May 19, 2019

shantanu-gontia added 2 commits May 20, 2019 03:20

fixed release note msg

af63e6e

Added test message, refactor test code

fe0405a

shantanu-gontia added 2 commits May 20, 2019 03:30

remove extraneous function

ee4dffe

update release note msg

ea023de

jreback approved these changes May 20, 2019

jreback merged commit 279753c into pandas-dev:master May 20, 2019

falcaopetri mentioned this pull request Mar 29, 2020

BUG: wrong df.groupby().groups when grouping with [Grouper(freq=), ...] #33132

Closed
