Skip to content

BUG: DataFrame.groupby() interprets tuple as list of keys #17996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

GuessWhoSamFoo
Copy link
Contributor

In [2]: df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6], [1, 4, 2, 3]],
   ...:                            columns=pd.MultiIndex.from_arrays([['a', 'b', 'b', 'c'],
   ...:                                                               [1, 1, 2, 2]]))

In [3]: df.groupby([('b', 1)]).groups
Out[3]: {2: Int64Index([0], dtype='int64'), 4: Int64Index([1, 2], dtype='int64')}

In [4]: df.groupby(('b', 1)).groups
Out[4]: {2: Int64Index([0], dtype='int64'), 4: Int64Index([1, 2], dtype='int64')}

@codecov
Copy link

codecov bot commented Oct 27, 2017

Codecov Report

Merging #17996 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17996      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50115    50116       +1     
==========================================
- Hits        45730    45722       -8     
- Misses       4385     4394       +9
Flag Coverage Δ
#multiple 89.04% <100%> (ø) ⬆️
#single 40.32% <0%> (+0.02%) ⬆️
Impacted Files Coverage Δ
pandas/core/groupby.py 92.04% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 881ee30...afb0031. Read the comment docs.

@@ -2769,7 +2769,10 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
keys = [key]
match_axis_length = False
else:
keys = key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this part an elif
add a comment

if isinstance(key, tuple) and isinstance(obj._info_axis, MultiIndex):
keys = [key]
else:
keys = key
match_axis_length = len(keys) == len(group_axis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not right the match_len axis with a tuplr

@jreback
Copy link
Contributor

jreback commented Oct 27, 2017

add a whatsnew for 0.21.1.

if you want to try to clean up all of that if logic would be a big +

@GuessWhoSamFoo
Copy link
Contributor Author

match_axis_length boils down to whether the keys need to be converted by _asarray_tuplesafe. Since any_arraylike will be true for this case, the intuition is that there is no need to worry about match_axis_length

@@ -1052,6 +1052,7 @@ Groupby/Resample/Rolling
- Bug in ``DataFrame.groupby`` where a single level selection from a ``MultiIndex`` unexpectedly sorts (:issue:`17537`)
- Bug in ``DataFrame.groupby`` where spurious warning is raised when ``Grouper`` object is used to override ambiguous column name (:issue:`17383`)
- Bug in ``TimeGrouper`` differs when passes as a list and as a scalar (:issue:`17530`)
- Bug in ``DataFrame.groupby`` where key as tuple in a ``MultiIndex`` were interpreted as a list of keys (:issue:`17979`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.22.0

@@ -2765,7 +2765,9 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
elif isinstance(key, BaseGrouper):
return key, [], obj

if not isinstance(key, (tuple, list)):
# when MultiIndex, allow tuple to be a key
if not isinstance(key, (tuple, list)) or isinstance(key, tuple) and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use parens to avoid ambiguity when reading this condition

add an

is_axis_multiindex = isinstance(obj._info_axis, MultiIndex)

higher up and use it here (and elsewhere in this functionif you can)

@jreback jreback added this to the 0.22.0 milestone Oct 31, 2017
@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

lgtm, ping on green.

@GuessWhoSamFoo
Copy link
Contributor Author

@jreback Rebased and green 👍

@jreback jreback closed this in 1719437 Nov 1, 2017
@jreback
Copy link
Contributor

jreback commented Nov 1, 2017

thanks @GuessWhoSamFoo keep em coming!

1kastner pushed a commit to 1kastner/pandas that referenced this pull request Nov 5, 2017
closes pandas-dev#17979

Author: sfoo <[email protected]>
Author: Jeff Reback <[email protected]>

Closes pandas-dev#17996 from GuessWhoSamFoo/groupby_tuples and squashes the following commits:

afb0031 [Jeff Reback] TST: separate out grouping-type tests
c52b2a8 [sfoo] Moved notes to 0.22; created is_axis_multiindex var - pending internal use
fb52c1c [sfoo] Added whatsnew; checked match_axis_length
99ebc4e [sfoo] Cast groupby tuple as list when multiindex
@toobaz toobaz mentioned this pull request Nov 12, 2017
2 tasks
jreback pushed a commit that referenced this pull request Nov 13, 2017
xref #17996

Author: Pietro Battiston <[email protected]>

Closes #18249 from toobaz/groupby_tuples and squashes the following commits:

dafc838 [Pietro Battiston] DOC: Clarification of groupby(by=) argument
e0bdfa7 [Pietro Battiston] TST: Test for tuples in columns, fixes to previous tests
74f91e0 [Pietro Battiston] TST: Fix tests which used tuples to pass multiple keys
201a4fe [Pietro Battiston] BUG: Never interpret a tuple as a list of keys
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
closes pandas-dev#17979

Author: sfoo <[email protected]>
Author: Jeff Reback <[email protected]>

Closes pandas-dev#17996 from GuessWhoSamFoo/groupby_tuples and squashes the following commits:

afb0031 [Jeff Reback] TST: separate out grouping-type tests
c52b2a8 [sfoo] Moved notes to 0.22; created is_axis_multiindex var - pending internal use
fb52c1c [sfoo] Added whatsnew; checked match_axis_length
99ebc4e [sfoo] Cast groupby tuple as list when multiindex
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
xref pandas-dev#17996

Author: Pietro Battiston <[email protected]>

Closes pandas-dev#18249 from toobaz/groupby_tuples and squashes the following commits:

dafc838 [Pietro Battiston] DOC: Clarification of groupby(by=) argument
e0bdfa7 [Pietro Battiston] TST: Test for tuples in columns, fixes to previous tests
74f91e0 [Pietro Battiston] TST: Fix tests which used tuples to pass multiple keys
201a4fe [Pietro Battiston] BUG: Never interpret a tuple as a list of keys
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.groupby() interprets tuple as list of keys
2 participants