-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug: Grouping by index and column fails on DataFrame with single index (GH14327) #14333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Existing logic under "if level is not None:" assumed that index was a MultiIndex. Now we check and also handle the case where an Index is passed in with a None grouper. This resolves GH 14327
Current coverage is 85.26% (diff: 95.83%)@@ master #14333 diff @@
==========================================
Files 140 140
Lines 50630 50639 +9
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43169 43177 +8
- Misses 7461 7462 +1
Partials 0 0
|
@@ -1584,3 +1584,4 @@ Bug Fixes | |||
- ``PeridIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`) | |||
- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`) | |||
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`) | |||
- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index (:issue`14327`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.20.0
whatsnew moved to 0.20.0 as @jreback requested |
Looks good to me |
if self.name is None: | ||
self.name = index.names[level] | ||
|
||
# XXX complete hack | ||
if isinstance(index, MultiIndex): | ||
inds = index.labels[level] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this should be a private method in an Index instead (and overridden in MultiIndex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean everything inside the if isinstance(index, MultiIndex):
block? If so then it looks like the MultiIndex override would need to input grouper
, index
, and level
and return a tuple of labels
, level_index
, and grouper
. This seems a little messy to me since the parent Index method would have no use for the level
parameter and would need to return None for the labels
and level_index
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just take args and return a tuples of things
no state is kept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the clarification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just a very minor comment. We might be asking to change the whatsnew to v0.19.1, but we need to sort that out first; will keep you posted.
@@ -432,6 +432,13 @@ def _update_inplace(self, result, **kwargs): | |||
# guard when called from IndexOpsMixin | |||
raise TypeError("Index can't be updated inplace") | |||
|
|||
def _get_grouper_for_level(self, grouper, level): | |||
# return grouper if grouper is not None else self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't all that helpful :) Could you maybe change it to explain what the tuple being returned is (the two Nones are labels
and level_index
) and maybe note that the MultiIndex version is what's useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good call :-)
self._labels = labels | ||
self._group_index = level_index | ||
self.grouper = level_index.take(labels) | ||
self.grouper, self._labels, self._group_index = index._get_grouper_for_level(self.grouper, level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the line causing the travis failure. You'll need to wrap it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that! Guess I forgot to run flake8
@@ -524,6 +524,39 @@ def _format_native_types(self, na_rep='nan', **kwargs): | |||
|
|||
return mi.values | |||
|
|||
def _get_grouper_for_level(self, grouper, level): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, same as above for styling.
@@ -432,6 +432,17 @@ def _update_inplace(self, result, **kwargs): | |||
# guard when called from IndexOpsMixin | |||
raise TypeError("Index can't be updated inplace") | |||
|
|||
def _get_grouper_for_level(self, grouper, level): | |||
# Use self (Index) as grouper if None was passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a Parameters
section in the doc-string. and move the in-line comment to the Returns
part.
|
||
# XXX complete hack | ||
|
||
if grouper is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to put a comment here explain what is going on would be great (for future readers).
Further you can just return if grouper is not None
(and then don't use an else
), I think makes the code read slightly better.
'B': ['one', 'one', 'two', | ||
'two', 'one', 'one']}, | ||
index=idx) | ||
result = df_multi.groupby(['B', pd.Grouper(level='inner')]).mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try with these reversed as well, e.g. [pd.Grouper(....), 'B']
)
Thanks for the feedback @jreback, @jorisvandenbossche, and @TomAugspurger. Writing the docstrings and working through the type signatures sure helped me better understand the logic I was refactoring! |
@TomAugspurger @jorisvandenbossche @jreback I've noticed that there's now a 0.19.1 whatsnew file. Should I move the whatsnew entry there? |
Yes, as it seems a straightforward bug fix, you can put it in 0.19.1 |
@jorisvandenbossche Hmm, I rebased my branch on upstream/master so that I have a copy of the 0.19.1 whatsnew file to work with. After making the change I did a force push which successfully updated the branch in my fork (https://github.com/jmmease/pandas/tree/bug_14327), but it didn't update this pull request. Should I have merged upstream/master instead of rebasing? |
Well, something is going wrong since we moved the repo from the pydata org to the pandas-dev org. PRs that were already open before seem to have this problem. @wesm @TomAugspurger @jreback anybody experience with this problem, or an idea what could be going on? |
---------- | ||
group_mapper: Group mapping function or None | ||
Function mapping index values to groups | ||
level : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make this level=None
by default and assert it is None for Index. don't put the (Only used phrase)
|
||
Returns | ||
------- | ||
grouper : Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it needs to always return a tuple (doesn't matter if they are used or not)
@@ -432,6 +432,35 @@ def _update_inplace(self, result, **kwargs): | |||
# guard when called from IndexOpsMixin | |||
raise TypeError("Index can't be updated inplace") | |||
|
|||
def _get_grouper_for_level(self, group_mapper, level): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this mapper
level_index : Index or None | ||
Index of unique values for level | ||
""" | ||
inds = self.labels[level] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this indexer
Index of values to group on | ||
labels : ndarray of int or None | ||
Array of locations in level_index | ||
level_index : Index or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this uniques (these are not actually used anywhere, but are descriptive). We use certain terms in the codebase. This will make it consistent.
@jmmease lgtm. just some doc consistency changes. ping on green. |
@jreback Thanks, I'll make these updates this evening. Do you have any guidance on how I should proceed since this PR didn't update with my rebase+force push from yesterday? (See comment from @jorisvandenbossche above) I could try again with a merge instead of rebase or I could open a new PR and reference this one. |
On the short term, opening as a new PR is probably the easiest (if you close this one, you can just create a new one from the same branch) |
PR reopened in #14428 |
git diff upstream/master | flake8 --diff
Don't know if this is too late for 0.19.0 but I went ahead and added the whatsnew entry there for now.