-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Accept list as level for groupby in non-MultiIndexed objects #13907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -2368,6 +2368,13 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True, | |||
# axis of the object | |||
if level is not None: | |||
if not isinstance(group_axis, MultiIndex): | |||
if isinstance(level, list) or isinstance(level, range): | |||
if len(level) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle empty case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only checking the length doesn't meet MI behavior. Following works in MI.
pd.DataFrame([[1, 1], [2, 2], [3, 3]], index=pd.MultiIndex.from_arrays([[1, 1, 1], [1, 2, 3]])).groupby(level=[0, 0]).sum()
I think it should raise if level is not scalar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only checking the length doesn't meet MI behavior.
Done by checking len(set(level))
instead of len(level)
.
handle empty case.
Done by handling len(set(level)) == 0
.
I think it should raise if level is not scalar.
But that's the whole point of this PR and the issue it stems from!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it: should we raise an error when given an empty iterable for level
instead of making it level=0
?
# GH 13901 | ||
a = Series([1, 2, 3, 10, 4, 5, 20, 6], Index([1, 2, 3, 1, | ||
4, 5, 2, 6], name='foo')) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use s = Series(....)
create an expected result and use tm.assert_series_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I took the liberty of modifying the test right above (which is the one I based mine off) to use s = Series(...)
and assert_series_equal
.
Current coverage is 85.25% (diff: 100%)@@ master #13907 diff @@
==========================================
Files 140 140
Lines 50568 50575 +7
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43112 43119 +7
Misses 7456 7456
Partials 0 0
|
s = Series([1, 2, 3, 10, 4, 5, 20, 6], | ||
Index([1, 2, 3, 1, 4, 5, 2, 6], name='foo')) | ||
|
||
result = s.groupby(level=[0]).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add:
[0, 0]
,[-1]
@sinhrks Made all changes and additions. |
@@ -2368,11 +2369,21 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True, | |||
# axis of the object | |||
if level is not None: | |||
if not isinstance(group_axis, MultiIndex): | |||
if is_list_like(level): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here why this is needed.
you are setting level
, but below we reset to None
(see L2389), so something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here why this is needed.
Done.
you are setting level, but below we reset to None (see L2389), so something wrong.
L2368-L2369 says that this chunk is indeed just doing validation: if the passed level
validates, it is reset to None
; if it doesn't, it raises a ValueError
. This PR only adds to the validation logic.
@@ -437,8 +437,7 @@ API changes | |||
- ``pd.Timedelta(None)`` is now accepted and will return ``NaT``, mirroring ``pd.Timestamp`` (:issue:`13687`) | |||
- ``Timestamp``, ``Period``, ``DatetimeIndex``, ``PeriodIndex`` and ``.dt`` accessor have gained a ``.is_leap_year`` property to check whether the date belongs to a leap year. (:issue:`13727`) | |||
- ``pd.read_hdf`` will now raise a ``ValueError`` instead of ``KeyError``, if a mode other than ``r``, ``r+`` and ``a`` is supplied. (:issue:`13623`) | |||
|
|||
|
|||
- ``groupby()`` now accepts ``level=[0]`` (in addition to ``level=0``), ``level=-1`` and ``level=[-1]`` for non-``MultiIndex``ed objects. (:issue:`13907`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too complicated. a user is reading this. just say .groupby()
will accept a scalar and a single element list for specifyig level on an Index grouper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@jreback There is a test that fails, but it has nothing to do with |
can you rebase / update? |
Rebased and all green. |
@@ -2039,6 +2039,23 @@ def test_mulitindex_passthru(self): | |||
result = df.groupby(axis=1, level=[0, 1]).first() | |||
assert_frame_equal(result, df) | |||
|
|||
def test_multiindex_negative_level(self): | |||
result = self.mframe.groupby(level=-1).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue reference as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
ok lgtm. @jorisvandenbossche |
thanks! |
git diff upstream/master | flake8 --diff