Fix GroupBy nth Handling with Observed=False #26419
@@ -42,7 +42,7 @@ class providing the base-class of operations. | |
from pandas.core.frame import DataFrame | ||
from pandas.core.generic import NDFrame | ||
from pandas.core.groupby import base | ||
- from pandas.core.index import Index, MultiIndex | ||
+ from pandas.core.index import CategoricalIndex, Index, MultiIndex | ||
from pandas.core.series import Series | ||
from pandas.core.sorting import get_group_index_sorter | ||
|
||
|
@@ -839,6 +839,7 @@ def _cython_transform(self, how, numeric_only=True, **kwargs): | |
def _cython_agg_general(self, how, alt=None, numeric_only=True, | ||
min_count=-1): | ||
output = {} | ||
|
||
for name, obj in self._iterate_slices(): | ||
is_numeric = is_numeric_dtype(obj.dtype) | ||
if numeric_only and not is_numeric: | ||
|
@@ -1707,7 +1708,12 @@ def nth(self, | |
if not self.as_index: | ||
return out | ||
|
||
- out.index = self.grouper.result_index[ids[mask]] | ||
+ result_index = self.grouper.result_index | ||
+ out.index = result_index[ids[mask]] | ||
|
||
+ if not self.observed and isinstance( | ||
+ result_index, CategoricalIndex): | ||
+ out = out.reindex(result_index) | ||

I was hoping to get this into

you just need a small change I think in

IIUC I think

I am not convinced here; this looks like just a bandaid to me and there is an underlying issue that needs fixing. also what about dropna=True?

>>> grp.nth(0, dropna='all')
cat
a 1.0
b 2.0
c 3.0
Name: ser, dtype: float64

Which is not correct since

Also trying to mix that with observed doesn't work:

>>> grp = df.groupby('cat', observed=True)['ser']
>>> grp.nth(0, dropna='all')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/williamayd/clones/pandas/pandas/core/groupby/groupby.py", line 1773, in nth
    result.index = self.grouper.result_index
  File "/Users/williamayd/clones/pandas/pandas/core/generic.py", line 5144, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 67, in pandas._libs.properties.AxisProperty.__set__
    obj._set_axis(self.axis, value)
  File "/Users/williamayd/clones/pandas/pandas/core/series.py", line 381, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Users/williamayd/clones/pandas/pandas/core/internals/managers.py", line 155, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 3 elements, new values have 1 elements

So both weird though I'm not sure how we want to handle the combinations of observed and dropna; will open a separate issue

See #26454
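For anyone reading the traceback in this thread: the `ValueError` is the generic error pandas raises when an index of the wrong length is assigned to an object. The snippet below is only a sketch on made-up data (the `df` from the comment is not shown in the thread, so it is not reproduced here); `result` and the one-element index are hypothetical stand-ins for the three-row result and the one-element `result_index`.

```python
import pandas as pd

# Hypothetical three-element result, standing in for the nth() output.
result = pd.Series([1.0, 2.0, 3.0], name="ser")

message = ""
try:
    # Assigning a one-element index to a three-element Series fails,
    # which is the same failure mode as in the traceback above.
    result.index = pd.Index(["a"], name="cat")
except ValueError as err:
    message = str(err)

print(message)
```

The point is that the error comes from the index assignment itself, so it says nothing about which of the two lengths is the "right" one for the `observed`/`dropna` combination.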
WillAyd marked this conversation as resolved.
|
||
|
||
return out.sort_index() if self.sort else out | ||
|
||
|
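To make the intent of the added `reindex` step easier to follow, here is a minimal sketch that uses only public pandas API instead of the groupby internals. The data, the `out` series, and the hand-built `result_index` are assumptions for illustration; in the patch, `result_index` comes from `self.grouper.result_index`.

```python
import pandas as pd

# Hypothetical data: category "c" is declared but never observed.
categories = ["a", "b", "c"]
df = pd.DataFrame({
    "cat": pd.Categorical(["a", "b", "a"], categories=categories),
    "ser": [1.0, 2.0, 3.0],
})

# Stand-in for `out`: a result that only contains the observed groups.
out = df.groupby("cat", observed=True)["ser"].first()

# Stand-in for the full result index when observed=False.
result_index = pd.CategoricalIndex(categories, categories=categories, name="cat")

# The step the patch adds: reindexing inserts a NaN row for each
# category that has no data.
filled = out.reindex(result_index)
print(filled)
```

With this toy data, `filled` has one row per category, and the row for `"c"` is `NaN` because no data was observed for it.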
move to 0.25.1