-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: aggregations were getting overwritten if they had the same name #30858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 25 commits
20049c1
ab685fd
e38e450
cb849a2
521bc1d
ec93c4f
6f9aac8
a8e9121
b857c6d
44d00df
523effb
552063a
5e2e7d2
40f7e31
f8f2d7f
dba7dde
1b43ed1
868a680
5d7f3db
14b2402
829dce8
3469f5d
862b39e
5e3f333
e7629f3
51158ef
447dfea
2693956
aa988a4
7a62f5f
d80ddc5
4f954d4
62d91d1
fb3ba5c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -282,7 +282,7 @@ def aggregate( | |
if isinstance(ret, dict): | ||
from pandas import concat | ||
|
||
ret = concat(ret, axis=1) | ||
ret = concat(ret.values(), axis=1, keys=[key.label for key in ret.keys()]) | ||
return ret | ||
|
||
agg = aggregate | ||
|
@@ -311,8 +311,8 @@ def _aggregate_multiple_funcs(self, arg): | |
|
||
arg = zip(columns, arg) | ||
|
||
results = {} | ||
for name, func in arg: | ||
results: Dict[base.OutputKey, Union[Series, DataFrame]] = {} | ||
for idx, (name, func) in enumerate(arg): | ||
obj = self | ||
|
||
# reset the cache so that we | ||
|
@@ -321,13 +321,16 @@ def _aggregate_multiple_funcs(self, arg): | |
obj = copy.copy(obj) | ||
obj._reset_cache() | ||
obj._selection = name | ||
results[name] = obj.aggregate(func) | ||
results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func) | ||
MarcoGorelli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
if any(isinstance(x, DataFrame) for x in results.values()): | ||
# let higher level handle | ||
return results | ||
|
||
return self.obj._constructor_expanddim(results, columns=columns) | ||
if not results: | ||
return DataFrame() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm is this correct? do we have tests that hit this. I would think we would have somthing e.g. columns even if this is empty There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also why is this not just handled in wrap_aggregated_output? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here's a test that hits it: When trying to move this to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this still is quite fishy . if you pass en empty result to So prefer to have _wrap_aggregated_output handle this correctly. you may not even need L333, its possible to pass columns to _wrap_aggregated_output There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback if we pass Here's the traceback: ============================= test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
rootdir: /home/marco/pandas-dev, inifile: setup.cfg
plugins: xdist-1.32.0, cov-2.10.0, asyncio-0.12.0, hypothesis-5.16.1, forked-1.1.2
collected 1 item
pandas/tests/groupby/aggregate/test_aggregate.py F [100%]
=================================== FAILURES ===================================
________________ TestNamedAggregationSeries.test_no_args_raises ________________
self = <pandas.tests.groupby.aggregate.test_aggregate.TestNamedAggregationSeries object at 0x7f8835d975b0>
def test_no_args_raises(self):
gr = pd.Series([1, 2]).groupby([0, 1])
with pytest.raises(TypeError, match="Must provide"):
gr.agg()
# but we do allow this
> result = gr.agg([])
pandas/tests/groupby/aggregate/test_aggregate.py:555:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/groupby/generic.py:247: in aggregate
ret = self._aggregate_multiple_funcs(func)
pandas/core/groupby/generic.py:328: in _aggregate_multiple_funcs
output = self._wrap_aggregated_output(results)
pandas/core/groupby/generic.py:387: in _wrap_aggregated_output
result = self._wrap_series_output(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <pandas.core.groupby.generic.SeriesGroupBy object at 0x7f8835d97ac0>
output = {}, index = Int64Index([0, 1], dtype='int64')
def _wrap_series_output(
self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]], index: Index,
) -> Union[Series, DataFrame]:
"""
Wraps the output of a SeriesGroupBy operation into the expected result.
Parameters
----------
output : Mapping[base.OutputKey, Union[Series, np.ndarray]]
Data to wrap.
index : pd.Index
Index to apply to the output.
Returns
-------
Series or DataFrame
Notes
-----
In the vast majority of cases output and columns will only contain one
element. The exception is operations that expand dimensions, like ohlc.
"""
indexed_output = {key.position: val for key, val in output.items()}
columns = Index(key.label for key in output)
result: Union[Series, DataFrame]
if len(output) > 1:
result = self.obj._constructor_expanddim(indexed_output, index=index)
result.columns = columns
else:
result = self.obj._constructor(
> indexed_output[0], index=index, name=columns[0]
)
E KeyError: 0
pandas/core/groupby/generic.py:362: KeyError
-------------- generated xml file: /tmp/tmp-31663hvopHHCRFEu.xml ---------------
=========================== short test summary info ============================
FAILED pandas/tests/groupby/aggregate/test_aggregate.py::TestNamedAggregationSeries::test_no_args_raises
============================== 1 failed in 0.22s =============================== The problem is this line which access There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i would just fix this, need to check if len(indexed_output) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, elif len(indexed_output):
result = self.obj._constructor(
indexed_output[0], index=index, name=columns[0]
)
else:
result = self.obj._constructor() ? I can do that, but then I'll still have to address #34977 when the output of
Are you sure? It seems to only take on argument (other than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback would elif len(indexed_output):
result = self.obj._constructor(
indexed_output[0], index=index, name=columns[0]
)
else:
return None be an acceptable solution? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, elif not columns.empty:
result = self.obj._constructor(
indexed_output[0], index=index, name=columns[0]
)
else:
result = self.obj._constructor_expanddim() works, because pd.DataFrame(pd.DataFrame(), columns=[]) is allowed. No need to modify the return types like this :) |
||
output = self._wrap_aggregated_output(results) | ||
return self.obj._constructor_expanddim(output, columns=columns) | ||
|
||
def _wrap_series_output( | ||
self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]], index: Index, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: the link to this method won't render, since
SeriesGroupBy
isn't in the pands namespace.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that - will make sure the build the whatsnew file in the future to check