
BUG: aggregations were getting overwritten if they had the same name #30858

Merged on Jul 14, 2020 (34 commits)

Commits
20049c1
:bug: aggregations were getting overwritten if they had the same name
Jan 9, 2020
ab685fd
:art: shorten test for the sake of legibility
Jan 21, 2020
e38e450
:art: handle empty in , make whatsnewentry public-facing
Jan 21, 2020
cb849a2
:pencil: move whatsnew entry to v1.1.0
Jan 23, 2020
521bc1d
remove accidentally added whatsnewentry
MarcoGorelli Feb 2, 2020
ec93c4f
Merge branch 'master' into multiple-aggregations
MarcoGorelli Mar 3, 2020
6f9aac8
Update v1.1.0.rst
MarcoGorelli Mar 3, 2020
a8e9121
remove dataframe constructor
Mar 4, 2020
b857c6d
Dict instead of Mapping
Mar 4, 2020
44d00df
Merge branch 'master' into multiple-aggregations
MarcoGorelli Mar 5, 2020
523effb
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Mar 15, 2020
552063a
remove no longer necessary setting of random seed
MarcoGorelli Mar 15, 2020
5e2e7d2
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Apr 19, 2020
40f7e31
don't return slice in concat
MarcoGorelli Apr 19, 2020
f8f2d7f
Add test containing ohlc
MarcoGorelli Apr 19, 2020
dba7dde
Add named aggregation resample test, add to whatsnew
MarcoGorelli Apr 19, 2020
1b43ed1
revert empty line change
MarcoGorelli Apr 19, 2020
868a680
remove 30092 from whatsnew as the issue is already fixed in 1.0.3 and…
MarcoGorelli Apr 19, 2020
5d7f3db
Merge branch 'master' into multiple-aggregations
MarcoGorelli May 2, 2020
14b2402
catch performancewarning in test
MarcoGorelli May 2, 2020
829dce8
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 3, 2020
3469f5d
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 9, 2020
862b39e
make test same as in OP
MarcoGorelli May 10, 2020
5e3f333
make test match OP exactly
MarcoGorelli May 10, 2020
e7629f3
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 13, 2020
51158ef
split into two tests
MarcoGorelli May 18, 2020
447dfea
split into two tests
MarcoGorelli May 18, 2020
2693956
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 18, 2020
aa988a4
add test with namedtuple
MarcoGorelli May 27, 2020
7a62f5f
better layout
MarcoGorelli May 27, 2020
d80ddc5
better layout
MarcoGorelli May 27, 2020
4f954d4
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Jun 27, 2020
62d91d1
dont special case empty output
MarcoGorelli Jun 27, 2020
fb3ba5c
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Jul 14, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -1101,6 +1101,7 @@ Reshaping
 - Bug in :func:`crosstab` when inputs are two Series and have tuple names, the output will keep dummy MultiIndex as columns. (:issue:`18321`)
 - :meth:`DataFrame.pivot` can now take lists for ``index`` and ``columns`` arguments (:issue:`21425`)
 - Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)
+- Bug in :meth:`SeriesGroupBy.aggregate` was resulting in aggregations being overwritten when they shared the same name (:issue:`30880`)
Contributor:

FYI: the link to this method won't render, since SeriesGroupBy isn't in the pandas namespace.

Member Author:

Sorry about that; in future I'll make sure to build the whatsnew file to check.

 - Bug where :meth:`Index.astype` would lose the name attribute when converting from ``Float64Index`` to ``Int64Index``, or when casting to an ``ExtensionArray`` dtype (:issue:`32013`)
 - :meth:`Series.append` will now raise a ``TypeError`` when passed a DataFrame or a sequence containing Dataframe (:issue:`31413`)
 - :meth:`DataFrame.replace` and :meth:`Series.replace` will raise a ``TypeError`` if ``to_replace`` is not an expected type. Previously the ``replace`` would fail silently (:issue:`18634`)
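The whatsnew entry describes a name collision. The following pure-Python sketch reproduces the failure mode in miniature: two `functools.partial` wrappers around the same function receive the same label, so a dict keyed by label alone keeps only the last result. `quantile` and `callable_name` are hypothetical stand-ins (the latter only loosely modelled on how pandas derives a label from a callable), not pandas internals.

```python
from functools import partial


def quantile(values, q):
    """Naive quantile: the value at index int(q * (n - 1)) of the sorted data."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def callable_name(func):
    """Hypothetical labelling helper: unwrap functools.partial objects
    and use the wrapped function's __name__."""
    while isinstance(func, partial):
        func = func.func
    return func.__name__


funcs = [partial(quantile, q=0.9), partial(quantile, q=0.1)]
values = [5, 1, 4, 2, 3]

# Keyed by name alone: both partials are labelled "quantile",
# so the second result silently replaces the first.
by_name = {}
for func in funcs:
    by_name[callable_name(func)] = func(values)

# Keyed by (name, position): both results survive, in order.
by_name_and_position = {}
for position, func in enumerate(funcs):
    by_name_and_position[(callable_name(func), position)] = func(values)

print(by_name)               # {'quantile': 1}
print(by_name_and_position)  # {('quantile', 0): 4, ('quantile', 1): 1}
```

Pairing the label with a position is the same idea as the `base.OutputKey(label=..., position=...)` keys introduced in `generic.py`.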
15 changes: 9 additions & 6 deletions pandas/core/groupby/generic.py
@@ -278,7 +278,7 @@ def aggregate(
         if isinstance(ret, dict):
             from pandas import concat
 
-            ret = concat(ret, axis=1)
+            ret = concat(ret.values(), axis=1, keys=[key.label for key in ret.keys()])
         return ret
 
     agg = aggregate
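Once results are keyed by `OutputKey` rather than by name, the dict keys are no longer plain labels, so the new line passes the values and the labels to `concat` separately. Because `keys` is then an ordinary list, two entries may both be labelled `"quantile"`. A pure-Python sketch of the resulting column layout, assuming each result contributes a list of sub-column names; `concat_with_labels` is hypothetical and only models the labelling, not pandas' `concat`.

```python
from typing import Dict, List, NamedTuple, Tuple


class OutputKey(NamedTuple):
    # Same two fields as base.OutputKey(label=..., position=...) in the diff.
    label: str
    position: int


def concat_with_labels(results: Dict[OutputKey, List[str]]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for
    concat(ret.values(), axis=1, keys=[key.label for key in ret.keys()]):
    each result's sub-columns are prefixed with its label, duplicates allowed."""
    keys = [key.label for key in results.keys()]
    columns: List[Tuple[str, str]] = []
    for label, sub_columns in zip(keys, results.values()):
        for sub_column in sub_columns:
            columns.append((label, sub_column))
    return columns


# Sub-column names follow the expected columns of the ohlc test in this PR.
results = {
    OutputKey(label="ohlc", position=0): ["open", "high", "low", "close"],
    OutputKey(label="quantile", position=1): ["A"],
    OutputKey(label="quantile", position=2): ["A"],
}
print(concat_with_labels(results))
```

With the old `concat(ret, axis=1)` call the dict keys themselves became the labels, so two results could only coexist if their names differed.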
@@ -307,8 +307,8 @@ def _aggregate_multiple_funcs(self, arg):
 
             arg = zip(columns, arg)
 
-        results = {}
-        for name, func in arg:
+        results: Dict[base.OutputKey, Union[Series, DataFrame]] = {}
+        for idx, (name, func) in enumerate(arg):
             obj = self
 
             # reset the cache so that we
@@ -317,13 +317,14 @@ def _aggregate_multiple_funcs(self, arg):
                 obj = copy.copy(obj)
                 obj._reset_cache()
                 obj._selection = name
-            results[name] = obj.aggregate(func)
+            results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func)
 
         if any(isinstance(x, DataFrame) for x in results.values()):
             # let higher level handle
             return results
 
-        return self.obj._constructor_expanddim(results, columns=columns)
+        output = self._wrap_aggregated_output(results)
+        return self.obj._constructor_expanddim(output, columns=columns)
 
     def _wrap_series_output(
         self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]], index: Index,
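The core of the fix is the dictionary key: `results[name]` collapses entries that share a name, whereas `base.OutputKey(label=name, position=idx)` stays unique because `idx` comes from `enumerate`. The sketch below models that loop, plus a position-ordered wrapping step, in plain Python. `wrap_output` is a hypothetical simplification and is not how `_wrap_aggregated_output` is actually implemented.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple


class OutputKey(NamedTuple):
    # Same two fields as base.OutputKey(label=..., position=...) in the diff.
    label: str
    position: int


def aggregate_multiple(
    values: Sequence[float], named_funcs: Sequence[Tuple[str, Callable]]
) -> Dict[OutputKey, float]:
    """Mirror of the patched loop: key each result by (label, position)."""
    results: Dict[OutputKey, float] = {}
    for idx, (name, func) in enumerate(named_funcs):
        results[OutputKey(label=name, position=idx)] = func(values)
    return results


def wrap_output(results: Dict[OutputKey, float]) -> Tuple[List[str], List[float]]:
    """Hypothetical wrapper: order results by position and return the
    column labels and values side by side, so duplicate labels are kept."""
    ordered = sorted(results.items(), key=lambda item: item[0].position)
    labels = [key.label for key, _ in ordered]
    data = [value for _, value in ordered]
    return labels, data


# Two different functions deliberately given the same label.
results = aggregate_multiple([3.0, 1.0, 5.0], [("agg", min), ("agg", max)])
labels, data = wrap_output(results)
print(labels, data)  # ['agg', 'agg'] [1.0, 5.0]
```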
@@ -354,10 +355,12 @@ def _wrap_series_output(
         if len(output) > 1:
             result = self.obj._constructor_expanddim(indexed_output, index=index)
             result.columns = columns
-        else:
+        elif not columns.empty:
             result = self.obj._constructor(
                 indexed_output[0], index=index, name=columns[0]
             )
+        else:
+            result = self.obj._constructor_expanddim()
 
         return result
 
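This hunk adds a third branch. Previously the `else` branch read `indexed_output[0]` and `columns[0]` unconditionally, which would fail when there are no outputs at all; the new `elif not columns.empty` guard sends that case to an empty `_constructor_expanddim()` instead. Below is a pure-Python model of the three branches, assuming a plain list of labels stands in for the `columns` index. `wrap_series_output` here is hypothetical and returns tags rather than pandas objects.

```python
from typing import List, Tuple


def wrap_series_output(columns: List[str], outputs: List[List[float]]) -> Tuple[str, object]:
    """Hypothetical model of the branches in the patched _wrap_series_output.

    Returns a (kind, payload) pair instead of a real Series/DataFrame.
    """
    if len(outputs) > 1:
        # Several outputs: 2-D result; a list of pairs keeps duplicate labels.
        return "frame", list(zip(columns, outputs))
    elif columns:  # corresponds to `elif not columns.empty`
        # Exactly one output: 1-D result named after its column.
        return "series", (columns[0], outputs[0])
    else:
        # No outputs: an empty 2-D result, so nothing is indexed at [0].
        return "frame", []


print(wrap_series_output(["q", "q"], [[1.0], [2.0]]))
print(wrap_series_output(["q"], [[1.0]]))
print(wrap_series_output([], []))
```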
58 changes: 58 additions & 0 deletions pandas/tests/groupby/aggregate/test_aggregate.py
@@ -2,10 +2,13 @@
 test .agg behavior / note that .apply is tested generally in test_groupby.py
 """
 import functools
+from functools import partial
 
 import numpy as np
 import pytest
 
+from pandas.errors import PerformanceWarning
+
 from pandas.core.dtypes.common import is_integer_dtype
 
 import pandas as pd
@@ -252,6 +255,61 @@ def test_agg_multiple_functions_maintain_order(df):
     tm.assert_index_equal(result.columns, exp_cols)
 
 
+def test_agg_multiple_functions_same_name():
+    # GH 30880
+    df = pd.DataFrame(
+        np.random.randn(1000, 3),
+        index=pd.date_range("1/1/2012", freq="S", periods=1000),
+        columns=["A", "B", "C"],
+    )
+    result = df.resample("3T").agg(
+        {"A": [partial(np.quantile, q=0.9999), partial(np.quantile, q=0.1111)]}
+    )
+    expected_index = pd.date_range("1/1/2012", freq="3T", periods=6)
+    expected_columns = MultiIndex.from_tuples([("A", "quantile"), ("A", "quantile")])
+    expected_values = np.array(
+        [df.resample("3T").A.quantile(q=q).values for q in [0.9999, 0.1111]]
+    ).T
+    expected = pd.DataFrame(
+        expected_values, columns=expected_columns, index=expected_index
+    )
+    tm.assert_frame_equal(result, expected)
+
+
+def test_agg_multiple_functions_same_name_with_ohlc_present():
+    # GH 30880
+    # ohlc expands dimensions, so different test to the above is required.
+    df = pd.DataFrame(
+        np.random.randn(1000, 3),
+        index=pd.date_range("1/1/2012", freq="S", periods=1000),
+        columns=["A", "B", "C"],
+    )
+    result = df.resample("3T").agg(
+        {"A": ["ohlc", partial(np.quantile, q=0.9999), partial(np.quantile, q=0.1111)]}
+    )
+    expected_index = pd.date_range("1/1/2012", freq="3T", periods=6)
+    expected_columns = pd.MultiIndex.from_tuples(
+        [
+            ("A", "ohlc", "open"),
+            ("A", "ohlc", "high"),
+            ("A", "ohlc", "low"),
+            ("A", "ohlc", "close"),
+            ("A", "quantile", "A"),
+            ("A", "quantile", "A"),
+        ]
+    )
+    non_ohlc_expected_values = np.array(
+        [df.resample("3T").A.quantile(q=q).values for q in [0.9999, 0.1111]]
+    ).T
+    expected_values = np.hstack([df.resample("3T").A.ohlc(), non_ohlc_expected_values])
+    expected = pd.DataFrame(
+        expected_values, columns=expected_columns, index=expected_index
+    )
+    # PerformanceWarning is thrown by `assert col in right` in assert_frame_equal
+    with tm.assert_produces_warning(PerformanceWarning):
+        tm.assert_frame_equal(result, expected)
+
+
 def test_multiple_functions_tuples_and_non_tuples(df):
     # #1359
     funcs = [("foo", "mean"), "std"]
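Both new tests expect `periods=6` in `expected_index`. The 1000 one-second observations span 1000 seconds and each 3-minute bin is 180 seconds wide, so the data fall into ceil(1000 / 180) = 6 bins, the last of them partial. A standard-library sketch of that arithmetic, assuming left-closed 3-minute bins starting at midnight (which is also the first timestamp here); it does not call pandas.

```python
from datetime import datetime, timedelta

start = datetime(2012, 1, 1)
# 1000 one-second timestamps, like pd.date_range("1/1/2012", freq="S", periods=1000).
timestamps = [start + timedelta(seconds=i) for i in range(1000)]
bin_width = timedelta(minutes=3)

# Map each timestamp to the left edge of its 3-minute bin, anchored at midnight.
bin_edges = sorted({start + ((ts - start) // bin_width) * bin_width for ts in timestamps})
counts = [sum(1 for ts in timestamps if edge <= ts < edge + bin_width) for edge in bin_edges]

print(len(bin_edges))  # 6
print(counts)          # [180, 180, 180, 180, 180, 100]
```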