
Prevent Unlimited Agg Recursion with Duplicate Col Names #21066

Merged: 10 commits merged on May 17, 2018
doc/source/whatsnew/v0.23.1.txt (5 changes: 4 additions, 1 deletion)
@@ -43,7 +43,10 @@ Documentation Changes
Bug Fixes
~~~~~~~~~

-
Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^

- Bug in :func:`DataFrame.agg` where applying multiple aggregation functions to a :class:`DataFrame` with duplicated column names would cause a stack overflow (:issue:`21063`)
-

Conversion
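The stack overflow comes from label-based column selection. When a label appears more than once, `df[label]` returns a `DataFrame` holding every matching column, not a `Series`. As a result, the per-column step in the old `for col in obj` loop never produces a one-dimensional object. Positional selection with `iloc` always returns exactly one column. The snippet below is a small illustration that uses a frame shaped like the one in the regression test.

```python
import pandas as pd

# Same shape as the regression test: two columns that share the label 'a'.
df = pd.DataFrame([[0, 1], [2, 3]], columns=['a', 'a'])

# Iterating a DataFrame yields its column labels, duplicates included.
labels = list(df)

# Label-based selection returns every column named 'a', i.e. a DataFrame.
by_label = df['a']

# Positional selection returns exactly one column, as a Series.
by_position = [df.iloc[:, i] for i, _ in enumerate(df)]

print(labels)                                   # ['a', 'a']
print(type(by_label).__name__)                  # DataFrame
print([type(s).__name__ for s in by_position])  # ['Series', 'Series']
print([s.tolist() for s in by_position])        # [[0, 2], [1, 3]]
```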
pandas/core/base.py (6 changes: 3 additions, 3 deletions)
@@ -590,9 +590,10 @@ def _aggregate_multiple_funcs(self, arg, _level, _axis):

         # multiples
         else:
-            for col in obj:
+            for index, col in enumerate(obj):
                 try:
-                    colg = self._gotitem(col, ndim=1, subset=obj[col])
+                    colg = self._gotitem(col, ndim=1,
+                                         subset=obj.iloc[:, index])
                     results.append(colg.aggregate(arg))
                     keys.append(col)
                 except (TypeError, DataError):
@@ -675,7 +676,6 @@ def _gotitem(self, key, ndim, subset=None):
         subset : object, default None
             subset to act on
         """
-
         # create a new object to prevent aliasing
         if subset is None:
             subset = self.obj
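The sketch below is a pandas-free model of why the old loop recursed without bound and why the `enumerate`/`iloc` version terminates. `ToyFrame`, `agg_by_label` and `agg_by_position` are hypothetical names invented for this illustration. They only mimic the control flow of `_aggregate_multiple_funcs`: split a 2-D object into columns, then aggregate each column. They are not pandas internals.

```python
class ToyFrame:
    """Hypothetical 2-D container: an ordered list of (label, values) columns."""

    def __init__(self, columns):
        self.columns = list(columns)

    @property
    def ndim(self):
        # One column behaves like a Series (1-D); anything else is 2-D.
        return 1 if len(self.columns) == 1 else 2

    def __iter__(self):
        # Like iterating a DataFrame: yield column labels, duplicates included.
        return iter([label for label, _ in self.columns])

    def by_label(self, label):
        # Like obj[label]: every column carrying this label is returned.
        return ToyFrame(c for c in self.columns if c[0] == label)

    def by_position(self, index):
        # Like obj.iloc[:, index]: exactly one column is returned.
        return ToyFrame([self.columns[index]])


def agg_by_label(obj, func):
    """Old control flow: select each column by label, then recurse."""
    if obj.ndim == 1:
        return [func(obj.columns[0][1])]
    results = []
    for col in obj:
        results.extend(agg_by_label(obj.by_label(col), func))
    return results


def agg_by_position(obj, func):
    """Patched control flow: select each column by position, then recurse."""
    if obj.ndim == 1:
        return [func(obj.columns[0][1])]
    results = []
    for index, _ in enumerate(obj):
        results.extend(agg_by_position(obj.by_position(index), func))
    return results


dup = ToyFrame([('a', [0, 2]), ('a', [1, 3])])
print(agg_by_position(dup, min))  # [0, 1]

try:
    agg_by_label(dup, min)
except RecursionError:
    # by_label('a') hands back both columns, so ndim never drops to 1.
    print("label-based loop never reaches a single column")
```

With unique labels both functions agree. The two versions diverge only when a label repeats, which matches the scope of the bug report.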
pandas/core/frame.py (11 changes: 9 additions, 2 deletions)
@@ -5731,7 +5731,12 @@ def diff(self, periods=1, axis=0):
     # ----------------------------------------------------------------------
     # Function application

-    def _gotitem(self, key, ndim, subset=None):
+    def _gotitem(self,

Review comment (Member Author): I realize we don't have an overall strategy for annotations just yet, but I had to think through this while debugging anyway, so I figured I'd put it here explicitly for when we turn this on.

Reply (Contributor): ok!

+                 key,  # type: Union[str, List[str]]
+                 ndim,  # type: int
+                 subset=None  # type: Union[Series, DataFrame, None]
+                 ):
+        # type: (...) -> Union[Series, DataFrame]
         """
         sub-classes to define
         return a sliced object
@@ -5746,9 +5751,11 @@ def _gotitem(self, key, ndim, subset=None):
         """
         if subset is None:
             subset = self
+        elif subset.ndim == 1:  # is Series
+            return subset

         # TODO: _shallow_copy(subset)?
-        return self[key]
+        return subset[key]

     _agg_doc = dedent("""
     The aggregation operations are always performed over an axis, either the
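The patched `DataFrame._gotitem` has three branches. If no subset is given, it selects from the frame itself. If the subset is already one-dimensional, it returns the subset unchanged. Otherwise it indexes the subset rather than `self`. The function below is a hypothetical free-standing mirror of that logic (`gotitem` is not a pandas API) that lets you exercise each branch directly. It keeps the PEP 484 type-comment style the diff adopts.

```python
from typing import List, Optional, Union

import pandas as pd


def gotitem(frame,        # type: pd.DataFrame
            key,          # type: Union[str, List[str]]
            ndim,         # type: int
            subset=None   # type: Optional[Union[pd.Series, pd.DataFrame]]
            ):
    # type: (...) -> Union[pd.Series, pd.DataFrame]
    """Hypothetical free-function mirror of the patched DataFrame._gotitem.

    ``ndim`` is kept only to match the method's signature; this sketch
    does not use it.
    """
    if subset is None:
        subset = frame
    elif subset.ndim == 1:
        # Already a single column: return it instead of re-selecting by
        # label, which would return every column if the label is duplicated.
        return subset
    # Index the subset (not the whole frame) by key.
    return subset[key]


df = pd.DataFrame([[0, 1], [2, 3]], columns=['a', 'a'])
second = df.iloc[:, 1]

print(gotitem(df, 'a', 1, subset=second).tolist())  # [1, 3]
print(gotitem(df, 'a', 2).shape)                     # (2, 2)
```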
pandas/tests/frame/test_apply.py (8 changes: 8 additions, 0 deletions)
@@ -554,6 +554,14 @@ def test_apply_non_numpy_dtype(self):
         result = df.apply(lambda x: x)
         assert_frame_equal(result, df)

+    def test_apply_dup_names_multi_agg(self):
+        # GH 21063
+        df = pd.DataFrame([[0, 1], [2, 3]], columns=['a', 'a'])
+        expected = pd.DataFrame([[0, 1]], columns=['a', 'a'], index=['min'])
+        result = df.agg(['min'])
+
+        tm.assert_frame_equal(result, expected)
+

 class TestInferOutputShape(object):
     # the user has supplied an opaque UDF where
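You can run the same regression check outside the pandas test suite as a plain script. It needs a pandas release that includes the fix, which the whatsnew entry places in 0.23.1. This sketch uses the public `pandas.testing.assert_frame_equal` instead of the suite's internal `tm` module. It also adds a two-function call as an extra illustration that is not part of the original test.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# GH 21063: two columns that share the label 'a'.
df = pd.DataFrame([[0, 1], [2, 3]], columns=['a', 'a'])

# Before the fix, this call overflowed the stack.
result = df.agg(['min'])

# One row labelled 'min', one column per original (duplicated) column.
expected = pd.DataFrame([[0, 1]], columns=['a', 'a'], index=['min'])
assert_frame_equal(result, expected)

# Extra illustration: a longer list of functions takes the same per-column path.
both = df.agg(['min', 'max'])
print(both)
```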