BUG: DataFrame.agg and apply with 'size' returns a scalar #39935

Merged 7 commits on Feb 24, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -320,6 +320,7 @@ Numeric
- Bug in :meth:`DataFrame.rank` with ``np.inf`` and mixture of ``np.nan`` and ``np.inf`` (:issue:`32593`)
- Bug in :meth:`DataFrame.rank` with ``axis=0`` and columns holding incomparable types raising ``IndexError`` (:issue:`38932`)
- Bug in :func:`select_dtypes` different behavior between Windows and Linux with ``include="int"`` (:issue:`36569`)
- Bug in :meth:`DataFrame.apply` and :meth:`DataFrame.agg` when passed argument ``func="size"`` would operate on the entire ``DataFrame`` instead of rows or columns (:issue:`39934`)
-

Conversion
20 changes: 14 additions & 6 deletions pandas/core/apply.py
@@ -157,6 +157,10 @@ def f(x):
def index(self) -> Index:
return self.obj.index

@property
def agg_axis(self) -> Index:
return self.obj._get_agg_axis(self.axis)

@abc.abstractmethod
def apply(self) -> FrameOrSeriesUnion:
pass
@@ -414,17 +418,25 @@ def maybe_apply_str(self) -> Optional[FrameOrSeriesUnion]:
f = self.f
if not isinstance(f, str):
return None

obj = self.obj

if f == "size" and isinstance(obj, ABCDataFrame):
Contributor

Doesn't this affect Series as well?

Member Author

For a Series, `size` gives the number of rows, so the default path that calls `obj.size` already produces the correct result.
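The difference described in this reply can be checked directly. This is a minimal sketch that assumes pandas is installed; the frame and its column names are made up for illustration and are not taken from the PR:

```python
import pandas as pd

# Illustrative data only (hypothetical, not from the PR).
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# DataFrame.size is one scalar: number of rows times number of columns.
frame_size = df.size

# Series.size is the number of elements, i.e. the number of rows.
series_size = df["A"].size

print(frame_size, series_size)
```

Because `DataFrame.size` collapses both dimensions into one number, looking it up by name cannot give a per-column or per-row result, which appears to be why only the DataFrame case is special-cased.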

# Special-cased because DataFrame.size returns a single scalar
value = obj.shape[self.axis]
return obj._constructor_sliced(value, index=self.agg_axis, name="size")
Contributor

Is it intentional to set the name here? This isn't consistent with the other string agg methods.

Member Author

Indeed, thanks @TheNeuralBit! Opened #42046

Contributor

Thank you!


# Support for `frame.transform('method')`
# Some methods (shift, etc.) require the axis argument, others
# don't, so inspect and insert if necessary.
- func = getattr(self.obj, f, None)
+ func = getattr(obj, f, None)
if callable(func):
sig = inspect.getfullargspec(func)
if "axis" in sig.args:
self.kwargs["axis"] = self.axis
elif self.axis != 0:
raise ValueError(f"Operation {f} does not support axis=1")
- return self.obj._try_aggregate_string_function(f, *self.args, **self.kwargs)
+ return obj._try_aggregate_string_function(f, *self.args, **self.kwargs)
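
The axis handling in this hunk can be illustrated outside pandas. This is a minimal sketch using only the standard library; `shift`, `count`, and `resolve_kwargs` are hypothetical stand-ins written for this example, not pandas functions:

```python
import inspect


def shift(values, periods=1, axis=0):
    """Hypothetical method that accepts an axis argument."""
    return values


def count(values):
    """Hypothetical method that has no axis argument."""
    return len(values)


def resolve_kwargs(func, axis, kwargs):
    """Insert axis into kwargs if func accepts it; reject axis != 0 otherwise."""
    kwargs = dict(kwargs)  # copy so the caller's dict is not mutated
    sig = inspect.getfullargspec(func)
    if "axis" in sig.args:
        kwargs["axis"] = axis
    elif axis != 0:
        raise ValueError(f"Operation {func.__name__} does not support axis=1")
    return kwargs


print(resolve_kwargs(shift, 1, {}))  # axis is forwarded
print(resolve_kwargs(count, 0, {}))  # axis is left out
```

Inspecting the signature lets a single code path serve methods that take `axis` and methods that do not, at the cost of rejecting `axis=1` for the latter.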

def maybe_apply_multiple(self) -> Optional[FrameOrSeriesUnion]:
"""
@@ -486,10 +498,6 @@ def values(self):
def dtypes(self) -> Series:
return self.obj.dtypes

- @property
- def agg_axis(self) -> Index:
- return self.obj._get_agg_axis(self.axis)

def apply(self) -> FrameOrSeriesUnion:
""" compute the results """
# dispatch to agg
18 changes: 14 additions & 4 deletions pandas/tests/apply/test_frame_apply.py
@@ -1376,16 +1376,26 @@ def test_non_callable_aggregates(self, how):
tm.assert_frame_equal(result2, expected, check_like=True)

# Just functional string arg is same as calling df.arg()
# on the columns
result = getattr(df, how)("count")
expected = df.count()

tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("how", ["agg", "apply"])
def test_size_as_str(self, how, axis):
# GH 39934
df = DataFrame(
{"A": [None, 2, 3], "B": [1.0, np.nan, 3.0], "C": ["foo", None, "bar"]}
)
# Just a string attribute arg same as calling df.arg
- result = getattr(df, how)("size")
- expected = df.size
-
- assert result == expected
# on the columns
result = getattr(df, how)("size", axis=axis)
if axis == 0 or axis == "index":
expected = Series(df.shape[0], index=df.columns, name="size")
else:
expected = Series(df.shape[1], index=df.index, name="size")
tm.assert_series_equal(result, expected)
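
The behaviour this test pins down can be tried interactively. This is a rough sketch that assumes pandas 1.3 or later, where this fix is included; the 2-row, 3-column frame is made up so that the two axes give different answers:

```python
import pandas as pd

# Hypothetical frame with 2 rows and 3 columns.
df = pd.DataFrame({"A": [None, 2], "B": [1.0, None], "C": ["foo", None]})

# axis=0: one entry per column, each holding the number of rows.
by_column = df.agg("size", axis=0)

# axis=1: one entry per row, each holding the number of columns.
by_row = df.agg("size", axis=1)

print(by_column)
print(by_row)
```

Missing values are counted as well, since `size` reports the shape and not the number of non-null entries; that is the difference from `"count"`.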

def test_agg_listlike_result(self):
# GH-29587 user defined function returning list-likes