Skip to content

BUG: DataFrame.agg and apply with 'size' returns a scalar #39935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 24, 2021

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach added Bug Apply Apply, Aggregate, Transform, Map labels Feb 20, 2021
@jreback jreback added this to the 1.3 milestone Feb 21, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase as well.

It might be better to detect this argument 'size' and do a translation in
pandas/core/base.py: _builtin_table


obj = self.obj

if f == "size" and isinstance(obj, ABCDataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this affect Series as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a Series, size gives the number of rows and so the default path calling obj.size produces the correct result.

@rhshadrach
Copy link
Member Author

@jreback - I merged master.

It might be better to detect this argument 'size' and do a translation in
pandas/core/base.py: _builtin_table

This isn't a builtin; DataFrame.size reports the number of cells in the frame (rows * columns). Also not sure what we'd map it to, count is similar and operates on rows/columns but drops nans whereas size does not.

@jreback
Copy link
Contributor

jreback commented Feb 21, 2021

my point is to map this to a lambda function to avoid special casing like this

@rhshadrach
Copy link
Member Author

@jreback - makes sense, and I agree your suggestion is the prefered way. The one issue is that when replacing func with a lambda, the resulting Series has name None whereas tests currently expect "size". I see two ways to handle (hopefully not overlooking something simple):

  1. Mark as TODO and tackle when consolidating code for wrapping results
  2. Add a "result_name" attribute the the Apply class, set to None in all cases except this one where it is "size", and apply it in various methods where appropriate.

It seems easier and less error-prone to go with 1, but I can do 2 here if you prefer.

@jreback
Copy link
Contributor

jreback commented Feb 22, 2021

@jreback - makes sense, and I agree your suggestion is the prefered way. The one issue is that when replacing func with a lambda, the resulting Series has name None whereas tests currently expect "size". I see two ways to handle (hopefully not overlooking something simple):

  1. Mark as TODO and tackle when consolidating code for wrapping results
  2. Add a "result_name" attribute the the Apply class, set to None in all cases except this one where it is "size", and apply it in various methods where appropriate.

It seems easier and less error-prone to go with 1, but I can do 2 here if you prefer.

yeah happy to do this in the future. 2 sounds interesting actually. why don't you add a TODO for now (and issue) then can merge this one.

@rhshadrach
Copy link
Member Author

rhshadrach commented Feb 23, 2021

@jreback TODO added, issue opened, and green(ish - failure is unrelated).

@jreback
Copy link
Contributor

jreback commented Feb 24, 2021

can you merge master & ping on greenish

…g_size

� Conflicts:
�	pandas/tests/apply/test_frame_apply.py
@jreback jreback merged commit 212323f into pandas-dev:master Feb 24, 2021
@jreback
Copy link
Contributor

jreback commented Feb 24, 2021

thanks @rhshadrach

@rhshadrach rhshadrach deleted the agg_size branch February 24, 2021 17:30
if f == "size" and isinstance(obj, ABCDataFrame):
# Special-cased because DataFrame.size returns a single scalar
value = obj.shape[self.axis]
return obj._constructor_sliced(value, index=self.agg_axis, name="size")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional to set the name here? This isn't consistent with other string agg methods

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks @TheNeuralBit! Opened #42046

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.agg and apply with 'size' returns a scalar
3 participants