BUG: Series.groupby.size returning int64 for masked and arrow types #54132

Merged: mroeschke merged 9 commits into pandas-dev:main from the bug/gb/size_ser_ea branch on Jul 18, 2023

Conversation

mroeschke
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@mroeschke added the Groupby, NA - MaskedArrays (Related to pd.NA and nullable extension arrays), and Arrow (pyarrow functionality) labels on Jul 15, 2023
@mroeschke added this to the 2.1 milestone on Jul 15, 2023
@mroeschke requested a review from rhshadrach as a code owner on July 15, 2023 00:00
@mroeschke changed the title from "BUG: Series.groupby.count returning int64 for masked and arrow types" to "BUG: Series.groupby.size returning int64 for masked and arrow types" on Jul 15, 2023
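
For readers who want to see the behavior the title describes, here is a minimal sketch, assuming only a pandas install with masked dtypes. The variable names are illustrative and not taken from the PR. It prints the dtype of the group sizes for a masked `Int64` Series; no output is stated here because the dtype depends on whether the installed pandas includes this fix.

```python
import pandas as pd

# A masked (nullable) integer Series, grouped by its own values.
ser = pd.Series([1, 1, 2], dtype="Int64")
sizes = ser.groupby(ser).size()

# The counts are the same on every version; only the result dtype is at issue.
print(sizes.tolist())
print(sizes.dtype)
```
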
@@ -2890,12 +2891,21 @@ def size(self) -> DataFrame | Series:
        Freq: MS, dtype: int64
        """
        result = self.grouper.size()
        result_dtype: str | np.dtype = result.dtype
        if isinstance(self.obj, Series):
Member

Can we just use convert_dtypes for this? I think you've got another PR doing something similar for another method.

(I actually think that for methods that return ints and never NAs, it is more useful to return a non-nullable dtype, but whatever.)
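
As a rough illustration of the suggestion above (a sketch, not the code in this PR): `Series.convert_dtypes` with its default arguments turns a NumPy `int64` Series into the masked `Int64` dtype. The `sizes` Series here is a made-up stand-in for the raw result of a size computation.

```python
import pandas as pd

# Hypothetical stand-in for a raw, NumPy-backed size result.
sizes = pd.Series([2, 1], index=["a", "b"], dtype="int64")

# With default arguments, convert_dtypes picks the nullable equivalent.
converted = sizes.convert_dtypes()
print(converted.dtype)  # Int64
```
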

Member Author

> can we just use convert_dtypes for this?

Sure. I will still have to introspect the original object, though, to set the correct dtype_backend for convert_dtypes, if that's alright with you.
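
A minimal sketch of the introspection described above, assuming pandas 2.0 or later for the `dtype_backend` argument. The helper `size_matching_backend` and the `MASKED_DTYPES` tuple are hypothetical names used for illustration; this is not the code merged in the PR. The idea is to look at the dtype of the original Series and only then decide which backend, if any, to pass to `convert_dtypes`.

```python
import pandas as pd

# Public masked (nullable) dtype classes; listing them avoids private pandas imports.
MASKED_DTYPES = (
    pd.BooleanDtype,
    pd.Int8Dtype, pd.Int16Dtype, pd.Int32Dtype, pd.Int64Dtype,
    pd.UInt8Dtype, pd.UInt16Dtype, pd.UInt32Dtype, pd.UInt64Dtype,
    pd.Float32Dtype, pd.Float64Dtype,
)


def size_matching_backend(ser: pd.Series, by) -> pd.Series:
    """Hypothetical helper: group sizes whose dtype follows the input's backend."""
    result = ser.groupby(by).size()
    if isinstance(ser.dtype, pd.ArrowDtype):
        # Arrow-backed input: request an Arrow-backed result (needs pyarrow).
        return result.convert_dtypes(dtype_backend="pyarrow")
    if isinstance(ser.dtype, MASKED_DTYPES):
        # Masked input: the default backend of convert_dtypes is "numpy_nullable".
        return result.convert_dtypes()
    # Plain NumPy input: leave the result untouched.
    return result


masked = pd.Series([1, 2, 3], dtype="Int64")
plain = pd.Series([1, 2, 3], dtype="int64")
keys = ["a", "a", "b"]
print(size_matching_backend(masked, keys).dtype)
print(size_matching_backend(plain, keys).dtype)
```

The Arrow branch is not exercised in the usage lines because it requires pyarrow to be installed.
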

Member

Either way, I guess.

Member Author

Used convert_dtypes instead

Member

@rhshadrach left a comment

lgtm

@mroeschke merged commit e0964c2 into pandas-dev:main on Jul 18, 2023
@mroeschke deleted the bug/gb/size_ser_ea branch on July 18, 2023 23:02
3 participants