
ENH: retain masked EA dtypes in groupby with as_index=False #41373


Merged: 31 commits, Jul 25, 2021


@jbrockmendel (Member) commented May 7, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

            self.assert_series_equal(result, expected)
        else:
            expected = expected.reset_index()
            expected = pd.DataFrame({"B": uniques, "A": [3, 1]})
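
The test change above builds the expected `as_index=False` frame from the factorized uniques instead of calling `reset_index()`. A minimal sketch of the user-facing behavior, assuming a recent pandas release; the boolean data with missing keys is illustrative and not copied from the PR:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Grouping column backed by the masked "boolean" ExtensionDtype; the two NA
# keys are dropped because groupby uses dropna=True by default.
data = pd.array([True, True, None, None, False, False, True], dtype="boolean")
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1], "B": data})

# as_index=False puts the group keys back into a column instead of the index.
result = df.groupby("B", as_index=False)["A"].mean()

# Build the expected frame directly from the sorted uniques (an ExtensionArray)
# rather than via Series.reset_index(), so "B" keeps its boolean dtype.
_, uniques = pd.factorize(data, sort=True)
expected = pd.DataFrame({"B": uniques, "A": [3.0, 1.0]})

assert_frame_equal(result, expected)
print(result.dtypes)
```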
Contributor


This is a user-facing change, right?

Member Author


yes

        Analogous to result_index, but holding an ArrayLike to ensure
        we can retain ExtensionDtypes.
        """
        ridx = self.result_index  # initializes _group_arraylike
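
The docstring's idea is that group keys kept as an ArrayLike (an ndarray or ExtensionArray) can carry an ExtensionDtype that an Index could not hold at the time. The sketch below illustrates that idea outside pandas internals: `groupby_mean_as_frame` is a hypothetical helper, not part of pandas or `BaseGrouper`, and it uses `pd.factorize` plus `np.bincount` only to show that the uniques keep their `Int64` dtype.

```python
import numpy as np
import pandas as pd


def groupby_mean_as_frame(keys, values):
    """Hypothetical helper (not a pandas API): mean of ``values`` per key.

    The key column is built from the uniques returned by ``pd.factorize``,
    which stay an ExtensionArray when ``keys`` is one, so a masked dtype
    such as Int64 survives into the result.
    """
    codes, uniques = pd.factorize(keys, sort=True)
    vals = np.asarray(values, dtype="float64")
    mask = codes >= 0  # code -1 marks missing keys; drop them like dropna=True
    sums = np.bincount(codes[mask], weights=vals[mask], minlength=len(uniques))
    counts = np.bincount(codes[mask], minlength=len(uniques))
    return pd.DataFrame({"key": uniques, "mean": sums / counts})


keys = pd.array([2, 1, 2, None, 1], dtype="Int64")
out = groupby_mean_as_frame(keys, [10, 20, 30, 40, 50])
print(out)
print(out.dtypes)
```

Because `uniques` is an `IntegerArray`, the `key` column is created with the `Int64` dtype directly, with no intermediate Index that might cast it.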
Contributor


Do you really need this state? It seems very magical here.

Member Author


I agree, the statefulness is unpleasant. #41375 starts to unwind it.

Contributor


Can you try to unwind it first? This is adding a lot.

Member Author


Sure, let’s get #41375 in, and then I’ll rebase and try to trim this down.
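
The concern about state can be illustrated with two hypothetical toy classes (not the actual pandas `Grouping` or `BaseGrouper` code). In the first, reading `result_index` silently fills `_group_arraylike`, so one attribute works only because another was evaluated first; in the second, a single cached factorization feeds both views, which is one possible way to unwind such a dependency.

```python
from functools import cached_property

import pandas as pd


class StatefulGrouping:
    """Hypothetical toy: reading result_index also fills _group_arraylike."""

    def __init__(self, keys):
        self.keys = keys
        self._group_arraylike = None  # only set as a side effect of result_index

    @property
    def result_index(self):
        _, uniques = pd.factorize(self.keys, sort=True)
        self._group_arraylike = uniques  # hidden side effect
        return pd.Index(uniques)

    @property
    def group_arraylike(self):
        _ = self.result_index  # evaluated only for its side effect
        return self._group_arraylike


class StatelessGrouping:
    """Hypothetical toy: one cached factorization feeds both views."""

    def __init__(self, keys):
        self.keys = keys

    @cached_property
    def _uniques(self):
        # Computed once on first access, then stored on the instance.
        return pd.factorize(self.keys, sort=True)[1]

    @property
    def group_arraylike(self):
        return self._uniques

    @property
    def result_index(self):
        return pd.Index(self._uniques)


keys = pd.array([3, 1, 3, None], dtype="Int64")
print(StatefulGrouping(keys).group_arraylike)
print(StatelessGrouping(keys).group_arraylike)
```

Both toys return the same uniques, but only the second makes the data flow explicit: neither property depends on the other having been read first.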

@jreback (Contributor) commented May 17, 2021

Can you rebase? Then I will look again.

@jreback (Contributor) commented May 17, 2021

Can you rebase again?

@jbrockmendel (Member Author)

Just pushed. This should be correct, but I'm also working on another branch that could serve as a state-reducing preliminary step.

@jorisvandenbossche (Member)

Question: it's certainly nice to already preserve the dtype with as_index=False, but is it worth adding special code for it now if supporting EAs in the Index will also solve it? (assuming it does)

@jbrockmendel (Member Author)

Good question. It's better to think of this as a disentangling/refactor of BaseGrouper/Grouping, with the bonus upside of fixing the broken cases (in fact, #41529 does most of this except for fixing the as_index cases). If/when we get the general EA Index, we'll be able to come back and trim this down some more.
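
For context on the question above: later pandas releases let `Index` hold arbitrary ExtensionArrays. Assuming such a version, the sketch below shows an `Int64`-backed `Index` and a groupby whose result index, and therefore its `reset_index()` output, keeps the key dtype; on versions without EA-backed Index support, the same code would not retain `Int64`. The data is illustrative.

```python
import pandas as pd

# A masked Int64 array, including a missing value.
arr = pd.array([1, 2, None], dtype="Int64")
idx = pd.Index(arr, name="B")
print(idx.dtype)

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": pd.array([1, 2, 1, 2], dtype="Int64")}
)
# as_index=True (the default): the group keys become the result's index.
ser = df.groupby("B")["A"].sum()
print(ser.index.dtype)
# reset_index() moves the keys back into a column, keeping the Index dtype.
print(ser.reset_index().dtypes)
```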

@jbrockmendel (Member Author)

Gentle ping; I'd like to reclaim momentum on the groupby work.

@jreback (Contributor) left a comment

Can you rebase as well?

@@ -275,6 +275,7 @@ Other enhancements
- Add keyword ``sort`` to :func:`pivot_table` to allow non-sorting of the result (:issue:`39143`)
- Add keyword ``dropna`` to :meth:`DataFrame.value_counts` to allow counting rows that include ``NA`` values (:issue:`41325`)
- :meth:`Series.replace` will now cast results to ``PeriodDtype`` where possible instead of ``object`` dtype (:issue:`41526`)
- :class:`DataFrameGroupBy` operations with ``as_index=False`` now correctly retain ``ExtensionDtype`` dtypes for columns being grouped on (:issue:`41373`)
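
A minimal illustration of the whatsnew entry, assuming a pandas version that includes this change; column names and values are illustrative. The grouping column uses the masked `Int64` dtype, and the whole frame is aggregated through `DataFrameGroupBy` with `as_index=False`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Grouping column with the masked Int64 dtype; the NA key is dropped
        # because groupby uses dropna=True by default.
        "key": pd.array([1, 2, 1, None], dtype="Int64"),
        "val": [1.0, 2.0, 3.0, 4.0],
    }
)

# as_index=False returns the keys as a regular column rather than the index.
result = df.groupby("key", as_index=False).sum()
print(result)
print(result.dtypes)
```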
Contributor


Can you move this entry to the 1.4 whatsnew?

Member Author


Moved; CI is green.

@@ -1027,7 +1028,10 @@ def agg_series(
        else:
            result = self._aggregate_series_fast(obj, func)

        npvalues = lib.maybe_convert_objects(result, try_float=False)
        convert_datetime = obj.dtype.kind == "M"
Contributor


What breaks if we remove this inference entirely?

Member Author


Huh, nothing now. I'm pretty sure something broke back when I did this. Will revert.

Member Author


Updated; CI is green.
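
The inference step in the `agg_series` diff turns an object array of per-group results into a more specific dtype. `lib.maybe_convert_objects` is private, so the sketch below uses the public `Series.infer_objects` as a rough stand-in to show what such inference changes (without it, results stay `object`); it does not reproduce `agg_series` itself, and the values are illustrative.

```python
import numpy as np
import pandas as pd

# A pure-Python aggregation path typically collects one result per group in
# an object-dtype array.
raw = np.array([1.5, 2.5], dtype=object)

# Public, rough stand-in for object-dtype inference: infer_objects()
# soft-converts object values to a more specific dtype. It is not the
# private function that agg_series calls.
inferred = pd.Series(raw).infer_objects()
print(raw.dtype, "->", inferred.dtype)

# The dtype.kind check from the diff: "M" is NumPy's kind code for datetime64.
ts = pd.Series(pd.to_datetime(["2021-07-15", "2021-07-25"]))
print(ts.dtype.kind == "M")
```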

@jreback added this to the 1.4 milestone on Jul 15, 2021
@jreback (Contributor) commented Jul 15, 2021

Might call this a bug fix.

@jbrockmendel (Member Author)

I think I'd rather call it an ENH; I don't want users to think that other functions for which we haven't yet done this are bugs.

@jreback merged commit daec2e7 into pandas-dev:master on Jul 25, 2021
@jbrockmendel deleted the ref-group_index branch on July 25, 2021 at 14:42
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

3 participants