-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: retain masked EA dtypes in groupby with as_index=False #41373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jbrockmendel
commented
May 7, 2021
•
edited
Loading
edited
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
self.assert_series_equal(result, expected) | ||
else: | ||
expected = expected.reset_index() | ||
expected = pd.DataFrame({"B": uniques, "A": [3, 1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a user facing change right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
pandas/core/groupby/grouper.py
Outdated
Analogous to result_index, but holding an ArrayLike to ensure | ||
we can can retain ExtensionDtypes. | ||
""" | ||
ridx = self.result_index # initialized _group_arraylike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you really need this state? seems very magical here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree, the statefulness is unpleasant. #41375 starts to unwind it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try to unwind first? this is adding a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let’s get 41375 in and then I’ll rebased and try to trim this down
can you rebase and will look again |
can you rebase again |
just pushed. this should be correct, but im also working on another branch that could serve as a state-reducing preliminary |
Question: it's certainly nice to already preserve the dtype with |
Good question. Better to think of this as a disentangling/refactor of BaseGrouper/Grouping with a bonus upside of fixing the broken cases (in fact, #41529 does most of this except for fixing the as_index cases). If/when we get the general EA Index , we'll be able to come back and trim this down some more. |
gentle ping; id like to reclaim momentum on the groupby stuff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rebase as well
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -275,6 +275,7 @@ Other enhancements | |||
- Add keyword ``sort`` to :func:`pivot_table` to allow non-sorting of the result (:issue:`39143`) | |||
- Add keyword ``dropna`` to :meth:`DataFrame.value_counts` to allow counting rows that include ``NA`` values (:issue:`41325`) | |||
- :meth:`Series.replace` will now cast results to ``PeriodDtype`` where possible instead of ``object`` dtype (:issue:`41526`) | |||
- :class:`DataFrameGroupBy` operations with ``as_index=False`` now correctly retain ``ExtensionDtype`` dtypes for columns being grouped on (:issue:`41373`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to 1.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved + green
pandas/core/groupby/ops.py
Outdated
@@ -1027,7 +1028,10 @@ def agg_series( | |||
else: | |||
result = self._aggregate_series_fast(obj, func) | |||
|
|||
npvalues = lib.maybe_convert_objects(result, try_float=False) | |||
convert_datetime = obj.dtype.kind == "M" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what breaks if we remove this inference entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, nothing now. im pretty sure there was something back when i did this. will revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated + green
might call this a bug fix |
i think id rather call it an ENH, dont want users to think that other functions for which we havent yet done this are bugs |