REF: document casting behavior in groupby #41376
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -967,11 +967,22 @@ def _cython_operation( | |
) | ||
|
||
@final | ||
def agg_series(self, obj: Series, func: F) -> ArrayLike: | ||
def agg_series(self, obj: Series, func: F, preserve: bool = False) -> ArrayLike: | ||
preserve_dtype? I don't know if we use this keyword elsewhere
or cast?
changing to preserve_dtype
||
""" | ||
Parameters | ||
---------- | ||
obj : Series | ||
func : function taking a Series and returning a scalar-like | ||
preserve : bool | ||
Whether the aggregation is known to be dtype-preserving. | ||
|
||
Returns | ||
------- | ||
np.ndarray or ExtensionArray | ||
""" | ||
# test_groupby_empty_with_category gets here with self.ngroups == 0 | ||
# and len(obj) > 0 | ||
|
||
cast_back = True | ||
if len(obj) == 0: | ||
# SeriesGrouper would raise if we were to call _aggregate_series_fast | ||
result = self._aggregate_series_pure_python(obj, func) | ||
|
@@ -983,17 +994,21 @@ def agg_series(self, obj: Series, func: F) -> ArrayLike: | |
# TODO: can we get a performant workaround for EAs backed by ndarray? | ||
result = self._aggregate_series_pure_python(obj, func) | ||
|
||
# we can preserve a little bit more aggressively with EA dtype | ||
# because maybe_cast_pointwise_result will do a try/except | ||
# with _from_sequence. NB we are assuming here that _from_sequence | ||
# is sufficiently strict that it casts appropriately. | ||
preserve = True | ||
|
||
elif obj.index._has_complex_internals: | ||
# Preempt TypeError in _aggregate_series_fast | ||
result = self._aggregate_series_pure_python(obj, func) | ||
|
||
else: | ||
result = self._aggregate_series_fast(obj, func) | ||
cast_back = False | ||
|
||
npvalues = lib.maybe_convert_objects(result, try_float=False) | ||
if cast_back: | ||
# TODO: Is there a documented reason why we don't always cast_back? | ||
if preserve: | ||
out = maybe_cast_pointwise_result(npvalues, obj.dtype, numeric_only=True) | ||
else: | ||
out = npvalues | ||
|
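For readers following the diff, here is a toy, pure-Python sketch of the cast-back decision that the `preserve` flag gates. It is not the pandas implementation: `cast_back_if_lossless` is a hypothetical helper, and a plain Python type stands in for `obj.dtype`. The assumption it illustrates is that results are cast back to the original type only when the caller says the aggregation is dtype-preserving and the cast round-trips without changing any value.

```python
from typing import Callable, List, Sequence


def cast_back_if_lossless(
    results: Sequence, original_type: Callable, preserve: bool
) -> List:
    """Hypothetical stand-in for the cast-back step (not a pandas API).

    Cast `results` to `original_type` only if `preserve` is True and the
    cast succeeds without changing any value; otherwise return the
    results unchanged.
    """
    values = list(results)
    if not preserve:
        # Aggregation is not known to be dtype-preserving: keep inferred values.
        return values
    try:
        cast = [original_type(v) for v in values]
    except (TypeError, ValueError):
        # Mirrors the try/except idea: a failed cast falls back silently.
        return values
    if all(c == v for c, v in zip(cast, values)):
        return cast
    # The cast would lose information (e.g. 1.5 -> 1), so do not apply it.
    return values


if __name__ == "__main__":
    print(cast_back_if_lossless([1.0, 2.0], int, preserve=True))   # [1, 2]
    print(cast_back_if_lossless([1.5, 2.0], int, preserve=True))   # [1.5, 2.0]
    print(cast_back_if_lossless([1.0, 2.0], int, preserve=False))  # [1.0, 2.0]
```

The losslessness check is a design choice of this sketch; the comment in the diff only states that the extension-array path relies on `_from_sequence` being strict enough to reject inappropriate casts.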
Any perf impact by passing a list to DataFrame constructor?
test_apply_numeric_coercion_when_datetime goes through here. running the usage in timeit looks like a wash
i think we may be able to do better/cleaner following #40489