-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: implement FloatingArray.round() #38866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/arrays/floating.py
Outdated
@@ -410,6 +410,14 @@ def max(self, *, skipna=True, **kwargs): | |||
nv.validate_max((), kwargs) | |||
return super()._reduce("max", skipna=skipna) | |||
|
|||
def round(self, decimals=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should do this generically (and move to NumericArray), e.g. something like
def round(self, decimals=0):
return self._apply(np.round, decimals=decimals)
where _apply handles the masking & recasting. I think we discussed similar in .to_numeric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or event better
round = _apply_function(np.round, decimals=0, "nice doc-string here")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going with the second way (may as well) would _apply_function be method on NumericArray (versus a module level function)?
milestoned as 1.2.1. ok to merge changes to experimental types in patch releases? |
these generally should not be backported as EAs (esp) Floating are still experimental |
we can do on a case by case. the proposed fix here that i suggest is a bit more non-trivial. |
…38844-FloatingArray-round
|
||
def _apply(self, func: Callable, **kwargs) -> "NumericArray": | ||
values = self._data[~self._mask] | ||
values = np.round(values, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be func :->
@@ -130,3 +158,16 @@ def _arith_method(self, other, op): | |||
) | |||
|
|||
return self._maybe_mask_result(result, mask, other, op_name) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> "NumericArray": | |||
values = self._data[~self._mask] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string
@@ -130,3 +158,16 @@ def _arith_method(self, other, op): | |||
) | |||
|
|||
return self._maybe_mask_result(result, mask, other, op_name) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> "NumericArray": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jbrockmendel @jorisvandenbossche
shall we make this more general? (e.g. on base.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I would for this PR leave it here in order to do the minimal to actually implement the round()
, and have a follow-up to discuss how we might want to use this more general, because indeed we probably want that)
Since I think being able to round is a quite essential functionality for a floating dtypes that we missed, I think it makes sense to backport this, exactly because it is still experimental anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arw2019 thanks for working on this!
I think this needs some more testing, and also for the actual array methods (I would maybe create a new test_function.py
in the /tests/arrays/masked/
dir for this)
@@ -54,6 +54,7 @@ Other enhancements | |||
- Add support for dict-like names in :class:`MultiIndex.set_names` and :class:`MultiIndex.rename` (:issue:`20421`) | |||
- :func:`pandas.read_excel` can now auto detect .xlsb files (:issue:`35416`) | |||
- :meth:`.Rolling.sum`, :meth:`.Expanding.sum`, :meth:`.Rolling.mean`, :meth:`.Expanding.mean`, :meth:`.Rolling.median`, :meth:`.Expanding.median`, :meth:`.Rolling.max`, :meth:`.Expanding.max`, :meth:`.Rolling.min`, and :meth:`.Expanding.min` now support ``Numba`` execution with the ``engine`` keyword (:issue:`38895`) | |||
- Added :meth:`NumericArray.round` (:issue:`38844`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NumericArray is not public, and thus we shouldn't mention it in the whatsnew notes. You can say something about "round() being enabled for the nullable integer and floating dtypes"
data[~self._mask] = values | ||
return type(self)(data, self._mask) | ||
|
||
@doc(_round_doc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring can be moved inline?
@@ -130,3 +158,16 @@ def _arith_method(self, other, op): | |||
) | |||
|
|||
return self._maybe_mask_result(result, mask, other, op_name) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> "NumericArray": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I would for this PR leave it here in order to do the minimal to actually implement the round()
, and have a follow-up to discuss how we might want to use this more general, because indeed we probably want that)
@@ -130,3 +158,16 @@ def _arith_method(self, other, op): | |||
) | |||
|
|||
return self._maybe_mask_result(result, mask, other, op_name) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> "NumericArray": | |||
values = self._data[~self._mask] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you actually need to subset the _data
with the mask in this case, as "round" should work on all values, and I can't think of a case where it would error by being called on the "invalid" values hidden by the mask.
Of course, if many values are masked, we might be calculating round
on too many values. But doing the filter operation / copy also takes time. Maybe something to time both ways.
|
||
data = np.zeros(self._data.shape) | ||
data[~self._mask] = values | ||
return type(self)(data, self._mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mask needs to be copied I think? (result should not share a mask with the original array, because otherwise editing one can modify the other. We should probably also test this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point (and actually the same bug exists already in my implementation of to_numeric
for EAs - #38974). I'll fix this and add tests
|
||
@doc(_round_doc) | ||
def round(self, decimals: int = 0, *args, **kwargs) -> "NumericArray": | ||
nv.validate_round(args, kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we accept args/kwargs here and validate them, then we should also test this (eg doing np.round(float_arr)
triggers this)
I don't think we should be backporting things like this. |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
closing as stale. I think we need the general soln outlined above. Happy to re-open / take new PR. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff