-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Implement rounding for floating dtype array #38844 #39751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Implement rounding for floating dtype array #38844 #39751
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm minor comment
pandas/core/arrays/numeric.py
Outdated
@@ -188,3 +189,48 @@ def reconstruct(x): | |||
return tuple(reconstruct(x) for x in result) | |||
else: | |||
return reconstruct(result) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jorisvandenbossche @jbrockmendel do we want specific EA tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for general EA, but for the numeric arrays specifically, yes. As mentioned on the other PR, I would create a new test_function.py
in the /tests/arrays/masked/
dir for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche I can add a file in the /tests/arrays/masked/
folder in order to test the consistency between the numpy round
and the pandas round
for all the following FLOAT_EA_DTYPES
types. Nevertheless, I now use the any_float_allowed_nullable_dtype
fixture in the test test_round_numpy_with_nan
of tests/series/methods/test_round.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add general EA tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a file in the
/tests/arrays/masked/
folder in order to test the consistency between the numpyround
and the pandasround
for all the followingFLOAT_EA_DTYPES
types. Nevertheless, I now use theany_float_allowed_nullable_dtype
fixture in the testtest_round_numpy_with_nan
oftests/series/methods/test_round.py
.
I would still add a small test in /tests/arrays/masked/
as well. Since this is a public method on the integer and floating arrays, we should test that specifically as well (and not only through Series)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should also test calling numpy.round on the array, see #38866 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up!
See my review at the other PR at #38866 (review), there are still a few outstanding comments I think
index=range(3), | ||
dtype=any_float_allowed_nullable_dtype, | ||
) | ||
result = round(ser).astype(any_float_allowed_nullable_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the .astype call added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it is not required.
can you merge master |
pandas/core/arrays/numeric.py
Outdated
The image of the NumericArray | ||
""" | ||
values = func(self._data, **kwargs) | ||
return type(self)(values, self._mask.copy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if func returns a view or is a no-op, would we need to the the same for the mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was related to a comment of @jorisvandenbossche in the original PR proposal #38866
The mask needs to be copied I think? (result should not share a mask with the original array, because otherwise editing one can modify the other. We should probably also test this)
Here, I think this method should only be used with functions which return a copy (which is the case of most numpy functions I know --- without the out
argument). It may need to be précised in the docstring for future developpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, see comment above as well. Let's not generalize this prematurely, just knowing (and documenting) the func
returns a new object is fine for now IMO.
pandas/core/arrays/numeric.py
Outdated
@@ -199,3 +201,48 @@ def reconstruct(x): | |||
return tuple(reconstruct(x) for x in result) | |||
else: | |||
return reconstruct(result) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add type parameters to Callable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can replace it with Callable[[np.ndarray,...],np.ndarray]
. Is it precise enough for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Callable[[np.ndarray],np.ndarray] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to add the ...
for the **kwargs
given to the func
call.
pandas/core/arrays/numeric.py
Outdated
@@ -199,3 +201,48 @@ def reconstruct(x): | |||
return tuple(reconstruct(x) for x in result) | |||
else: | |||
return reconstruct(result) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the return type is the same as self?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I return return type(self)(values, self._mask.copy())
so, yes, it is a NumericArray
. Do you want me to be more precise using a TypeVar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on the func signature and the subclass constructors. If a func say changes dtype and the subclass constructor accepts the changed dtype. Is that what we want from apply? or would we want say an FloatArray if we apply a func with a signature Callable[[np.ndarray[int],...],np.ndarray[float]] to say a IntegerArray?
we may want to be more precise and restrictive on the callable and say that the func must return the same type (for now) and as a follow-up implement a more generic _apply function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned on the previous PR, I would keep it focused here on just getting it working for round
. It's true that if we want such a generic _apply
for several cases, all those questions come up (return type of the function, does the function return a view or new object, etc). But let's deal with those issues when they actually come up.
Personally I think we can also just put those 2 lines of code in round()
itself for now, but documenting the current restrictions of _apply
(func needs to return new object of same dtype) is also fine for me.
pandas/core/arrays/numeric.py
Outdated
values = func(self._data, **kwargs) | ||
return type(self)(values, self._mask.copy()) | ||
|
||
def round(self, decimals: int = 0, *args, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type is same as self?
pandas/core/arrays/numeric.py
Outdated
values = func(self._data, **kwargs) | ||
return type(self)(values, self._mask.copy()) | ||
|
||
def round(self, decimals: int = 0, *args, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is *args needed? having *args after a keyword argument can be problematic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the same as already done in some other files of the library. For instance here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
pandas/core/arrays/numeric.py
Outdated
|
||
def round(self, decimals: int = 0, *args, **kwargs) -> NumericArray: | ||
""" | ||
Round each value in NumericArray a to the given number of decimals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Round each value in NumericArray a to the given number of decimals. | |
Round each value in the array to the given number of decimals. |
(NumericArray is not really public, so would try to avoid it in the docstring where it's not really needed)
pandas/core/arrays/numeric.py
Outdated
@@ -188,3 +189,48 @@ def reconstruct(x): | |||
return tuple(reconstruct(x) for x in result) | |||
else: | |||
return reconstruct(result) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a file in the
/tests/arrays/masked/
folder in order to test the consistency between the numpyround
and the pandasround
for all the followingFLOAT_EA_DTYPES
types. Nevertheless, I now use theany_float_allowed_nullable_dtype
fixture in the testtest_round_numpy_with_nan
oftests/series/methods/test_round.py
.
I would still add a small test in /tests/arrays/masked/
as well. Since this is a public method on the integer and floating arrays, we should test that specifically as well (and not only through Series)
pandas/core/arrays/numeric.py
Outdated
@@ -199,3 +201,48 @@ def reconstruct(x): | |||
return tuple(reconstruct(x) for x in result) | |||
else: | |||
return reconstruct(result) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned on the previous PR, I would keep it focused here on just getting it working for round
. It's true that if we want such a generic _apply
for several cases, all those questions come up (return type of the function, does the function return a view or new object, etc). But let's deal with those issues when they actually come up.
Personally I think we can also just put those 2 lines of code in round()
itself for now, but documenting the current restrictions of _apply
(func needs to return new object of same dtype) is also fine for me.
pandas/core/arrays/numeric.py
Outdated
@@ -188,3 +189,48 @@ def reconstruct(x): | |||
return tuple(reconstruct(x) for x in result) | |||
else: | |||
return reconstruct(result) | |||
|
|||
def _apply(self, func: Callable, **kwargs) -> NumericArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should also test calling numpy.round on the array, see #38866 (comment)
pandas/core/arrays/numeric.py
Outdated
The image of the NumericArray | ||
""" | ||
values = func(self._data, **kwargs) | ||
return type(self)(values, self._mask.copy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, see comment above as well. Let's not generalize this prematurely, just knowing (and documenting) the func
returns a new object is fine for now IMO.
Thanks for all your reviews! I will take time to implement them, but I can't before Saturday. |
I pushed some modifications:
|
some precommit issues
|
@jreback two tests failed in code that I did not write/modify? Is this normal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @jorisvandenbossche if any comments.
@benoit9126 the failures are know on that build currently.
thanks for merging master @benoit9126 this lgtm. |
thanks @benoit9126 |
Thanks a lot @benoit9126 ! |
Code proposed in the staled PR #38866