-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Implement unary operators for FloatingArray class #39916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Implement unary operators for FloatingArray class #39916
Conversation
zitorelova
commented
Feb 19, 2021
- closes BUG: TypeError: bad operand type for unary -: 'FloatingArray' #38749
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -425,6 +425,7 @@ ExtensionArray | |||
|
|||
- Bug in :meth:`DataFrame.where` when ``other`` is a :class:`Series` with :class:`ExtensionArray` dtype (:issue:`38729`) | |||
- Fixed bug where :meth:`Series.idxmax`, :meth:`Series.idxmin` and ``argmax/min`` fail when the underlying data is :class:`ExtensionArray` (:issue:`32749`, :issue:`33719`, :issue:`36566`) | |||
- Bug in :class:`FloatingArray` raises ``TypeError`` when trying to apply a negative operator (:issue:`38749`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could maybe go under enhancements since you're basically implementing the negation op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the whatsnew entry to enhancements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing comment, otherwise lgtm. ping on green.
pandas/conftest.py
Outdated
and float ea dtypes. | ||
|
||
* 'Int8' | ||
* 'Int16' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would rather have any_nullable_real_dtype (e..g. include unsigned ints), this seems an oddball.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think unsigned ints are supposed to be used here. It wouldn't make sense to use unary operators like +
and -
on unsigned dtypes right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that's exatly the reason to test this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pos and abs should for sure work and give consistent results. uints work as well (though the results may surprise you, they are consistent with how uints work generally). so pls test these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that's the case, should we separate the tests for uints? I think a single test handling all real dtypes would be too big.I would split the tests into signed ints, unsigned ints, and floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see why, the way you have your tests now it should just work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how the tests for uints should be implemented? For test cases where there are negative numbers in the input, you can't create a series. Should I just use pytest.raises
in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy gives a result so this will as well (it wraps around but that's irrelevant)
pandas/conftest.py
Outdated
and float ea dtypes. | ||
|
||
* 'Int8' | ||
* 'Int16' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that's exatly the reason to test this
pandas/conftest.py
Outdated
and float ea dtypes. | ||
|
||
* 'Int8' | ||
* 'Int16' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pos and abs should for sure work and give consistent results. uints work as well (though the results may surprise you, they are consistent with how uints work generally). so pls test these.
pandas/core/arrays/floating.py
Outdated
@@ -257,6 +257,15 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False): | |||
) | |||
super().__init__(values, mask, copy=copy) | |||
|
|||
def __neg__(self): | |||
return type(self)(-self._data, self._mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should return a copy of the mask, xref #39943
should probably move the integer implementations (after fixed #39943 (comment)) to the base clase.
pandas/core/arrays/floating.py
Outdated
|
||
def __pos__(self): | ||
return self | ||
return self.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inconsistent with IntegerArray, although it's not clear to me what the correct return type should be. I think we return a copy for a Series, and a new object with the same backing data for TimedeltaArray. numpy returns a new object. python returns the same object with scalars.
maybe just match IntegerArray for now so that we can move these methods to the base class
@zitorelova will want to rebase this after #39971 (merging soon). just to ensure copied masks. |
36ed0bf
to
3e67655
Compare
pandas/core/arrays/floating.py
Outdated
return type(self)(-self._data, self._mask.copy()) | ||
|
||
def __pos__(self): | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shoulnt' this be the same? (e.g. self.copy()) to preserve semantics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins noted that we should be consistent with the IntegerArray
class
This is inconsistent with IntegerArray, although it's not clear to me what the correct return type should be. I think we return a copy for a Series, and a new object with the same backing data for TimedeltaArray. numpy returns a new object. python returns the same object with scalars.
maybe just match IntegerArray for now so that we can move these methods to the base class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zitorelova thanks for working on this!
#39971 added a test to tests/arrays/masked/test_arithmetic.py
for checking the mask is no longer shared. Can you update that test to also add float dtype? (you can use the data
fixture defined in that file for this, which already includes that)
pandas/core/arrays/floating.py
Outdated
@@ -257,6 +257,15 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False): | |||
) | |||
super().__init__(values, mask, copy=copy) | |||
|
|||
def __neg__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @simonjayhawkins already mentioned it as well, but can you move those definitions you added here to the base class NumericArray
in arrays/numeric.py
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the operator definitions to arrays/numeric.py
Hello @zitorelova! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-02-27 18:31:34 UTC |
throwing errors with stata files |
fixed in #40094. restarted ci. if not passing with restart, will need to merge master. |
s = pd.Series(values, dtype=dtype) | ||
data, _ = data | ||
if data.dtype in ["Float32", "Float64"] and op == "__invert__": | ||
pytest.skip("invert is not implemented for float ea dtypes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change this to add an xfail marker instead of skipping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set to xfail
lgtm ping on green. |
checks complete @jreback |
Thanks @zitorelova |