Skip to content

ENH: Implement unary operators for FloatingArray class #39916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Feb 28, 2021

Conversation

zitorelova
Copy link
Contributor

@@ -425,6 +425,7 @@ ExtensionArray

- Bug in :meth:`DataFrame.where` when ``other`` is a :class:`Series` with :class:`ExtensionArray` dtype (:issue:`38729`)
- Fixed bug where :meth:`Series.idxmax`, :meth:`Series.idxmin` and ``argmax/min`` fail when the underlying data is :class:`ExtensionArray` (:issue:`32749`, :issue:`33719`, :issue:`36566`)
- Bug in :class:`FloatingArray` raises ``TypeError`` when trying to apply a negative operator (:issue:`38749`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could maybe go under enhancements since you're basically implementing the negation op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the whatsnew entry to enhancements

@zitorelova zitorelova changed the title BUG TypeError when using negative operator on FloatingArray ENH Implement unary operators for FloatingArray class Feb 19, 2021
@zitorelova zitorelova changed the title ENH Implement unary operators for FloatingArray class ENH: Implement unary operators for FloatingArray class Feb 19, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing comment, otherwise lgtm. ping on green.

and float ea dtypes.

* 'Int8'
* 'Int16'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rather have any_nullable_real_dtype (e..g. include unsigned ints), this seems an oddball.

Copy link
Contributor Author

@zitorelova zitorelova Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think unsigned ints are supposed to be used here. It wouldn't make sense to use unary operators like + and - on unsigned dtypes right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and that's exatly the reason to test this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pos and abs should for sure work and give consistent results. uints work as well (though the results may surprise you, they are consistent with how uints work generally). so pls test these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if that's the case, should we separate the tests for uints? I think a single test handling all real dtypes would be too big.I would split the tests into signed ints, unsigned ints, and floats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see why, the way you have your tests now it should just work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how the tests for uints should be implemented? For test cases where there are negative numbers in the input, you can't create a series. Should I just use pytest.raises in these cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy gives a result so this will as well (it wraps around but that's irrelevant)

@jreback jreback added Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. labels Feb 21, 2021
@jreback jreback added this to the 1.3 milestone Feb 21, 2021
and float ea dtypes.

* 'Int8'
* 'Int16'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and that's exatly the reason to test this

and float ea dtypes.

* 'Int8'
* 'Int16'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pos and abs should for sure work and give consistent results. uints work as well (though the results may surprise you, they are consistent with how uints work generally). so pls test these.

@@ -257,6 +257,15 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
)
super().__init__(values, mask, copy=copy)

def __neg__(self):
return type(self)(-self._data, self._mask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return a copy of the mask, xref #39943

should probably move the integer implementations (after fixed #39943 (comment)) to the base clase.


def __pos__(self):
return self
return self.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with IntegerArray, although it's not clear to me what the correct return type should be. I think we return a copy for a Series, and a new object with the same backing data for TimedeltaArray. numpy returns a new object. python returns the same object with scalars.

maybe just match IntegerArray for now so that we can move these methods to the base class

@jreback
Copy link
Contributor

jreback commented Feb 22, 2021

@zitorelova will want to rebase this after #39971 (merging soon). just to ensure copied masks.

return type(self)(-self._data, self._mask.copy())

def __pos__(self):
return self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoulnt' this be the same? (e.g. self.copy()) to preserve semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins noted that we should be consistent with the IntegerArray class

This is inconsistent with IntegerArray, although it's not clear to me what the correct return type should be. I think we return a copy for a Series, and a new object with the same backing data for TimedeltaArray. numpy returns a new object. python returns the same object with scalars.

maybe just match IntegerArray for now so that we can move these methods to the base class

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zitorelova thanks for working on this!

#39971 added a test to tests/arrays/masked/test_arithmetic.py for checking the mask is no longer shared. Can you update that test to also add float dtype? (you can use the data fixture defined in that file for this, which already includes that)

@@ -257,6 +257,15 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
)
super().__init__(values, mask, copy=copy)

def __neg__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @simonjayhawkins already mentioned it as well, but can you move those definitions you added here to the base class NumericArray in arrays/numeric.py instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the operator definitions to arrays/numeric.py

@pep8speaks
Copy link

pep8speaks commented Feb 26, 2021

Hello @zitorelova! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-27 18:31:34 UTC

@zitorelova
Copy link
Contributor Author

throwing errors with stata files

@simonjayhawkins
Copy link
Member

throwing errors with stata files

fixed in #40094. restarted ci. if not passing with restart, will need to merge master.

s = pd.Series(values, dtype=dtype)
data, _ = data
if data.dtype in ["Float32", "Float64"] and op == "__invert__":
pytest.skip("invert is not implemented for float ea dtypes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change this to add an xfail marker instead of skipping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set to xfail

@jreback
Copy link
Contributor

jreback commented Feb 27, 2021

lgtm ping on green.

@zitorelova
Copy link
Contributor Author

checks complete @jreback

@simonjayhawkins simonjayhawkins merged commit 4c5e6fa into pandas-dev:master Feb 28, 2021
@simonjayhawkins
Copy link
Member

Thanks @zitorelova

@zitorelova zitorelova deleted the float-unary-bug branch February 28, 2021 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: TypeError: bad operand type for unary -: 'FloatingArray'
6 participants