Skip to content

Default ExtensionScalarOpsMixin for divmod is incorrect #22930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Oct 1, 2018 · 8 comments
Closed

Default ExtensionScalarOpsMixin for divmod is incorrect #22930

TomAugspurger opened this issue Oct 1, 2018 · 8 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Milestone

Comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Oct 1, 2018

Code Sample, a copy-pastable example if possible

In [1]: from pandas.tests.extension.decimal.array import *

In [2]: arr = DecimalArray([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(3)])

In [3]: divmod(arr, 2)
Out[3]:
[(Decimal('0'), Decimal('1')),
 (Decimal('1'), Decimal('0')),
 (Decimal('1'), Decimal('1'))]

Expected Output

In [11]: a, b = zip(*(divmod(x, 2) for x in arr))

In [12]: DecimalArray(a), DecimalArray(b)
Out[12]:
(DecimalArray(array([Decimal('0'), Decimal('1'), Decimal('1')], dtype=object)),
 DecimalArray(array([Decimal('1'), Decimal('0'), Decimal('1')], dtype=object)))

Any others?

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Oct 1, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Oct 1, 2018
@TomAugspurger
Copy link
Contributor Author

So in

if coerce_to_dtype:
try:
res = self._from_sequence(res)
except TypeError:
pass
we catch a TypeError. I suppose that's necessary, because the ops we're implementing may not be an algebra (the array may not be able to hold the return type).

@Dr-Irv unreatalted to this issue, we allocate an len(self) list in

ovalues = [param] * len(self)
. Could that just be itertools.cycle(param)?

@TomAugspurger
Copy link
Contributor Author

Are there any other operators that return a tuple? rdivmod I suppose.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 2, 2018

@TomAugspurger Been on vacation, so just seeing this.

In the first one, the idea is that we are allowing the result to be a different EA type, which may not be the EA type of the caller.

With respect to using itertools.cycle, I think that would work. I was unaware that existed.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 2, 2018

Also, with respect to the divmod issue reported here, in pandas/tests/extension/decimal/test_decimal.py, I explicitly skipped the test for divmod because things like divmod(Decimal(5.2), Decimal(3.3)) are not necessarily well defined. (or at least that is what I remember concluding way back when)

@TomAugspurger
Copy link
Contributor Author

Thanks. divmod between two decimals seems to work fine, so we'll re-enable that test with the implementation.

In the first one, the idea is that we are allowing the result to be a different EA type, which may not be the EA type of the caller.

I'm not sure I follow. The expected result of a divmod(array, Any) should be a tuple of arrays

In [15]: divmod(np.array([1, 2, 3]), 2)
Out[15]: (array([0, 1, 1]), array([1, 0, 1]))

right now we're returning an array (list) of tuples.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 4, 2018

In terms of divmod between two decimals working fine, I think the issue is how the "infinite" precision of Decimal should handle it. Here's an example:

In [1]: from decimal import Decimal

In [2]: import numpy as np

In [3]: r=np.random.randn(2)

In [4]: r
Out[4]: array([-1.42964197, -0.4781252 ])

In [5]: divmod(r[0], r[1])
Out[5]: (2.0, -0.47339156548270145)

In [6]: divmod(Decimal(r[0]), Decimal(r[1]))
Out[6]: (Decimal('2'), Decimal('-0.4733915654827014485306335700'))

And then there is this example:

In [8]: divmod(Decimal(1.0), Decimal(1.0)/Decimal(3.0))
Out[8]: (Decimal('3'), Decimal('1E-28'))

I think the result of that should be (Decimal('3'), Decimal('0.0'))

So it is not clear what the expected result should be when the input values are "infinite" decimal numbers. But maybe the test will work now if enabled.

When I wrote:

In the first one, the idea is that we are allowing the result to be a different EA type, which may not be the EA type of the caller.

I meant the following. Suppose the EA holds elements of scalar type A. Suppose that the result of the binop for the two scalars of type A is of scalar type B. Then the someone might have a different EA type holding scalars of type B. With the current try/except logic, this case is then covered.

@TomAugspurger
Copy link
Contributor Author

I think the result of that should be (Decimal('3'), Decimal('0.0'))

You think that's a bug in Python?

Regardless, the array version should be matching what the scalar version does.

Then the someone might have a different EA type holding scalars of type B. With the current try/except logic, this case is then covered

I don't see that. On master, we return a list if the self._from_sequence fails, not a different extension array.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 4, 2018

I'm not sure if it is a bug in Python, but it makes writing a test for divmod in DecimalArray a challenge, so I punted.

I agree that the array version should do what the scalar version does.

I don't see that. On master, we return a list if the self._from_sequence fails, not a different extension array.

Yes, and that was intentional.

I was responding to your comment here:
#22930 (comment)
where you wrote:

we catch a TypeError. I suppose that's necessary, because the ops we're implementing may not be an algebra (the array may not be able to hold the return type).

So I think we're all good in terms of understanding this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

No branches or pull requests

2 participants