Return type of pow #174

Closed
MarcoGorelli opened this issue May 24, 2023 · 5 comments · Fixed by #182

Comments

@MarcoGorelli
Contributor

Seems like there are some inconsistencies:

In [6]: pd.Series([1,2,3])**2
Out[6]:
0    1
1    4
2    9
dtype: int64

In [7]: pd.Series([1,2,3])**-1
---------------------------------------------------------------------------
ValueError: Integers to negative integer powers are not allowed.

In [8]: pl.Series([1,2,3]).pow(2)
Out[8]:
shape: (3,)
Series: '' [f64]
[
        1.0
        4.0
        9.0
]

In [9]: pl.Series([1,2,3]).pow(-1)
Out[9]:
shape: (3,)
Series: '' [f64]
[
        1.0
        0.5
        0.333333
]

Polars always returns floats, whereas pandas returns either integers or floats and may raise an error depending on the value of the exponent.

What do we want to do here?

@rgommers
Member

I think we should refer to the array API standard's description for pow. It's admittedly a bit hairy, but for any numerical behavior like this I think array libraries have thought about this a lot harder than dataframe libraries, and we should not reinvent this particular wheel.

For the examples given, that spec says, for col with integer dtype:

  • col**2 should give an integer-dtype result
  • col**(-1) is implementation-defined and may not be allowed
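The two bullets above can be read as a dtype rule. Below is a minimal sketch of that rule for an integer column raised to a Python scalar. `pow_result_dtype` and `INTEGER_DTYPES` are hypothetical names that are not part of any library or of the standard itself. The sketch assumes the array API convention that a Python scalar of the same kind leaves the array's dtype unchanged, and that mixing a Python float with an integer array is left unspecified.

```python
from typing import Union

# Hypothetical set of integer dtype names, used only for this sketch.
INTEGER_DTYPES = {"int8", "int16", "int32", "int64",
                  "uint8", "uint16", "uint32", "uint64"}


def pow_result_dtype(col_dtype: str, exponent: Union[int, float]) -> str:
    """Hypothetical helper: result dtype of `col ** exponent` for an integer
    column and a Python scalar exponent, following the array API scalar rules."""
    if col_dtype not in INTEGER_DTYPES:
        raise NotImplementedError("this sketch only covers integer columns")
    if isinstance(exponent, int):
        # A Python int does not change the column's dtype, so col**2 stays integer.
        # For exponent < 0 the dtype is still col_dtype, but the values are
        # implementation-defined, and a library may choose to raise instead.
        return col_dtype
    # Mixing a Python float with an integer array is not specified by the standard.
    raise TypeError(f"{col_dtype} ** float is not specified by the array API standard")


print(pow_result_dtype("int32", 2))  # int32
```

The negative-exponent case is deliberately not turned into an error here: the spec only leaves the *values* implementation-defined, so whether to raise is a separate choice for each library.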

In this particular case I think the Polars choice isn't completely unreasonable, because it is the other way one could extrapolate Python's built-in scalar behavior to a column:

>>> 2**2
4
>>> 2**-1
0.5
>>> type(2**2)
<class 'int'>
>>> type(2**-1)
<class 'float'>
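Applied element by element, Python's scalar rules make the result type depend on the exponent's value, not only on the input type. A small stdlib-only sketch, using a plain list as a stand-in for an integer column (no dataframe library involved):

```python
# A plain list standing in for an integer column.
values = [1, 2, 3]

squared = [v**2 for v in values]      # int ** non-negative int -> int
reciprocal = [v**-1 for v in values]  # int ** negative int -> float

# The per-element result type changes with the sign of the exponent.
print({type(x).__name__ for x in squared})     # {'int'}
print({type(x).__name__ for x in reciprocal})  # {'float'}
```

A column has a single dtype, so mirroring these scalar semantics means either making the result dtype depend on the exponent's value or, as Polars does in the examples above, always returning floats.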

But it's the opposite choice made by all array libraries and by Pandas, and makes it harder to work with lower-precision dtypes when everything ends up being float64. So it's not an ideal choice either and I'm hoping it can still be reversed.

>>> import polars as pl
>>> col = pl.Series([1, 2, 3], dtype=pl.Int32)
>>> col**2
shape: (3,)
Series: '' [f64]
[
        1.0
        4.0
        9.0
]
>>> col.pow(2).dtype
Float64
>>> (col**2).dtype
Float64
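One concrete cost of the Int32-to-Float64 upcast is memory: each element goes from 4 bytes to 8. A stdlib-only sketch using the `array` module as a stand-in for typed columns. It assumes typecode `"i"` (a C `int`) is 4 bytes, which holds on mainstream platforms, while `"d"` is an 8-byte C double.

```python
from array import array

# Stand-in for an Int32 column: typecode "i" is a C int (assumed 4 bytes here).
int32_col = array("i", range(1000))

# Dtype-preserving pow: squares of 0..999 still fit in a 32-bit int.
squared_int32 = array("i", (x**2 for x in int32_col))

# Always-float pow: the same values stored as C doubles (typecode "d", 8 bytes).
squared_float64 = array("d", (float(x**2) for x in int32_col))

int32_bytes = squared_int32.itemsize * len(squared_int32)
float64_bytes = squared_float64.itemsize * len(squared_float64)
print(int32_bytes, float64_bytes)  # 4000 8000 where a C int is 4 bytes
```

The values are identical in both arrays. Only the storage width differs, which is the overhead a dtype-preserving `pow` avoids for low-precision integer columns.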

I don't see any documented casting rules in the Polars docs. Polars seems to be dtype-preserving in general, with pow being an exception, and it does support manual casting.

I haven't checked all the other dataframe libraries yet, would be good to check that first.

If it's not possible to make the pow behavior uniform, I think the other option is to recommend the array API standard behavior but not make it mandatory.

@MarcoGorelli
Contributor Author

If it's not possible to make the pow behavior uniform

We can always work around this in the standard, no big deal; following the Array API looks good to me.

@jorisvandenbossche
Member

jorisvandenbossche commented May 25, 2023

As another data point: for integer data with an integer exponent, the pyarrow.compute power kernel preserves the integer dtype for non-negative exponents and raises an error for negative exponents. So that should be compatible with the Array API specification.
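A pure-Python model of the integer power behaviour described for pyarrow may help. It only mimics those semantics and does not call pyarrow. `int_column_pow` is a hypothetical name, and the error message text is illustrative rather than pyarrow's exact wording.

```python
from typing import List


def int_column_pow(column: List[int], exponent: int) -> List[int]:
    """Hypothetical model of the integer power rule: the result stays integer
    for non-negative exponents, and negative exponents are rejected outright."""
    if exponent < 0:
        raise ValueError("Integers to negative integer powers are not allowed.")
    # int ** non-negative int is always an int in Python, so the "dtype" is preserved.
    return [x**exponent for x in column]


print(int_column_pow([1, 2, 3], 2))  # [1, 4, 9]
try:
    int_column_pow([1, 2, 3], -1)
except ValueError as exc:
    print(exc)  # Integers to negative integer powers are not allowed.
```

Python ints are unbounded, so unlike a fixed-width Int64 kernel this sketch never overflows. Overflow handling is out of scope here.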

@MarcoGorelli
Contributor Author

Thanks @jorisvandenbossche! I like that; I'd suggest standardising on that.

@MarcoGorelli
Contributor Author

In the last call we decided to follow the pyarrow behaviour.

3 participants