BUG: rank is not supported for double[pyarrow] dtype #51996

Closed
2 of 3 tasks
dylan-lee94 opened this issue Mar 15, 2023 · 3 comments
Labels: Arrow (pyarrow functionality), Bug, Closing Candidate (may be closeable, needs more eyeballs), Groupby

@dylan-lee94

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
pd.options.mode.dtype_backend = 'pyarrow'

df = pd.DataFrame({
    'tags': pd.Series([1,1,1,2,2,2,3,3,3,4,4,4,5,5,5],dtype='int64[pyarrow]'),
    'value': pd.Series(np.random.rand(15),dtype='double[pyarrow]')
    })

result = df['value'].rank()
result = df.groupby('tags')['value'].rank()

Issue Description

When I group a DataFrame and apply the rank function to a column with dtype double[pyarrow], I get the following error:
TypeError: rank is not supported for double[pyarrow] dtype

However, applying the rank function without groupby works. This leads me to believe that the error message is misleading and that in fact the rank function does support data type double[pyarrow].

Expected Behavior

The rank function works in combination with groupby for data type double[pyarrow].

Installed Versions

Replace this line with the output of pd.show_versions()

@dylan-lee94 added the Bug and Needs Triage labels Mar 15, 2023
@phofl added the Groupby and Arrow labels and removed the Needs Triage label Mar 15, 2023
@mroeschke
Member

Thanks for the report.

Looks like for rank specifically we prevent casting to the original data type, probably because rank should be numeric and we don't want to cast the type back to a non-numeric type.

if self.how in self.cast_blocklist:
    # i.e. how in ["rank"], since other cast_blocklist methods don't go
    # through cython_operation
    return res_values
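To make the idea in the snippet above concrete, here is a standalone sketch of a "cast blocklist". It is not pandas' actual implementation: `CAST_BLOCKLIST` and `maybe_cast_back` are hypothetical names, and plain NumPy arrays stand in for the groupby result. It shows why rank results are returned as computed: average ranks for ties can be fractional, so casting them back to an integer input dtype would lose information.

```python
import numpy as np

# Hypothetical stand-in for the blocklist idea; not pandas internals.
CAST_BLOCKLIST = frozenset({"rank"})


def maybe_cast_back(how, res_values, orig_dtype):
    """Return res_values as-is for blocklisted ops, else cast to orig_dtype."""
    if how in CAST_BLOCKLIST:
        return res_values
    return res_values.astype(orig_dtype, copy=False)


orig = np.array([10, 10, 20], dtype="int64")

# Pretend the underlying kernel computed both results in float64.
computed_sum = np.array([40.0])
computed_rank = np.array([1.5, 1.5, 3.0])  # tied values share an average rank

summed = maybe_cast_back("sum", computed_sum, orig.dtype)
ranked = maybe_cast_back("rank", computed_rank, orig.dtype)

print(summed.dtype, ranked.dtype)
```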

In the short term, maybe an exception should be made here if the original dtype isn't numeric? In the longer term, I think we'll integrate a way to dispatch to pyarrow's groupby aggregation methods.

@topper-123
Contributor

This looks fixed:

import pandas as pd
import numpy as np
# pd.options.mode.dtype_backend = 'pyarrow'

df = pd.DataFrame({
    'tags': pd.Series([1,1,1,2,2,2,3,3,3,4,4,4,5,5,5],dtype='int64[pyarrow]'),
    'value': pd.Series(np.random.rand(15),dtype='double[pyarrow]')
    })

result = df['value'].rank()
result = df.groupby('tags')['value'].rank()
print(result)

with the result being:

0     3.0
1     2.0
2     1.0
3     2.0
4     1.0
5     3.0
6     1.0
7     2.0
8     3.0
9     3.0
10    1.0
11    2.0
12    1.0
13    3.0
14    2.0
Name: value, dtype: double[pyarrow]

Before I close it, do you agree this works as intended now?

@topper-123 added the Closing Candidate label May 14, 2023
@topper-123
Contributor

OK, I'm closing this. You can reopen it if you see something that needs to be discussed.


4 participants