Skip to content

ENH: support reductions for pyarrow temporal types #50998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 9, 2023

Conversation

jbrockmendel
Copy link
Member

We could plausibly disallow mean for date types, matching what we do for PeriodDtype.

Fixes 232 xfails, cuts test runtime by 88 seconds.

@jbrockmendel jbrockmendel added the Arrow pyarrow functionality label Jan 26, 2023
result = result.cast(pa_type)
elif pa.types.is_time(pa_type):
# TODO: with time types we should probably retain "unit",
# but not clear how to get that since pa_type.unit raises
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this on an older version of pyarrow?

In [6]: t = pa.time32("ms")

In [7]: t.unit
Out[7]: 'ms'

In [8]: pa.__version__
Out[8]: '10.0.1'

In [9]: t = pa.time64("ns")

In [10]: t.unit
Out[10]: 'ns'

In [11]: t = pa.time32("ms")

In [12]: t.unit
Out[12]: 'ms'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it depends on exactly what we're checking. Here pa_type is defined as pa_type = self._data.type which DataType(time32[ms]), which does not have a .unit attribute in 10.0.1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so how do I get the Time32Type(time32[ms]) instead of the DataType(time32[ms]) here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Yeah I don't see a way to do that. Any insight @jorisvandenbossche?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was only recently fixed (should be in pyarrow 11): apache/arrow#14633

When getting the type from the array, it wasn't correctly boxed in the Time specific class, and so the unit attribute is not available. I am not directly sure of a workaround for older pyarrow versions, except for parsing that from the str repr of the type ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so something like?

def get_unit_from_pyarrow_dtype(pa_dtype):
    if pa_lt_11:
        return str(pa_dtype).split("[")[-1][:-1]
    return pa_dtype.unit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should work I think

@mroeschke mroeschke added this to the 2.0 milestone Feb 9, 2023
@mroeschke mroeschke merged commit 52306d9 into pandas-dev:main Feb 9, 2023
@mroeschke
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the enh-pyarrow-reductions-4 branch February 9, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants