-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: support reductions for pyarrow temporal types #50998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support reductions for pyarrow temporal types #50998
Conversation
pandas/core/arrays/arrow/array.py
Outdated
result = result.cast(pa_type) | ||
elif pa.types.is_time(pa_type): | ||
# TODO: with time types we should probably retain "unit", | ||
# but not clear how to get that since pa_type.unit raises |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this on an older version of pyarrow?
In [6]: t = pa.time32("ms")
In [7]: t.unit
Out[7]: 'ms'
In [8]: pa.__version__
Out[8]: '10.0.1'
In [9]: t = pa.time64("ns")
In [10]: t.unit
Out[10]: 'ns'
In [11]: t = pa.time32("ms")
In [12]: t.unit
Out[12]: 'ms'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it depends on exactly what we're checking. Here pa_type is defined as pa_type = self._data.type
which DataType(time32[ms])
, which does not have a .unit attribute in 10.0.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so how do I get the Time32Type(time32[ms])
instead of the DataType(time32[ms])
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Yeah I don't see a way to do that. Any insight @jorisvandenbossche?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was only recently fixed (should be in pyarrow 11): apache/arrow#14633
When getting the type from the array, it wasn't correctly boxed in the Time specific class, and so the unit attribute is not available. I am not directly sure of a workaround for older pyarrow versions, except for parsing that from the str repr of the type ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so something like?
def get_unit_from_pyarrow_dtype(pa_dtype):
if pa_lt_11:
return str(pa_dtype).split("[")[-1][:-1]
return pa_dtype.unit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should work I think
Thanks @jbrockmendel |
We could plausibly disallow mean for date types, matching what we do for PeriodDtype.
Fixes 232 xfails, cuts test runtime by 88 seconds.