-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: time strings cast to ArrowDtype with pa.time64 type #56490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pa.ArrowTypeError, | ||
pa.ArrowNotImplementedError, | ||
): | ||
if pa.types.is_string(pa_array.type) or pa.types.is_large_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the pa.types namespace i also see is_unicode, is_large_unicode. should either of those be handled here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs, these are aliases of the above so I think it's OK not to include
pa_array.type | ||
): | ||
return cls._from_sequence_of_strings( | ||
value, dtype=pa_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at _from_sequence_of_strings, this seems like it would make a lot more sense for the implementation to live in _from_sequence and that to call this rather than vice-versa (likely for all dtypes not just time, but haven't looked that closely)
That said i dont think there's anything wrong with this implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think even further the logic of _from_sequence_of_strings
should be in _box_pa_array
(which is used in fillna, comparison ops) which can be done in a followup
pandas/tests/extension/test_arrow.py
Outdated
string_times = ["11:41:43.076160"] | ||
result = pd.Series(string_times, dtype="time64[us][pyarrow]") | ||
expected = pd.Series( | ||
ArrowExtensionArray(pa.array(to_time(string_times), from_pandas=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make the to_time result more explicit; i dont think to_time is well-maintained and dont particularly trust it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Using a datetime.time
object here makes more sense to test
One test request, otherwise LGTM |
thx @mroeschke |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.