Skip to content

BUG: time strings cast to ArrowDtype with pa.time64 type #56490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 14, 2023

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Dec 13, 2023

@mroeschke mroeschke added Dtype Conversions Unexpected or buggy dtype conversions Arrow pyarrow functionality labels Dec 13, 2023
@mroeschke mroeschke added this to the 2.2 milestone Dec 13, 2023
pa.ArrowTypeError,
pa.ArrowNotImplementedError,
):
if pa.types.is_string(pa_array.type) or pa.types.is_large_string(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the pa.types namespace i also see is_unicode, is_large_unicode. should either of those be handled here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs, these are aliases of the above so I think it's OK not to include

pa_array.type
):
return cls._from_sequence_of_strings(
value, dtype=pa_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at _from_sequence_of_strings, this seems like it would make a lot more sense for the implementation to live in _from_sequence and that to call this rather than vice-versa (likely for all dtypes not just time, but haven't looked that closely)

That said i dont think there's anything wrong with this implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think even further the logic of _from_sequence_of_strings should be in _box_pa_array (which is used in fillna, comparison ops) which can be done in a followup

string_times = ["11:41:43.076160"]
result = pd.Series(string_times, dtype="time64[us][pyarrow]")
expected = pd.Series(
ArrowExtensionArray(pa.array(to_time(string_times), from_pandas=True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make the to_time result more explicit; i dont think to_time is well-maintained and dont particularly trust it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Using a datetime.time object here makes more sense to test

@jbrockmendel
Copy link
Member

One test request, otherwise LGTM

@phofl phofl merged commit 6399982 into pandas-dev:main Dec 14, 2023
@phofl
Copy link
Member

phofl commented Dec 14, 2023

thx @mroeschke

@mroeschke mroeschke deleted the bug/str_to_time/arrow branch December 14, 2023 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Unsupported cast from string to time64 with pandas 2.1.4
3 participants