Skip to content

BUG: time strings cast to ArrowDtype with pa.time64 type #56490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 14, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -697,7 +697,7 @@ Other
- Bug in rendering ``inf`` values inside a a :class:`DataFrame` with the ``use_inf_as_na`` option enabled (:issue:`55483`)
- Bug in rendering a :class:`Series` with a :class:`MultiIndex` when one of the index level's names is 0 not having that name displayed (:issue:`55415`)
- Bug in the error message when assigning an empty dataframe to a column (:issue:`55956`)
-
- Bug when time-like strings were being cast to :class:`ArrowDtype` with ``pyarrow.time64`` type (:issue:`56463`)

.. ***DO NOT USE THIS SECTION***

Expand Down
18 changes: 17 additions & 1 deletion pandas/core/arrays/arrow/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -493,7 +493,23 @@ def _box_pa_array(
if pa.types.is_dictionary(pa_type):
pa_array = pa_array.dictionary_encode()
else:
pa_array = pa_array.cast(pa_type)
try:
pa_array = pa_array.cast(pa_type)
except (
pa.ArrowInvalid,
pa.ArrowTypeError,
pa.ArrowNotImplementedError,
):
if pa.types.is_string(pa_array.type) or pa.types.is_large_string(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the pa.types namespace i also see is_unicode, is_large_unicode. should either of those be handled here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs, these are aliases of the above so I think it's OK not to include

pa_array.type
):
# TODO: Move logic in _from_sequence_of_strings into
# _box_pa_array
return cls._from_sequence_of_strings(
value, dtype=pa_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at _from_sequence_of_strings, this seems like it would make a lot more sense for the implementation to live in _from_sequence and that to call this rather than vice-versa (likely for all dtypes not just time, but haven't looked that closely)

That said i dont think there's anything wrong with this implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think even further the logic of _from_sequence_of_strings should be in _box_pa_array (which is used in fillna, comparison ops) which can be done in a followup

)._pa_array
else:
raise

return pa_array

Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/extension/test_arrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -3051,3 +3051,13 @@ def test_string_to_datetime_parsing_cast():
ArrowExtensionArray(pa.array(pd.to_datetime(string_dates), from_pandas=True))
)
tm.assert_series_equal(result, expected)


def test_string_to_time_parsing_cast():
# GH 56463
string_times = ["11:41:43.076160"]
result = pd.Series(string_times, dtype="time64[us][pyarrow]")
expected = pd.Series(
ArrowExtensionArray(pa.array([time(11, 41, 43, 76160)], from_pandas=True))
)
tm.assert_series_equal(result, expected)