Commit 7a0aa9f

phofl and lukemanley authored
Backport PR #52843 on branch 2.0.x (BUG: pyarrow duration arrays constructed from data containing NaT can overflow) (#52869)
BUG: pyarrow duration arrays constructed from data containing NaT can overflow (#52843)

(cherry picked from commit 4539f3e)

Co-authored-by: Luke Manley <[email protected]>
1 parent 3af68dc commit 7a0aa9f

3 files changed, +24 -1 lines changed

doc/source/whatsnew/v2.0.1.rst

+1
@@ -28,6 +28,7 @@ Bug fixes
 ~~~~~~~~~
 - Bug in :attr:`Series.dt.days` that would overflow ``int32`` number of days (:issue:`52391`)
 - Bug in :class:`arrays.DatetimeArray` constructor returning an incorrect unit when passed a non-nanosecond numpy datetime array (:issue:`52555`)
+- Bug in :class:`~arrays.ArrowExtensionArray` with duration dtype overflowing when constructed from data containing numpy ``NaT`` (:issue:`52843`)
 - Bug in :func:`Series.dt.round` when passing a ``freq`` of equal or higher resolution compared to the :class:`Series` would raise a ``ZeroDivisionError`` (:issue:`52761`)
 - Bug in :func:`Series.median` with :class:`ArrowDtype` returning an approximate median (:issue:`52679`)
 - Bug in :func:`api.interchange.from_dataframe` was unnecessarily raising on categorical dtypes (:issue:`49889`)

pandas/core/arrays/arrow/array.py

+7 -1
@@ -260,7 +260,13 @@ def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = Fal
                 scalars = pa.array(scalars, from_pandas=True)
         if pa_dtype:
             scalars = scalars.cast(pa_dtype)
-        return cls(scalars)
+        arr = cls(scalars)
+        if pa.types.is_duration(scalars.type) and scalars.null_count > 0:
+            # GH52843: upstream bug for duration types when originally
+            # constructed with data containing numpy NaT.
+            # https://github.com/apache/arrow/issues/35088
+            arr = arr.fillna(arr.dtype.na_value)
+        return arr
 
     @classmethod
     def _from_sequence_of_strings(

pandas/tests/extension/test_arrow.py

+16
@@ -2629,3 +2629,19 @@ def test_describe_numeric_data(pa_type):
         index=["count", "mean", "std", "min", "25%", "50%", "75%", "max"],
     )
     tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.xfail(
+    pa_version_under8p0,
+    reason="Function 'add_checked' has no kernel matching input types",
+    raises=pa.ArrowNotImplementedError,
+)
+def test_duration_overflow_from_ndarray_containing_nat():
+    # GH52843
+    data_ts = pd.to_datetime([1, None])
+    data_td = pd.to_timedelta([1, None])
+    ser_ts = pd.Series(data_ts, dtype=ArrowDtype(pa.timestamp("ns")))
+    ser_td = pd.Series(data_td, dtype=ArrowDtype(pa.duration("ns")))
+    result = ser_ts + ser_td
+    expected = pd.Series([2, None], dtype=ArrowDtype(pa.timestamp("ns")))
+    tm.assert_series_equal(result, expected)
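
For readers unfamiliar with the failure mode, here is a rough pure-Python model of why the test above could overflow before the fix. It is a sketch under assumptions, not pyarrow's or pandas' actual implementation: it assumes the array built from NumPy data keeps NaT's int64 sentinel in the value buffer behind a null slot, that the checked add inspects every slot regardless of validity, and that the `fillna` call has the effect of rewriting the value behind each null. All function names below are hypothetical.

```python
# Simplified model only; names and behaviour are illustrative assumptions,
# not the pyarrow or pandas implementation.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NAT_SENTINEL = INT64_MIN  # NumPy represents NaT as the minimum int64 value


def from_numpy_like(values):
    """Return (data, validity); null slots keep the NaT sentinel in data."""
    data = [NAT_SENTINEL if v is None else v for v in values]
    validity = [v is not None for v in values]
    return data, validity


def fill_nulls(data, validity, fill=0):
    """Rewrite the value behind each null slot; validity is unchanged."""
    cleaned = [d if ok else fill for d, ok in zip(data, validity)]
    return cleaned, list(validity)


def add_checked_ignoring_validity(left, right):
    """Checked int64 add that also range-checks slots marked null."""
    (l_data, l_valid), (r_data, r_valid) = left, right
    out = []
    for a, b, va, vb in zip(l_data, r_data, l_valid, r_valid):
        total = a + b
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("int64 overflow")
        out.append(total if (va and vb) else None)
    return out


ts = from_numpy_like([1, None])  # stands in for the timestamp column
td = from_numpy_like([1, None])  # stands in for the duration column

try:
    add_checked_ignoring_validity(ts, td)
except OverflowError as exc:
    print("before fix:", exc)

# Mirrors the patch: only the duration side is cleaned after construction.
print("after fix:", add_checked_ignoring_validity(ts, fill_nulls(*td)))
```

In this model the null slot sums two sentinels and leaves the int64 range, while cleaning only the duration side is enough to keep the sum in range, which matches the patch touching duration arrays only.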
