BUG: fixed OutOfBoundsDatetime exception when errors=coerce #45319 #47794

Merged: 10 commits, Aug 15, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -897,6 +897,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.resolution` incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (:issue:`46903`)
 - Bug in :class:`Timestamp` with an integer or float value and ``unit="Y"`` or ``unit="M"`` giving slightly-wrong results (:issue:`47266`)
 - Bug in :class:`.DatetimeArray` construction when passed another :class:`.DatetimeArray` and ``freq=None`` incorrectly inferring the freq from the given array (:issue:`47296`)
+- Bug in :func:`to_datetime` where ``OutOfBoundsDatetime`` would be thrown even if ``errors=coerce`` if there were more than 50 rows (:issue:`45319`)
 - Bug when adding a :class:`DateOffset` to a :class:`Series` would not add the ``nanoseconds`` field (:issue:`47856`)
 -

6 changes: 5 additions & 1 deletion pandas/core/tools/datetimes.py
@@ -227,7 +227,11 @@ def _maybe_cache(
         unique_dates = unique(arg)
         if len(unique_dates) < len(arg):
             cache_dates = convert_listlike(unique_dates, format)
-            cache_array = Series(cache_dates, index=unique_dates)
+            # GH#45319
+            try:
+                cache_array = Series(cache_dates, index=unique_dates)
+            except OutOfBoundsDatetime:
+                return cache_array
             # GH#39882 and GH#35888 in case of None and NaT we get duplicates
             if not cache_array.index.is_unique:
                 cache_array = cache_array[~cache_array.index.duplicated()]
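
Note on the hunk above: as far as I can tell from the diff, the fix catches the error raised while building the cache and returns the cache array unchanged, so the caller falls back to its uncached conversion path, where `errors="coerce"` is honoured. The sketch below imitates that pattern with the standard library only. Every name in it (`OutOfBoundsError`, `LOWER_BOUND`, `to_bounded`, `convert_one`, `maybe_cache`, `convert_all`) is hypothetical and is not a pandas API; the bound is an approximation of the nanosecond-timestamp lower limit, and `None` stands in for `NaT`.

```python
from datetime import datetime, timezone


class OutOfBoundsError(ValueError):
    """Hypothetical stand-in for an out-of-range datetime error."""


# Assumption: approximate lower bound of a nanosecond-resolution timestamp.
LOWER_BOUND = datetime(1677, 9, 22, tzinfo=timezone.utc)


def to_bounded(value):
    """Return value unchanged, or raise if it lies below LOWER_BOUND."""
    if value < LOWER_BOUND:
        raise OutOfBoundsError(f"Out of bounds: {value.isoformat()}")
    return value


def convert_one(value, errors):
    """Convert a single value, applying the errors policy."""
    try:
        return to_bounded(value)
    except OutOfBoundsError:
        if errors == "coerce":
            return None  # stands in for NaT
        raise


def maybe_cache(values, errors):
    """Map unique inputs to converted outputs; return {} if no cache is usable."""
    unique_values = list(dict.fromkeys(values))
    cache = {}
    if len(unique_values) < len(values):
        converted = [convert_one(v, errors) for v in unique_values]
        try:
            # Imitates indexing the cache by the raw unique values,
            # which fails when one of them is out of range.
            keys = [to_bounded(v) for v in unique_values]
        except OutOfBoundsError:
            return cache  # empty cache: caller uses the uncached path
        cache = dict(zip(keys, converted))
    return cache


def convert_all(values, errors="raise"):
    """Convert every value, using the cache only when one was built."""
    cache = maybe_cache(values, errors)
    if cache:
        return [cache[v] for v in values]
    return [convert_one(v, errors) for v in values]
```

With one out-of-range value followed by many repeats of an in-range value, `convert_all(values, errors="coerce")` yields `None` for the first element and the repeated value for the rest, while `errors="raise"` still propagates the error. Returning the empty cache trades away the speed-up for such inputs rather than special-casing out-of-range keys inside the cache.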
31 changes: 31 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -2777,3 +2777,34 @@ def test_to_datetime_monotonic_increasing_index(cache):
     result = to_datetime(times.iloc[:, 0], cache=cache)
     expected = times.iloc[:, 0]
     tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "series_length",
+    [40, start_caching_at, (start_caching_at + 1), (start_caching_at + 5)],
+)
+def test_to_datetime_cache_coerce_50_lines_outofbounds(series_length):
+    # GH#45319
+    s = Series(
+        [datetime.fromisoformat("1446-04-12 00:00:00+00:00")]
+        + ([datetime.fromisoformat("1991-10-20 00:00:00+00:00")] * series_length)
+    )
+    result1 = to_datetime(s, errors="coerce", utc=True)
+
+    expected1 = Series(
+        [NaT] + ([Timestamp("1991-10-20 00:00:00+00:00")] * series_length)
+    )
+
+    tm.assert_series_equal(result1, expected1)
+
+    result2 = to_datetime(s, errors="ignore", utc=True)
+
+    expected2 = Series(
+        [datetime.fromisoformat("1446-04-12 00:00:00+00:00")]
+        + ([datetime.fromisoformat("1991-10-20 00:00:00+00:00")] * series_length)
+    )
+
+    tm.assert_series_equal(result2, expected2)
+
+    with pytest.raises(OutOfBoundsDatetime, match="Out of bounds nanosecond timestamp"):
+        to_datetime(s, errors="raise", utc=True)