
BUG: fixed OutOfBoundsDatetime exception when errors=coerce #45319 #47794


Merged
merged 10 commits into from
Aug 15, 2022
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.5.0.rst
@@ -827,7 +827,7 @@ Datetimelike
- Bug in :meth:`DatetimeIndex.resolution` incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (:issue:`46903`)
- Bug in :class:`Timestamp` with an integer or float value and ``unit="Y"`` or ``unit="M"`` giving slightly-wrong results (:issue:`47266`)
- Bug in :class:`.DatetimeArray` construction when passed another :class:`.DatetimeArray` and ``freq=None`` incorrectly inferring the freq from the given array (:issue:`47296`)
-
- Bug in :func:`to_datetime` where ``OutOfBoundsDatetime`` was raised even with ``errors="coerce"`` when the input had more than 50 rows (:issue:`45319`)

Timedelta
^^^^^^^^^
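The "more than 50 rows" condition comes from the caching path: `to_datetime` only builds a cache of unique values when the input is long enough, and `_maybe_cache` only uses it when there are duplicates (`len(unique_dates) < len(arg)`). The following is a minimal stdlib sketch of that gate, assuming a simplified rule of "more than `start_caching_at` rows and at least one duplicate". The name `would_use_cache` is hypothetical, and pandas' real `should_cache` heuristic applies additional checks that are not modelled here.

```python
from datetime import datetime, timezone

START_CACHING_AT = 50  # assumed value, matching the "more than 50 rows" note


def would_use_cache(values, start_caching_at=START_CACHING_AT):
    """Hypothetical, simplified version of the gate in front of the date cache."""
    # Short inputs are converted directly; caching would not pay off.
    if len(values) <= start_caching_at:
        return False
    # A cache only helps when there are duplicate values to de-duplicate.
    return len(set(values)) < len(values)


old = datetime(1446, 4, 12, tzinfo=timezone.utc)
common = datetime(1991, 10, 20, tzinfo=timezone.utc)

# Same extra-row counts as the parametrized test: 40, 50, 51, 55.
for n in (40, 50, 51, 55):
    rows = [old] + [common] * n
    print(len(rows), would_use_cache(rows))
```

Under this simplified rule the 41-row input stays on the direct path, while the 51-, 52- and 56-row inputs take the cached path, which is where the out-of-bounds date used to escape `errors="coerce"`.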
6 changes: 5 additions & 1 deletion pandas/core/tools/datetimes.py
@@ -227,7 +227,11 @@ def _maybe_cache(
         unique_dates = unique(arg)
         if len(unique_dates) < len(arg):
             cache_dates = convert_listlike(unique_dates, format)
-            cache_array = Series(cache_dates, index=unique_dates)
+            # GH#45319
+            try:
+                cache_array = Series(cache_dates, index=unique_dates)
+            except OutOfBoundsDatetime:
+                pass
Review comment (Member):

Does it make sense to just return cache_array here, since the cache cannot be used because of the out-of-bounds date?

             # GH#39882 and GH#35888 in case of None and NaT we get duplicates
             if not cache_array.index.is_unique:
                 cache_array = cache_array[~cache_array.index.duplicated()]
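The change wraps cache construction in `try`/`except OutOfBoundsDatetime`, so a failure to build the cache falls back to element-wise conversion, which does honour `errors`. The sketch below models that control flow in plain Python and follows the reviewer's suggestion of returning the empty cache immediately instead of `pass`. Everything in it is hypothetical: `OutOfBounds` stands in for `OutOfBoundsDatetime`, `to_ns` models conversion to a 64-bit nanosecond timestamp, a dict stands in for the `Series` cache, and it assumes the exception comes from building the index out of the raw, unconverted values (since `cache_dates` has already been converted with `errors` applied).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class OutOfBounds(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


def to_ns(dt):
    """Convert an aware datetime to int64 nanoseconds since the epoch, or raise."""
    ns = (dt - EPOCH) // timedelta(microseconds=1) * 1000
    # -2**63 is excluded because pandas reserves it for NaT.
    if not -(2**63) < ns < 2**63:
        raise OutOfBounds(f"{dt} does not fit in a nanosecond timestamp")
    return ns


def convert_listlike(values, errors):
    """Element-wise conversion honouring errors='coerce' (None plays NaT) or 'raise'."""
    out = []
    for v in values:
        try:
            out.append(to_ns(v))
        except OutOfBounds:
            if errors != "coerce":
                raise
            out.append(None)
    return out


def maybe_cache(values, errors):
    """Build a {raw value: converted value} cache, or {} if no cache can be used."""
    unique_values = list(dict.fromkeys(values))  # order-preserving de-duplication
    if len(unique_values) == len(values):
        return {}
    cache_dates = convert_listlike(unique_values, errors)
    try:
        # Assumed analogue of Series(cache_dates, index=unique_dates): building
        # the index from the raw values converts them strictly, ignoring `errors`.
        for v in unique_values:
            to_ns(v)
    except OutOfBounds:
        # Reviewer's suggestion: return the empty cache rather than `pass`.
        return {}
    return dict(zip(unique_values, cache_dates))


def to_datetime_sketch(values, errors="raise"):
    cache = maybe_cache(values, errors)
    if cache:
        return [cache[v] for v in values]
    # No usable cache: fall back to element-wise conversion.
    return convert_listlike(values, errors)


old = datetime(1446, 4, 12, tzinfo=timezone.utc)
common = datetime(1991, 10, 20, tzinfo=timezone.utc)
result = to_datetime_sketch([old] + [common] * 51, errors="coerce")
print(result[0], result[1] == to_ns(common))  # prints: None True
```

With `errors="raise"` the sketch still raises, because `convert_listlike` propagates the error before the cache is ever built; only the strict index-building step is guarded.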
19 changes: 19 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -2773,3 +2773,22 @@ def test_to_datetime_monotonic_increasing_index(cache):
    result = to_datetime(times.iloc[:, 0], cache=cache)
    expected = times.iloc[:, 0]
    tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
    "series_length",
    [40, start_caching_at, (start_caching_at + 1), (start_caching_at + 5)],
)
def test_to_datetime_cache_coerce_50_lines(series_length):
Review comment (Member):

Suggested change: rename test_to_datetime_cache_coerce_50_lines(series_length) to test_to_datetime_cache_coerce_50_lines_outofbounds(series_length).

Could you also test the ignore and raise options?

    # GH#45319
    s = Series(
        [datetime.fromisoformat("1446-04-12 00:00:00+00:00")]
        + ([datetime.fromisoformat("1991-10-20 00:00:00+00:00")] * series_length)
    )
    result = to_datetime(s, errors="coerce", utc=True)

    expected = Series(
        [NaT] + ([Timestamp("1991-10-20 00:00:00+00:00")] * series_length)
    )

    tm.assert_series_equal(result, expected)
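The reviewer's request to also cover `errors="ignore"` and `errors="raise"` can be illustrated with a plain-Python model that checks all three modes on both sides of the caching threshold, using the same extra-row counts as the parametrized test. This is a hypothetical sketch, not the test that was merged: `to_datetime_model`, `convert_each` and `OutOfBounds` are invented names, the threshold of 50 is taken from the "more than 50 rows" note, and "ignore" is assumed to return the input unchanged when any element fails, as pandas 1.x documents for `to_datetime`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
START_CACHING_AT = 50  # assumed threshold


class OutOfBounds(ValueError):
    """Hypothetical stand-in for OutOfBoundsDatetime."""


def to_ns(dt):
    ns = (dt - EPOCH) // timedelta(microseconds=1) * 1000
    if not -(2**63) < ns < 2**63:
        raise OutOfBounds(dt)
    return ns


def convert_each(values, errors):
    """'coerce' -> None (NaT), 'ignore' -> whole input unchanged, 'raise' -> propagate."""
    out = []
    for v in values:
        try:
            out.append(to_ns(v))
        except OutOfBounds:
            if errors == "coerce":
                out.append(None)
            elif errors == "ignore":
                return list(values)
            else:
                raise
    return out


def to_datetime_model(values, errors="raise"):
    unique_values = list(dict.fromkeys(values))
    if len(values) > START_CACHING_AT and len(unique_values) < len(values):
        converted = convert_each(unique_values, errors)
        try:
            for v in unique_values:  # assumed strict step that the GH#45319 fix guards
                to_ns(v)
            cache = dict(zip(unique_values, converted))
            return [cache[v] for v in values]
        except OutOfBounds:
            pass  # fall back to the element-wise path below
    return convert_each(values, errors)


old = datetime(1446, 4, 12, tzinfo=timezone.utc)
common = datetime(1991, 10, 20, tzinfo=timezone.utc)

for n in (40, START_CACHING_AT, START_CACHING_AT + 1, START_CACHING_AT + 5):
    rows = [old] + [common] * n
    assert to_datetime_model(rows, errors="coerce") == [None] + [to_ns(common)] * n
    assert to_datetime_model(rows, errors="ignore") == rows
    try:
        to_datetime_model(rows, errors="raise")
    except OutOfBounds:
        pass
    else:
        raise AssertionError("errors='raise' should propagate OutOfBounds")
print("all three modes agree below and above the caching threshold")
```

Looping over the lengths mirrors the `series_length` parametrization: the 40-row case exercises the direct path, the others the cached path, and every mode must give the same answer on both.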