Commit d93d835

BUG: fix TypeError for invalid integer dates %Y%m%d with errors='ignore' (# GH 26583)
array_strptime raised TypeError when an integer was too long for the format %Y%m%d (for example, 2121010101). After matching a date in the first 8 characters, it tried to report the remaining characters in the ValueError message by slicing the original integer, and that slice raised TypeError. The string-converted value is now used to build the slice for that ValueError message. The same path was hit for invalid dates such as 20209911, which was partially parsed as datetime(2020, 9, 9) with 11 left unconverted.
1 parent 8154efb commit d93d835
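
The root cause described in the commit message can be reproduced in plain Python, without pandas. This is only a minimal sketch: `remaining_after` is a hypothetical helper (not a pandas function), and the offset 8 is assumed to be where the `%Y%m%d` match ends.

```python
def remaining_after(value, end):
    """Return the part of value past position end, as text.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Convert first: slicing works on str, not on int.
    return str(value)[end:]


too_long = 2121010101  # example integer from the commit message

try:
    too_long[8:]  # roughly what the old error path attempted
    slice_error = None
except TypeError as exc:
    # int does not support slicing, so building the message itself failed
    slice_error = exc

leftover = remaining_after(too_long, 8)
```

Slicing the integer fails before any ValueError can be raised, which is why callers saw TypeError instead of the intended message.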

File tree

3 files changed

+23
-3
lines changed

doc/source/whatsnew/v0.25.0.rst

+1
@@ -427,6 +427,7 @@ Datetimelike
 - Bug in :class:`Series` and :class:`DataFrame` repr where ``np.datetime64('NaT')`` and ``np.timedelta64('NaT')`` with ``dtype=object`` would be represented as ``NaN`` (:issue:`25445`)
 - Bug in :func:`to_datetime` which does not replace the invalid argument with ``NaT`` when error is set to coerce (:issue:`26122`)
 - Bug in adding :class:`DateOffset` with nonzero month to :class:`DatetimeIndex` would raise ``ValueError`` (:issue:`26258`)
+- Bug in :func:`to_datetime` which raises ``TypeError`` for ``format='%Y%m%d'`` when called for invalid integer dates with length >= 6 digits with ``errors='ignore'``
 
 Timedelta
 ^^^^^^^^^

pandas/_libs/tslibs/strptime.pyx

+3-3
@@ -140,13 +140,13 @@ def array_strptime(object[:] values, object fmt,
                         iresult[i] = NPY_NAT
                         continue
                     raise ValueError("time data %r does not match "
-                                     "format %r (match)" % (values[i], fmt))
+                                     "format %r (match)" % (val, fmt))
                 if len(val) != found.end():
                     if is_coerce:
                         iresult[i] = NPY_NAT
                         continue
                     raise ValueError("unconverted data remains: %s" %
-                                     values[i][found.end():])
+                                     val[found.end():])
 
             # search
             else:
@@ -156,7 +156,7 @@ def array_strptime(object[:] values, object fmt,
                         iresult[i] = NPY_NAT
                         continue
                     raise ValueError("time data %r does not match format "
-                                     "%r (search)" % (values[i], fmt))
+                                     "%r (search)" % (val, fmt))
 
             iso_year = -1
             year = 1900
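
The exact-match branch touched by this diff can be approximated in pure Python. The sketch below rests on assumptions: `YMD_REGEX` is a simplified, hand-written pattern standing in for whatever regex pandas compiles from `%Y%m%d`, `check_exact` is a hypothetical function name, and `None` stands in for `NaT`. It only illustrates why the messages are built from the string `val` rather than the original object.

```python
import re

# Simplified stand-in for the regex built from '%Y%m%d' (assumption).
YMD_REGEX = re.compile(
    r"(?P<Y>\d{4})(?P<m>1[0-2]|0[1-9]|[1-9])(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"
)


def check_exact(value, fmt="%Y%m%d", is_coerce=False):
    """Mirror the exact-match checks from the diff in plain Python."""
    val = str(value)  # all error messages are built from this string
    found = YMD_REGEX.match(val)
    if not found:
        if is_coerce:
            return None  # stands in for NaT
        raise ValueError("time data %r does not match "
                         "format %r (match)" % (val, fmt))
    if len(val) != found.end():
        if is_coerce:
            return None  # stands in for NaT
        # Slicing val (a str) is safe; slicing the original int is not.
        raise ValueError("unconverted data remains: %s" % val[found.end():])
    return found.groupdict()
```

With this sketch, `check_exact(2012010101)` raises `ValueError` reporting the leftover `01`, while `check_exact(2012010101, is_coerce=True)` returns the `NaT` placeholder.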

pandas/tests/indexes/datetimes/test_tools.py

+19
@@ -114,6 +114,25 @@ def test_to_datetime_format_integer(self, cache):
         result = to_datetime(s, format='%Y%m', cache=cache)
         assert_series_equal(result, expected)
 
+    @pytest.mark.parametrize('int_date, expected', [
+        # valid date, length == 8
+        [20121030, datetime(2012, 10, 30)],
+        # short valid date, length == 6
+        [199934, datetime(1999, 3, 4)],
+        # long integer date partially parsed to datetime(2012,1,1), length > 8
+        [2012010101, 2012010101],
+        # invalid date partially parsed to datetime(2012,9,9), length == 8
+        [20129930, 20129930],
+        # short integer date partially parsed to datetime(2012,9,9), length < 8
+        [2012993, 2012993],
+        # short invalid date, length == 4
+        [2121, 2121]])
+    def test_int_to_datetime_format_YYYYMMDD_typeerror(self, int_date,
+                                                       expected):
+        # GH 26583
+        result = to_datetime(int_date, format='%Y%m%d', errors='ignore')
+        assert result == expected
+
     @pytest.mark.parametrize('cache', [True, False])
     def test_to_datetime_format_microsecond(self, cache):
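
The behavior these parametrized cases check, "return the input unchanged when it cannot be parsed", can be sketched with the standard library alone. This is an illustration under assumptions: `to_datetime_or_input` is a hypothetical name, and `datetime.strptime` is used as a stand-in parser that is not guaranteed to agree with pandas on every input.

```python
from datetime import datetime


def to_datetime_or_input(value, fmt="%Y%m%d"):
    """Parse value with fmt; on failure return value unchanged.

    Hypothetical stand-in for errors='ignore', built on datetime.strptime.
    """
    try:
        # Convert to str first so integer inputs can be parsed at all.
        return datetime.strptime(str(value), fmt)
    except ValueError:
        # Unparseable or partially parsed input is handed back as-is.
        return value


# Integer inputs taken from the parametrized test above.
cases = [20121030, 199934, 2012010101, 20129930, 2012993, 2121]
results = {case: to_datetime_or_input(case) for case in cases}
```

The key property is that a failed parse surfaces as `ValueError`, which the wrapper can catch; a `TypeError` raised while formatting the message would escape it.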
