REGR: to_datetime with non-ISO format, float, and nan fails on main #50238

Merged: 1 commit, Dec 15, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.3.rst
@@ -19,6 +19,7 @@ Fixed regressions
 - Enforced reversion of ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
 - Fixed regression in :meth:`SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
 - Fixed performance regression in setting with the :meth:`~DataFrame.at` indexer (:issue:`49771`)
+- Fixed regression in :func:`to_datetime` raising ``ValueError`` when parsing array of ``float`` containing ``np.nan`` (:issue:`50237`)
 -

 .. ---------------------------------------------------------------------------
12 changes: 11 additions & 1 deletion pandas/_libs/tslibs/strptime.pyx
@@ -42,7 +42,11 @@ from pandas._libs.tslibs.np_datetime cimport (
     pydatetime_to_dt64,
 )
 from pandas._libs.tslibs.timestamps cimport _Timestamp
-from pandas._libs.util cimport is_datetime64_object
+from pandas._libs.util cimport (
+    is_datetime64_object,
+    is_float_object,
+    is_integer_object,
+)

 cnp.import_array()

@@ -185,6 +189,12 @@ def array_strptime(
         elif is_datetime64_object(val):
             iresult[i] = get_datetime64_nanos(val, NPY_FR_ns)
             continue
+        elif (
+            (is_integer_object(val) or is_float_object(val))
+            and (val != val or val == NPY_NAT)
+        ):
+            iresult[i] = NPY_NAT
Comment on lines +192 to +196
Member Author

matches

    elif is_integer_object(val) or is_float_object(val):
        if val != val or val == NPY_NAT:
            iresult[i] = NPY_NAT

+            continue
         else:
             val = str(val)
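
As a rough illustration of what the added branch checks, here is a pure-Python sketch. It is not the Cython code: `is_numeric_null` is a hypothetical helper, the `isinstance` checks are only a stand-in for `is_integer_object` / `is_float_object`, and `NPY_NAT` is assumed to be the int64 minimum used as the integer NaT sentinel.

```python
# Assumption: NPY_NAT is the smallest int64, the integer sentinel for NaT.
NPY_NAT = -(2**63)


def is_numeric_null(val) -> bool:
    """Return True for an int/float that should be stored as NaT."""
    # Stand-in for is_integer_object / is_float_object. bool is excluded
    # explicitly because it is a subclass of int in Python.
    is_number = isinstance(val, (int, float)) and not isinstance(val, bool)
    # NaN is the only float that compares unequal to itself.
    return is_number and (val != val or val == NPY_NAT)


print(is_numeric_null(float("nan")))  # True
print(is_numeric_null(NPY_NAT))       # True
print(is_numeric_null(198012))        # False
print(is_numeric_null("198012"))      # False
```

Ordinary numbers such as `198012` are not caught by this check, so they still reach the `str(val)` fallback and are parsed with the given format.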

11 changes: 11 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -135,6 +135,17 @@ def test_to_datetime_format_YYYYMMDD_with_nat(self, cache):
         result = to_datetime(ser2, format="%Y%m%d", cache=cache)
         tm.assert_series_equal(result, expected)

+    def test_to_datetime_format_YYYYMM_with_nat(self, cache):
+        # https://github.com/pandas-dev/pandas/issues/50237
+        ser = Series([198012, 198012] + [198101] * 5)
+        expected = Series(
+            [Timestamp("19801201"), Timestamp("19801201")] + [Timestamp("19810101")] * 5
+        )
+        expected[2] = np.nan
Member

If np.nan were np.datetime64('NaT'), would this hit the elif block added in this PR (and if so, could you add a test for it)?

Member Author

hey - no, that would hit this branch:

    elif checknull_with_nat_and_na(val):
        iresult[i] = NPY_NAT
        continue

which is covered by a bunch of tests, e.g.

    @pytest.mark.parametrize(
        "input_s",
        [
            # Null values with Strings
            ["19801222", "20010112", None],
            ["19801222", "20010112", np.nan],
            ["19801222", "20010112", NaT],
            ["19801222", "20010112", "NaT"],
            # Null values with Integers
            [19801222, 20010112, None],
            [19801222, 20010112, np.nan],
            [19801222, 20010112, NaT],
            [19801222, 20010112, "NaT"],
        ],
    )
    def test_to_datetime_format_YYYYMMDD_with_none(self, input_s):
        # GH 30011
        # format='%Y%m%d'
        # with None
        expected = Series([Timestamp("19801222"), Timestamp("20010112"), NaT])
        result = Series(to_datetime(input_s, format="%Y%m%d"))
        tm.assert_series_equal(result, expected)

+        ser[2] = np.nan
+        result = to_datetime(ser, format="%Y%m", cache=cache)
+        tm.assert_series_equal(result, expected)
+
     def test_to_datetime_format_YYYYMMDD_ignore(self, cache):
         # coercion
         # GH 7930
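
To see why a missing numeric-null branch can surface as a `ValueError`, the sketch below models the tail of the per-element dispatch using only the standard library. It rests on several assumptions: `datetime.strptime` is a stand-in for pandas' own parser, `parse_element` and its `numeric_null_branch` flag are hypothetical names used only for illustration, and `NPY_NAT` is taken to be the int64 minimum. With the branch disabled, a NaN falls through to `str(val)`, and the string `"nan"` cannot match `%Y%m`.

```python
from datetime import datetime

NPY_NAT = -(2**63)  # assumption: int64 minimum, the integer NaT sentinel


def parse_element(val, fmt, numeric_null_branch=True):
    """Parse one value; return None where the real code would store NaT."""
    if (
        numeric_null_branch
        and isinstance(val, (int, float))
        and not isinstance(val, bool)
        and (val != val or val == NPY_NAT)
    ):
        return None
    # Fallback mirrored from the diff: stringify, then parse with the format.
    return datetime.strptime(str(val), fmt)


values = [198012, float("nan"), 198101]
parsed = [parse_element(v, "%Y%m") for v in values]

# Without the numeric-null branch the NaN reaches the string fallback.
try:
    parse_element(float("nan"), "%Y%m", numeric_null_branch=False)
    raised = False
except ValueError:
    raised = True
```

Here `parsed` holds `None` in the middle position, which plays the role of `NaT` in the real result, while `raised` records that the unguarded path fails.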