-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Do not fail when parsing pydatetime objects in pd.to_datetime #49893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
author Antonio Ossa Guerra <[email protected]> 1666800993 -0300 committer MarcoGorelli <> 1669293453 +0000 Parse `datetime` properly in `pd.to_datetime` When applying `pd.to_datetime` on array-like structure that contain a `datetime.datetime` object, while using the `format` argument, a `ValueError` is raised because the `datetime.datetime` object does not match the expected format. The implemented solution looks for `datetime.datetime` instances in the `array_strptime` method. If an instance of this type is found, it's properly handled by the new `_parse_python_datetime_object`, which returns the expected Numpy datetime object. Signed-off-by: Antonio Ossa Guerra <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good couple comments
pyproject.toml
Outdated
@@ -92,6 +92,7 @@ disable = [ | |||
"unsupported-assignment-operation", | |||
"unsupported-membership-test", | |||
"unused-import", | |||
"use-a-generator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which line was throwing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd originally written
if any(tz is not None for tz in timezones):
as a list comprehension. It just seemed like too minor of a thing to bother annoying other contributors with, but no objections to keeping this check enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see this enabled. We had a different package check for this previously.
The short circuiting behavior described in https://pylint.pycqa.org/en/latest/user_guide/messages/refactor/use-a-generator.html seems like a useful check to guarantee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, have re-enabled
I might try to get this back into pyupgrade - it was taken out after some similar rewrites caused some performance degradations, but for any
I'd imagine it would be a consistent improvement asottile/pyupgrade#704
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for the context.
# note: ISO8601 formats go down a fastpath, so we need to check both | ||
# a ISO8601 format and a non-ISO8601 one | ||
ts1 = constructor(input[0]) | ||
ts2 = input[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be wrapped by the constructor as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you review! I didn't intend it to be - like this the input contains a mixture of strings and pydatetime objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
havent looked at the tests yet, but the non-test changes look good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM merge when ready @jbrockmendel
ids=["non-ISO8601 format", "ISO8601 format"], | ||
) | ||
@pytest.mark.parametrize( | ||
"utc, input, expected", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe avoid using input
as argument name could help to avoid possible confusion with the input
built-in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good one, thanks!
Thanks for keeping this updated @MarcoGorelli , should I close #49338 in favor of this PR? |
thanks @aaossa ! sure, yes please |
merging for now then to unblock the PDEP4 PR, if you have comments on the tests I'll address them separately thanks all for your reviews! |
I seem to haven gotten a lot of failed tests from this PR, especially in The error message are like Example error report: [gw2] darwin -- Python 3.9.7 /opt/homebrew/Caskroom/miniconda/base/envs/pandas/bin/python
self = <pandas.tests.tools.test_to_datetime.TestToDatetimeInferFormat object at 0x1404f9070>, cache = False
def test_to_datetime_infer_datetime_format_series_start_with_nans(self, cache):
ser = Series(
np.array(
[
np.nan,
np.nan,
"01/01/2011 00:00:00",
"01/02/2011 00:00:00",
"01/03/2011 00:00:00",
],
dtype=object,
)
)
tm.assert_series_equal(
to_datetime(ser, infer_datetime_format=False, cache=cache),
> to_datetime(ser, infer_datetime_format=True, cache=cache),
)
pandas/tests/tools/test_to_datetime.py:2405:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/tools/datetimes.py:1083: in to_datetime
values = convert_listlike(arg._values, format)
pandas/core/tools/datetimes.py:438: in _convert_listlike_datetimes
res = _to_datetime_with_format(
pandas/core/tools/datetimes.py:544: in _to_datetime_with_format
res = _array_strptime_with_fallback(
pandas/core/tools/datetimes.py:478: in _array_strptime_with_fallback
result, timezones = array_strptime(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 'p': 18, # an additional key, only with I
E TypeError: array_strptime() got an unexpected keyword argument 'utc'
pandas/_libs/tslibs/strptime.pyx:56: TypeError
___________________________ TestToDatetimeInferFormat.test_infer_datetime_format_tz_name[UTC-0] ____________________________
[gw2] darwin -- Python 3.9.7 /opt/homebrew/Caskroom/miniconda/base/envs/pandas/bin/python
self = <pandas.tests.tools.test_to_datetime.TestToDatetimeInferFormat object at 0x1404f9220>, tz_name = 'UTC', offset = 0
@pytest.mark.parametrize(
"tz_name, offset", [("UTC", 0), ("UTC-3", 180), ("UTC+3", -180)]
)
def test_infer_datetime_format_tz_name(self, tz_name, offset):
# GH 33133
ser = Series([f"2019-02-02 08:07:13 {tz_name}"])
> result = to_datetime(ser, infer_datetime_format=True)
pandas/tests/tools/test_to_datetime.py:2414:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/tools/datetimes.py:1083: in to_datetime
values = convert_listlike(arg._values, format)
pandas/core/tools/datetimes.py:438: in _convert_listlike_datetimes
res = _to_datetime_with_format(
pandas/core/tools/datetimes.py:544: in _to_datetime_with_format
res = _array_strptime_with_fallback(
pandas/core/tools/datetimes.py:478: in _array_strptime_with_fallback
result, timezones = array_strptime(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 'p': 18, # an additional key, only with I
E TypeError: array_strptime() got an unexpected keyword argument 'utc'
pandas/_libs/tslibs/strptime.pyx:56: TypeError My OS is MacOs 12.5.1 and AFAIKS my packages are up-to-date: INSTALLED VERSIONScommit : 2804681 pandas : 2.0.0.dev0+805.g2804681c45 |
this is fine in CI - does this function in |
ah wait, you're right @topper-123 , the PY39 macos job has started timing out I'll take a look EDIT: looks like it's passing in recent runs. So I think it might be unrelated to this PR |
Hmm I'll try out various stuff, maybe reinstall the python environment anew. Thanks for responding so quickly. |
A short follow-up: I got this fixed by compiling the c code using So this error was caused by something in my build process and is unrelated to this PR. |
@@ -495,7 +504,7 @@ def _array_strptime_with_fallback( | |||
# Indicates to the caller to fallback to objects_to_datetime64ns | |||
return None | |||
else: | |||
if "%Z" in fmt or "%z" in fmt: | |||
if any(tz is not None for tz in timezones): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the difference here for if all the strings are tznaive but we saw a tzaware datetime object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Granted, there's already a PR open to address this (#49338), it may be stale. As this is a blocker for PDEP0004, I've taken it forwards, but have taken care to include @aaossa as commit author.
I've fixed it up a bit to handle the objections I had and have re-written the tests