BUG: pd.to_datetime(infer_datetime_format=True) drops timezone #42068

Merged
4 changes: 4 additions & 0 deletions pandas/_libs/tslibs/parsing.pyx
@@ -825,6 +825,10 @@ def format_is_iso(f: str) -> bint:
    iso_template = '%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S.%f'.format
    excluded_formats = ['%Y%m%d', '%Y%m', '%Y']

    if (f is not None) and (f[-2:] in ["SZ", "fZ"]):
Member

As mentioned in my other comment, shouldn't %z be checked in the iso_template?

Contributor Author

I agree, I will revise my PR.

        # remove last 'Z'
        f = f[:-1]

    for date_sep in [' ', '/', '\\', '-', '.', '']:
        for time_sep in [' ', 'T']:
            if (iso_template(date_sep=date_sep,
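To make the effect of the added lines easier to follow, here is a minimal pure-Python sketch of the check. The name `format_is_iso_sketch` is hypothetical, and the loop body is an assumption: the hunk cuts off mid-statement, so the sketch assumes the format is matched as a prefix of the ISO template and rejected if it is one of the excluded formats. It is an approximation for illustration, not the Cython implementation.

```python
def format_is_iso_sketch(f):
    """Pure-Python approximation of the check in this hunk (hypothetical name)."""
    iso_template = "%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S.%f".format
    excluded_formats = ["%Y%m%d", "%Y%m", "%Y"]

    # Lines added by this PR: drop a trailing literal 'Z' before matching.
    if f is not None and f[-2:] in ["SZ", "fZ"]:
        f = f[:-1]

    for date_sep in [" ", "/", "\\", "-", ".", ""]:
        for time_sep in [" ", "T"]:
            # Assumption: the elided part of the loop is a prefix match
            # against the template plus the excluded-format check.
            template = iso_template(date_sep=date_sep, time_sep=time_sep)
            if template.startswith(f) and f not in excluded_formats:
                return True
    return False


print(format_is_iso_sketch("%Y-%m-%dT%H:%M:%SZ"))  # True
print(format_is_iso_sketch("%Y%m%d"))              # False
```

Under these assumptions, a format ending in `%SZ` or `%fZ` is accepted as ISO only because the `Z` is stripped first; the `Z` itself is never interpreted as a UTC designator by this check.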
17 changes: 17 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -1941,6 +1941,23 @@ def test_infer_datetime_format_tz_name(self, tz_name, offset):
        )
        tm.assert_series_equal(result, expected)

    @pytest.mark.parametrize(
        "ts,zero_tz,is_utc",
        [
            ("2019-02-02 08:07:13", "Z", True),
            ("2019-02-02 08:07:13", "", False),
            ("2019-02-02 08:07:13.012345", "Z", True),
            ("2019-02-02 08:07:13.012345", "", False),
        ],
    )
    def test_infer_datetime_format_zero_tz(self, ts, zero_tz, is_utc):
        # GH 41047
        s = Series([ts + zero_tz])
        result = to_datetime(s, infer_datetime_format=True)
        tz = pytz.utc if is_utc else None
        expected = Series([Timestamp(ts, tz=tz)])
        tm.assert_series_equal(result, expected)

    @pytest.mark.parametrize("cache", [True, False])
    def test_to_datetime_iso8601_noleading_0s(self, cache):
        # GH 11871
26 changes: 26 additions & 0 deletions pandas/tests/tslibs/test_parsing.py
@@ -226,3 +226,29 @@ def test_parse_time_string_check_instance_type_raise_exception():
    result = parse_time_string("2019")
    expected = (datetime(2019, 1, 1), "year")
    assert result == expected


@pytest.mark.parametrize(
    "fmt,expected",
    [
        ("%Y %m %d %H:%M:%S", True),
        ("%Y/%m/%d %H:%M:%S", True),
        (r"%Y\%m\%d %H:%M:%S", True),
        ("%Y-%m-%d %H:%M:%S", True),
        ("%Y.%m.%d %H:%M:%S", True),
        ("%Y%m%d %H:%M:%S", True),
        ("%Y-%m-%dT%H:%M:%S", True),
        ("%Y-%m-%dT%H:%M:%SZ", True),
Member

This doesn't appear like a valid strftime directive. To parse Z, I think %z needs to be used e.g.

In [9]: datetime.datetime.strptime("2020-05-01T18:04:03Z", "%Y-%m-%dT%H:%M:%S%z")
Out[9]: datetime.datetime(2020, 5, 1, 18, 4, 3, tzinfo=datetime.timezone.utc)

Contributor Author @i-aki-y, Jun 18, 2021

Thank you for your comment.
I wrote the fix to match the output of guess_datetime_format.

For example:

In [1]: from pandas._libs.tslibs.parsing import guess_datetime_format
   ...: guess_datetime_format('2020-05-01T18:04:03Z')
Out[1]: '%Y-%m-%dT%H:%M:%SZ'

But as you said, a format string like "%SZ" should be "%S%z",
so I suspect guess_datetime_format has a bug as well.

        ("%Y-%m-%dT%H:%M:%S.%f", True),
        ("%Y-%m-%dT%H:%M:%S.%fZ", True),
        ("%Y%m%d", False),
        ("%Y%m", False),
        ("%Y", False),
        ("%Y-%m-%d", True),
        ("%Y-%m", True),
    ],
)
def test_is_iso_format(fmt, expected):
    # see gh-41047
    result = parsing.format_is_iso(fmt)
    assert result == expected
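The distinction raised in the review thread above (a literal `Z` in the format versus the `%z` directive) can be illustrated with the standard library alone. This is a small sketch, not part of the PR; it assumes Python 3.7 or later, where `%z` accepts `Z` as a UTC designator.

```python
from datetime import datetime, timedelta

stamp = "2020-05-01T18:04:03Z"

# A literal 'Z' in the format only matches the character; no timezone is set.
naive = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")

# The %z directive interprets 'Z' as UTC and returns an aware datetime.
aware = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")

print(naive.tzinfo)                     # None
print(aware.utcoffset() == timedelta(0))  # True
```

Both calls parse the same wall-clock time, but only the `%z` form keeps the UTC information, which is the behaviour the issue title describes as the timezone being dropped.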