BUG: Odd timezone offset change with old datetimes with tz_convert #41834

Closed
2 of 3 tasks
rok opened this issue Jun 5, 2021 · 5 comments
Labels
Bug · Timezones (Timezone data dtype) · Upstream issue (Issue related to pandas dependency)

Comments

@rok
Contributor

rok commented Jun 5, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

import pandas as pd
[
    pd.to_datetime(["1900-01-01"]).tz_localize("UTC").tz_convert("Australia/Broken_Hill"),
    pd.to_datetime(["1900-01-01"]).tz_localize("UTC").tz_convert("Pacific/Marquesas"),
    pd.to_datetime(["2000-01-01"]).tz_localize("UTC").tz_convert("Pacific/Marquesas"),
    pd.to_datetime(["2000-01-01"]).tz_localize("UTC").tz_convert("Australia/Broken_Hill"),
    pd.to_datetime(["1900-01-01"]).tz_localize("UTC").tz_convert("Etc/GMT-9"),
    pd.to_datetime(["2000-01-01"]).tz_localize("UTC").tz_convert("Etc/GMT-9"),
]

Returns:

[DatetimeIndex(['1900-01-01 09:26:00+09:26'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
 DatetimeIndex(['1899-12-31 14:42:00-09:18'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
 DatetimeIndex(['1999-12-31 14:30:00-09:30'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
 DatetimeIndex(['2000-01-01 10:30:00+10:30'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
 DatetimeIndex(['1900-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None),
 DatetimeIndex(['2000-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None)]

Problem description

The timezone offset seems to change when it shouldn't. I'm not 100% sure this isn't the correct behavior, but it seems odd.
I hope this is known behavior and not something esoteric.

Expected Output

Probably this:

[DatetimeIndex(['1900-01-01 10:30:00+10:30'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
 DatetimeIndex(['1899-12-31 14:30:00-09:30'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
 DatetimeIndex(['1999-12-31 14:30:00-09:30'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
 DatetimeIndex(['2000-01-01 10:30:00+10:30'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
 DatetimeIndex(['1900-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None),
 DatetimeIndex(['2000-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None)]

I'm not sure these timezones even existed then, so this might be an invalid calculation.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.7.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-73-lowlatency
Version : #82-Ubuntu SMP PREEMPT Wed Apr 14 19:19:50 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.19.1
pytz : 2020.5
dateutil : 2.8.1
pip : 20.2.3
setuptools : 50.3.0.post20201006
Cython : None
pytest : 6.2.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

@jorisvandenbossche
Member

I think this is due to a shortcoming or bug in pytz. For example, take the first case, "Australia/Broken_Hill":

>>> import datetime
>>> import pytz
>>> tz1 = pytz.timezone("Australia/Broken_Hill")
>>> tz1
<DstTzInfo 'Australia/Broken_Hill' LMT+9:26:00 STD>
>>> print(tz1.utcoffset(datetime.datetime(2000, 1, 1)))
10:30:00
>>> print(tz1.utcoffset(datetime.datetime(1900, 1, 1)))
9:26:00

So for the older date, it is falling back to the LMT ("Local Mean Time").

Compare that with the zoneinfo package in Python 3.9:

>>> import zoneinfo
>>> tz2 = zoneinfo.ZoneInfo("Australia/Broken_Hill")
>>> tz2
zoneinfo.ZoneInfo(key='Australia/Broken_Hill')
>>> print(tz2.utcoffset(datetime.datetime(2000, 1, 1)))
10:30:00
>>> print(tz2.utcoffset(datetime.datetime(1900, 1, 1)))
9:30:00

Here zoneinfo gives the correct offset for the older datetime.
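
To make the link to the reported tz_convert output explicit, here is a minimal sketch of how the looked-up offset determines the displayed wall time. It uses fixed-offset zones as stand-ins for what each library returns for 1900-01-01 (the two offsets are copied from the sessions above; the `wall_time` helper and the fixed-offset approach are illustrative assumptions, not how pandas performs the conversion internally):

```python
from datetime import datetime, timedelta, timezone

# Offsets copied from the pytz and zoneinfo sessions above for 1900-01-01.
PYTZ_OFFSET = timedelta(hours=9, minutes=26)      # LMT fallback reported by pytz
ZONEINFO_OFFSET = timedelta(hours=9, minutes=30)  # offset reported by zoneinfo

# The instant from the issue: 1900-01-01 00:00 UTC.
utc_instant = datetime(1900, 1, 1, tzinfo=timezone.utc)


def wall_time(instant, offset):
    """Render a UTC instant in a hypothetical fixed-offset zone."""
    return instant.astimezone(timezone(offset)).isoformat(sep=" ")


pytz_wall = wall_time(utc_instant, PYTZ_OFFSET)
zoneinfo_wall = wall_time(utc_instant, ZONEINFO_OFFSET)

print(pytz_wall)      # 1900-01-01 09:26:00+09:26
print(zoneinfo_wall)  # 1900-01-01 09:30:00+09:30
```

The first line matches the value pandas returned in the report. The second suggests that, going by zoneinfo, the result for 1900 would be 09:30:00+09:30 rather than the 10:30:00+10:30 guessed under "Expected Output".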

@jorisvandenbossche added the Timezones (Timezone data dtype) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Jun 15, 2021
@rok
Contributor Author

rok commented Jun 15, 2021

Huh, interesting!
I did notice this problem only appeared somewhere in the early 1900s.

@jorisvandenbossche
Member

I don't know exactly from which point in time there are clear timezone rules, or what the different libraries return for a datetime before the start of those rules (probably the "local mean time").

For example, for an even older date, both zoneinfo and pytz return the LMT (I assume):

>>> print(tz1.utcoffset(datetime.datetime(1800, 1, 1)))
9:26:00
>>> print(tz2.utcoffset(datetime.datetime(1800, 1, 1)))
9:25:48

(pytz just rounds to the minute)
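
As a quick check of that arithmetic, the sketch below rounds the 9:25:48 offset reported by zoneinfo to the nearest whole minute. The `round_to_minute` helper is hypothetical and only illustrates the rounding; it is not pytz's actual code:

```python
from datetime import timedelta


def round_to_minute(offset):
    """Round a timedelta to the nearest whole minute (hypothetical helper)."""
    return timedelta(minutes=round(offset.total_seconds() / 60))


# LMT offset for Australia/Broken_Hill as reported by zoneinfo above.
lmt = timedelta(hours=9, minutes=25, seconds=48)
rounded = round_to_minute(lmt)

print(rounded)  # 9:26:00
```

The rounded value matches what pytz printed for the 1800 and 1900 dates, which is consistent with both libraries starting from the same LMT entry.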

@rok
Contributor Author

rok commented Jun 15, 2021

Yeah, it gets really fuzzy really fast.
To be fair, precise sub-day timestamps from before 1900 would be rare. However, IMO it would be important to be able to express them.

@mroeschke
Member

Thanks for the report, but as mentioned, I think this is ultimately a pytz issue (we also have a note in our timezone docs that tz libraries may have different timezone definitions). Since this is an upstream issue, I'm unsure whether there's a pandas-specific fix here, but I'm happy to reopen if there is.

@mroeschke added the Upstream issue (Issue related to pandas dependency) label on Aug 21, 2021