
DatetimeIndex.tz_convert() does not apply DST from 2038 onward #33061


Closed
obeavers opened this issue Mar 27, 2020 · 17 comments · Fixed by #33863
Labels: Datetime, Docs, good first issue, Timezones

Comments

@obeavers

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

assert np.all(pd.date_range('1/1/2037', periods=8760, freq='H', tz='EST').time == pd.date_range('1/1/2037', periods=8760, freq='H', tz='US/Eastern').time) == False

assert np.all(pd.date_range('1/1/2038', periods=8760, freq='H', tz='EST').time == pd.date_range('1/1/2038', periods=8760, freq='H', tz='US/Eastern').time) == False # fails

assert np.all(pd.date_range('1/1/2039', periods=8760, freq='H', tz='EST').time == pd.date_range('1/1/2039', periods=8760, freq='H', tz='US/Eastern').time) == False # fails

Problem description

Wow, this one hurt. US/Eastern timezone is DST-adjusted (blend of EST/EDT) whereas EST is just EST.

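In the second and third assert statements above, np.all(...) should evaluate to False, so both asserts should pass. Instead they fail, because from 2038 onward US/Eastern shows the same wall-clock times as EST.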

Surprised this hasn't come up before.

This is apparently related to a UNIX issue: https://en.wikipedia.org/wiki/Year_2038_problem. That said, the dtype appears to be datetime64 with some pandas customizations for time zones, and a 64-bit type should supposedly have avoided this.

Expected Output

Both of the following should pass:
assert np.all(pd.date_range('1/1/2038', periods=8760, freq='H', tz='EST').time == pd.date_range('1/1/2038', periods=8760, freq='H', tz='US/Eastern').time) == False # fails

assert np.all(pd.date_range('1/1/2039', periods=8760, freq='H', tz='EST').time == pd.date_range('1/1/2039', periods=8760, freq='H', tz='US/Eastern').time) == False # fails

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

@TomAugspurger
Contributor

cc @mroeschke.

@obeavers is the issue likely in pandas or pytz?

@TomAugspurger added the Datetime and Timezones labels Mar 27, 2020
@obeavers
Author

Looks like it's probably a pytz issue: stub42/pytz#31.

@jbrockmendel
Member

IIRC from discussions from @pganssle this is basically the 2038 problem. The system tz files only specify DST transitions up to the end of the epoch.

@pganssle
Contributor

> IIRC from discussions from @pganssle this is basically the 2038 problem. The system tz files only specify DST transitions up to the end of the epoch.

It's not actually that the system tz files only specify DST transitions to the end of the epoch, it's just that pytz and dateutil process only the Version 1 portion of the files, where the transitions are specified as 32-bit integer offsets from the Unix epoch. The system data almost always actually has the data you need, and the reference implementation for PEP 615 supports Version 2+ files, which do work after 2038.
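To make the 2038 cutoff concrete, here is a minimal sketch, using only the standard library, of the latest instant that a signed 32-bit count of seconds since the Unix epoch can represent. The constant and variable names are illustrative and are not taken from pytz or dateutil.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch, as an aware UTC datetime.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Latest instant expressible as a 32-bit offset (in seconds) from the epoch.
last_v1_instant = UNIX_EPOCH + timedelta(seconds=INT32_MAX)
print(last_v1_instant.isoformat())  # 2038-01-19T03:14:07+00:00
```

Any transition later than this instant cannot be stored as a 32-bit value, which is why a reader that only looks at such values has nothing to apply afterwards.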

IIUC pytz will not fix this, but it might get fixed automatically if pytz is refactored into a wrapper around PEP 615 time zones. I'll likely attempt to integrate some of the PEP 615 upgrades to dateutil once it's accepted.

Shameless plug for PEP 615: discussion is ongoing!

@mroeschke
Member

Is it worth raising a warning in cases where users are attempting timezone conversions past 2038?

@obeavers
Author

obeavers commented Mar 27, 2020 via email

@pganssle
Contributor

> Is it worth raising a warning in cases where users are attempting timezone conversions past 2038?

Do you mean in pytz / dateutil or in pandas? I think in either case the answer is still probably "no", but the case for doing it in pandas is weaker than the case for doing it in one of the upstream libraries, since it's a bug in those libraries and (mostly) not a bug in pandas (meaning that you'd be spewing erroneous warnings if they fixed it in those libraries).

The reason I say it's probably still "no" even upstream is that warning on dates after 2038 is a somewhat arbitrary cut-off, and it's not entirely clear what you are warning about anyway. Issuing a warning for 2038 and not 2037 implies that the data we have about time zones in 2037 is good, but the data in 2038 is bad. In reality, the further into the future you go, the less likely the time zone data is to be accurate for any given zone. It's true that the Version 2 data has better guesses than the Version 1 data after 2038, but I don't put a huge amount of stock into those guesses in the first place.

I'll note that this is one of the places where you're likely to get bitten sooner rather than later by the way pandas relies on the internal implementation details of its time zone libraries. pandas does things like its own "find which transition applies at a given datetime" calculation by doing a binary search of the transition points. That means pandas may have a much more extreme version of this bug for Version 2 files that do not include explicit transitions after a certain point: not all Version 2 files have explicit transitions up to 2038, and some of them have their last transition far in the past, with a TZ string specifying the transitions indefinitely after that point. I'm not sure whether I know of any that have "last transition in the past, TZ string applying daylight saving time on a regular schedule that is still in effect today", but I have not thoroughly explored the data. I recommend moving over to using the public API as soon as possible.
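The failure mode described here can be sketched with a toy transition table. This is not pandas' or pytz's actual code: the table, the names, and the helper `offset_from_table` are all hypothetical, and the table deliberately stops after 2037 to mimic a source with no later explicit transitions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

EST = timedelta(hours=-5)
EDT = timedelta(hours=-4)

# Hypothetical table: (UTC instant, offset in effect from that instant on).
# It ends in 2037, so nothing later is listed explicitly.
TRANSITIONS = [
    (datetime(2037, 3, 8, 7), EDT),   # daylight saving time starts
    (datetime(2037, 11, 1, 6), EST),  # standard time resumes
]
TRANSITION_TIMES = [when for when, _ in TRANSITIONS]


def offset_from_table(utc_naive: datetime) -> timedelta:
    """Binary-search the table for the last transition at or before utc_naive."""
    idx = bisect_right(TRANSITION_TIMES, utc_naive)
    if idx == 0:
        return EST  # before the first listed transition
    return TRANSITIONS[idx - 1][1]


summer_2037 = offset_from_table(datetime(2037, 7, 1, 12))
summer_2038 = offset_from_table(datetime(2038, 7, 1, 12))
print(summer_2037.total_seconds() / 3600)  # -4.0
print(summer_2038.total_seconds() / 3600)  # -5.0
```

Past the end of the table, the search keeps returning the last listed offset, so summer 2038 is treated as standard time. A table whose last entry lies far in the past would show the same behavior for present-day dates.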

(I'll note that the PEP 615 time zones are very fast -- considerably faster than pytz -- and if you start using the standard tzinfo interface today, you'll be compatible with them immediately when the backport arrives, which I expect to be in ~1-2 months).
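A minimal sketch of what "using the standard tzinfo interface" could look like: the caller asks the time zone object for its offset through `astimezone()` and `utcoffset()` instead of reading a provider's transition table. The helper name `utc_offset_at` is hypothetical, and a fixed-offset `datetime.timezone` stands in for a real zone so the snippet runs without any time zone database installed.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def utc_offset_at(utc_dt: datetime, tz: tzinfo) -> timedelta:
    """Return tz's UTC offset at an aware UTC instant, via the public API."""
    return utc_dt.astimezone(tz).utcoffset()


# Stand-in zone: a fixed UTC-5 offset, used only for illustration.
fixed_est = timezone(timedelta(hours=-5), "EST")
instant = datetime(2038, 7, 1, 12, tzinfo=timezone.utc)
print(utc_offset_at(instant, fixed_est).total_seconds() / 3600)  # -5.0
```

Because the helper only relies on methods every tzinfo implementation provides, the zone object can be swapped for another provider's without changing the calling code.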

@mroeschke
Member

Might be worth documenting in the user guide rather than providing a runtime warning: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-zone-handling

@obeavers
Author

obeavers commented Mar 28, 2020 via email

@TomAugspurger
Contributor

+1 to documenting this in our user guide.

I don't think we should be warning. The underlying tz library should be able to detect and warn for that better than we can.

@narendra20

can I take a look at this issue?

@TomAugspurger
Contributor

@narendra20 sure. I think the agreement from the maintainers is that a note in the user guide is the best way forward.

@telferm57
Contributor

@narendra20 are you moving forward with this? If not, I can! (Seems like a doc fix is a good first issue.)

@telferm57
Contributor

take

@telferm57
Contributor

telferm57 commented Apr 28, 2020

Proposed wording:

Warning:

If you are using dates beyond 19 Jan 2038, note that pandas does not apply daylight saving time adjustments to timezone-aware dates. This is partly because the underlying libraries do not currently address the Year 2038 Problem, and partly because there is some discussion about how reliable any DST settings that far into the future will be.

For example, for two dates that are in British Summer Time and so would normally be GMT+1, both the following evaluate as true:

assert pd.Timestamp('2037-03-31T010101', tz='Europe/London') != pd.Timestamp('2037-03-31T010101', tz='GMT')
assert pd.Timestamp('2038-03-31T010101', tz='Europe/London') == pd.Timestamp('2038-03-31T010101', tz='GMT')

@telferm57
Contributor

Thanks. Is the above wording OK, do you think?

@pganssle
Contributor

I disagree with that wording: it implies that pandas actively prevents time zone changes after the epochalypse, when in fact this has nothing to do with pandas. Some time zone providers (notably the only ones pandas supports) currently do not support the full TZif specification, and as a result time zone offsets may stop changing after the epochalypse.

I will comment on the PR.

@jreback added this to the 1.1 milestone Apr 30, 2020

8 participants