DOC: timezone warning for DST beyond 2038-01-18 #33863

Merged: 5 commits merged on May 10, 2020

telferm57
Contributor

Adds a warning to the time series user guide that, after 2038-01-18, DST is not applied to timezone-aware dates.

@telferm57 changed the title from "DOC: timezone warning for DST beyond 2038-01-19" to "DOC: timezone warning for DST beyond 2038-01-18" on Apr 29, 2020
@pganssle
Contributor

> If you are using dates beyond 18 Jan 2038, note that pandas does not apply daylight saving time adjustments to timezone aware dates. This is partly because the underlying libraries do not currently address the Year 2038 Problem, and partly because there is some discussion on how reliable any DST settings that far into the future will be.

As mentioned in the issue, I do not think this wording is accurate. pandas only ever uses the UTC offsets provided by the libraries. Right now the libraries that pandas supports do not have any transitions after 2038-01-18.
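A minimal model of what "pandas only ever uses the UTC offsets provided by the libraries" means in practice, assuming a table-driven time zone library: it looks up the last stored transition at or before an instant and keeps using that offset once the table runs out. The two-entry table and the `abbrev_from_table` helper are hypothetical, cut down to the 2037 Europe/London transitions for illustration; real tz data holds many more entries.

```python
import bisect
from datetime import datetime, timezone

# Hypothetical, truncated Europe/London table: (UTC instant, abbreviation in
# effect from that instant). Like 32-bit tz data, it stops before 2038-01-19.
TRANSITIONS = [
    (datetime(2037, 3, 29, 1, tzinfo=timezone.utc), "BST"),
    (datetime(2037, 10, 25, 1, tzinfo=timezone.utc), "GMT"),
]


def abbrev_from_table(moment_utc: datetime) -> str:
    """Return the abbreviation of the last transition at or before moment_utc."""
    instants = [instant for instant, _ in TRANSITIONS]
    index = bisect.bisect_right(instants, moment_utc) - 1
    if index < 0:
        # Earlier than this toy table covers; the sketch simply assumes GMT.
        return "GMT"
    # The matching entry applies; past the final entry (the October 2037
    # switch back to GMT) that last offset simply stays in effect.
    return TRANSITIONS[index][1]


print(abbrev_from_table(datetime(2037, 7, 1, 12, tzinfo=timezone.utc)))  # BST
print(abbrev_from_table(datetime(2038, 7, 1, 12, tzinfo=timezone.utc)))  # GMT
```

Because the last stored transition is the switch back to GMT, every later instant, summer 2038 included, comes out as GMT in this model, which matches the symptom this PR describes.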

I think if you're going to warn against this, something more circumspect would be a good idea, like:

Time zone data for far-future datetimes are likely to be inaccurate, as they are simple extrapolations of the current set of rules. In general, it is inadvisable to rely on the accuracy of time zone data for far-future datetimes. This is especially true for datetimes after 18 January 2038, since some time zone libraries do not support transitions after that point due to the Year 2038 Problem.

I'm vague about the "some time zone libraries" since I know that dateutil is planning to add support soon-ish and the maintainer of pytz has said that he's intending to turn pytz into a thin wrapper around PEP 615 when it is available (and the PEP 615 zoneinfo module does support these), so it is likely that any strong wording here will become inaccurate soon.
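The "simple extrapolations of the current set of rules" in the suggested wording can be made concrete with a minimal sketch. It assumes the UK rule in force at the time of this discussion (BST from 01:00 UTC on the last Sunday of March until 01:00 UTC on the last Sunday of October) and applies it to any year, roughly the idea behind the rule string that version 2+ tz data carries for instants after its last listed transition. The helper names are hypothetical and not part of pandas or any tz library; nothing in the sketch can know whether the rule will still hold in 2038.

```python
from datetime import datetime, timedelta, timezone


def last_sunday_0100_utc(year: int, month: int) -> datetime:
    """01:00 UTC on the last Sunday of `month`, when UK clocks currently change."""
    # The day before the 1st of the following month is the last day of `month`.
    next_month = datetime(year + month // 12, month % 12 + 1, 1, 1, tzinfo=timezone.utc)
    last_day = next_month - timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6; step back to the nearest Sunday.
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)


def extrapolated_london_abbrev(moment: datetime) -> str:
    """Apply today's UK rule to any year; `moment` must be timezone-aware."""
    utc = moment.astimezone(timezone.utc)
    start = last_sunday_0100_utc(utc.year, 3)
    end = last_sunday_0100_utc(utc.year, 10)
    return "BST" if start <= utc < end else "GMT"


print(extrapolated_london_abbrev(datetime(2038, 7, 1, 12, tzinfo=timezone.utc)))  # BST
print(extrapolated_london_abbrev(datetime(2100, 1, 1, 12, tzinfo=timezone.utc)))  # GMT
```

If the UK abolishes DST before 2038, this extrapolation is the part that becomes wrong, which is the reliability concern behind the suggested wording.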

@telferm57
Contributor Author

Thanks for the response. I think we need to get the 2038 issue in there first, because that is the symptom users will see. Also, I didn't want to go into the details of the cause (libraries reading version 1 tz data, which stores transition times in signed 32-bit fields), as I wanted to keep it short; there are already three other warnings in that section of the guide! I don't think people are that interested in where the issue comes from, as long as it will be fixed well before the epochalypse! (Love that term! It probably reveals my tz naivety that I hadn't heard it before.)
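The 2038 date follows from the signed 32-bit fields mentioned here: the largest value such a field can hold, read as seconds since the Unix epoch, falls in January 2038. A minimal stdlib check (nothing in it is pandas-specific):

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit field can hold.
INT32_MAX = 2**31 - 1

# Read as seconds since the Unix epoch, this is the last instant that a
# 32-bit transition time can represent.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_instant = epoch + timedelta(seconds=INT32_MAX)
print(last_instant.isoformat())  # 2038-01-19T03:14:07+00:00
```

Depending on the local UTC offset, that instant can still fall on 18 January; at UTC-05:00, for example, it is 22:14:07 on 18 January.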

Proposed new wording:

If you are using dates beyond 2038-01-18, due to current deficiencies in the underlying libraries caused by the year 2038 problem, daylight saving time (DST) adjustments to timezone aware dates will not be applied. The underlying libraries will be fixed at some point, which will also correct this behaviour. It should be noted though, that time zone data for far future time zones are likely to be inaccurate, as they are simple extrapolations of the current set of (regularly revised) rules.

@mroeschke
Member

> The underlying libraries will be fixed at some point.

I think this guarantee is too strong. The phrasing should just imply that the fix is on the time zone library side.

@telferm57
Contributor Author

If and when the underlying libraries are fixed, the DST transitions will be applied?

@jreback added the Docs and Timezones (Timezone data dtype) labels on Apr 30, 2020
@mroeschke
Member

Sure. If you could push up the revised changes, it will be easier to review inline from the changes than here in the comments.

@@ -2265,6 +2265,20 @@ you can use the ``tz_convert`` method.
Instead, the datetime needs to be localized using the ``localize`` method
on the ``pytz`` time zone object.

.. warning::

   If you are using dates beyond 18 Jan 2038, note that pandas does not apply daylight saving time adjustments to timezone aware dates. This is partly because the underlying libraries do not currently address the Year 2038 Problem, and partly because there is some discussion on how reliable any DST settings that far into the future will be.
Contributor

The second part, "and partly because there is some discussion on how reliable any DST settings that far into the future will be", sounds vague. These are not just discussions. It also seems arbitrary to start raising this concern at 2038: with so many jurisdictions in the world, some will change their time zone rules earlier. For instance, the EU is on track to abolish DST in the 2020s. I think this sentence should be left out because it is a separate issue.

Or that half-sentence could be put in a separate warning but I think that would be overkill since this issue is not Pandas-specific. Our docs will be very cluttered if we warn about every generic issue that might occur with date and time handling.

Contributor Author

IIUC, it is only DST transitions that are affected, and only after 2038. So any changes up to 2038 will be reflected correctly when constructing a tz-aware time... have I understood you correctly? The exact sentence you quoted has been removed.

Contributor

No, changes before 2038 will not be reflected correctly either, because you forgot to import crystalball (and even if you had imported it, you'd find it can't be used to read the future). However, I think the warning should only address the 2038 issue. One could add an extra warning for the general issue of real-world time zones being unpredictable in the future, but I don't think that's necessary because that's what people expect.

The new sentence "It should be noted though, that time zone data for far future time zones are likely to be inaccurate, as they are simple extrapolations of the current set of (regularly revised) rules" is misleading and confusing. One can't even predict time zone switches a day in advance, as we saw in Morocco in 2018, let alone 18 years ahead.

Also, the example you give is only about the 2038 problem. In Britain in particular, it's likely that daylight saving time will be abolished together with the EU before 2038. The quoted sentence just makes it a little harder for people to follow that example.

Contributor Author

Yes, it is quite possible that the UK will stick to permanent DST some time in the next 5 years, but if that happens, the underlying libraries as they are will support it: the changes will be made to pre-2038 dates. I can't see what the problem is!

Contributor

> Yes, it is quite possible that the UK will stick to permanent DST some time in the next 5 years, but if that happens, the underlying libraries as they are will support it: the changes will be made to pre-2038 dates.

The same argument can be made for post-2038 dates. Certainly the underlying libraries will be fixed before 2038, so the only time this would ever come up is if you're trying to convert a local time in the far future into a timestamp (either in UTC or find out what offset applies), which is basically something that cannot be known to be accurate.

This is my problem with the whole idea of adding this warning: it's saying, "If you are trying to do this thing you shouldn't do, the answer might be different from what you expect." It may be that the zone you're in has eliminated DST by then, in which case the answer is right and the "correct" rule is the one that's wrong. It's not a problem particular to pandas, and it's not easy to convey to end users what the problem is and what should be done about it, so a warning in the documentation doesn't seem like a good fit.

I still don't like the focus on 2038 as the cut-off point, because it makes it seem like adding version 2 support to the underlying libraries will fix the problem, but in reality the problem is that the users are trying to do this at all, and they just notice something out of the ordinary after 2038.
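For readers unfamiliar with the "version 2" distinction: compiled tz database files use the TZif format (RFC 8536), which stores a version byte right after the 4-byte `TZif` magic. A NUL byte means version 1, whose transition times are 32-bit; versions 2 and later append a second data block with 64-bit times, so a reader that only parses the first block still sees nothing past January 2038. A minimal sketch of reading that byte; the `tzif_version` name is hypothetical, and the system path in the comment is an assumption that varies by platform.

```python
def tzif_version(data: bytes) -> int:
    """Return the format version recorded in a TZif header (RFC 8536)."""
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file: missing magic bytes")
    version_byte = data[4:5]
    # Version 1 stores a NUL byte here; versions 2 and later store an ASCII digit.
    if version_byte == b"\x00":
        return 1
    return int(version_byte.decode("ascii"))


# Synthetic headers: magic, version byte, then the 15 reserved bytes.
v1_header = b"TZif" + b"\x00" + bytes(15)
v2_header = b"TZif" + b"2" + bytes(15)
print(tzif_version(v1_header), tzif_version(v2_header))  # 1 2

# On a system with a compiled tz database (assumed location; it differs across systems):
# with open("/usr/share/zoneinfo/Europe/London", "rb") as f:
#     print(tzif_version(f.read(5)))
```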

Contributor

@pganssle you're essentially saying that people shouldn't use time zones on any far-future dates they deal with. (Side note: I disagree with the word "far"; the future is unpredictable, far or near.) This is unrealistic. While many applications could rely on UT, UTC, or system time, I'm sure you can come up with many use cases where developers need to deal with time zones in future dates (e.g. opening hours, TV schedules, or flight schedules).

Obviously, nobody can guarantee that future time zones will be correct. However, developers expect the library to behave predictably, and the 2038 issue breaks that predictability. It's common sense that the future cannot be predicted, so there is no need to warn about potential future political changes. However, developers should be warned if code yields unexpected results.

That's why we should warn developers about the 2038 issue (as in the PR title). And the following sentence should be removed because it has nothing to do with the 2038 issue: "It should be noted though, that time zone data for far future time zones are likely to be inaccurate, as they are simple extrapolations of the current set of (regularly revised) rules."

@pganssle if you think we need to warn developers about using timezones when dealing with future dates, this should go in a separate pull request.

Contributor

> @pganssle you're essentially saying that people shouldn't use time zones on any far-future dates they deal with. (Side note: I disagree with the word "far"; the future is unpredictable, far or near.) This is unrealistic. While many applications could rely on UT, UTC, or system time, I'm sure you can come up with many use cases where developers need to deal with time zones in future dates (e.g. opening hours, TV schedules, or flight schedules).

No, I'm saying that you should only use "time zones" in future dates and that conversion to UTC is increasingly unreliable the further into the future you go. 18 years is a long way into the future, so this is like saying, "Don't forget to bring a bathing suit if you jump into shark-infested water!"

It's somewhat misleading to include a warning like this without explaining that yes it's not what you might expect but it's basically not a problem at the moment, because if you are relying on accuracy in this situation you have bigger problems. It also implies that dates in December 2037 can be used somewhat accurately.

Contributor

Okay, I'm convinced now. We should warn about converting (any) future time. I still think this warning shouldn't be an afterthought in the 2038 warning text; it should get its own warning box. That gives it the prominence it deserves and ensures the warning remains if the 2038 problem is resolved in the upstream libraries and that box is removed.

Something along the lines of:

Be aware that for times in the future, correct conversion between time zones (and UTC) cannot be guaranteed by any time zone library. Sometimes the rules governing a time zone's offset from UTC are changed. Authorities usually announce such changes many months in advance, but there have been much shorter lead times: in 2018, Morocco announced just two days before the planned switch from summer time to winter time that the country would stay on summer time permanently. Furthermore, the databases that pandas relies on may need some time to record planned changes to time zone offsets.

Contributor

I made a PR to add this text: #34100

@mroeschke added this to the 1.1 milestone on May 1, 2020

d_2037 = '2037-03-31T010101'
d_2038 = '2038-03-31T010101'
DST = 'Europe/London'
Contributor

Why is this called DST instead of LON or some other thing?

Contributor Author

Because I am focusing on DST transitions; I just happened to pick London because that's my local zone.

Contributor

People will think it means "this is the Daylight Saving Time zone". You don't have to call it LON, but you should at least call it ZONE or something.

d_2037 = '2037-03-31T010101'
d_2038 = '2038-03-31T010101'
DST = 'Europe/London'
assert pd.Timestamp(d_2037, tz=DST) != pd.Timestamp(d_2037, tz='GMT')
Contributor

This does not seem right to me.

If you're going to keep the example (and I'm not convinced you should, especially since it's right around when a DST transition happens - if anything you should move it deep into summer to guarantee that the fluctuations aren't just due to the transition moving around), it would be better to make assertions about the thing you care about.

assert pd.Timestamp(d_2037, tz=LON).tzname() != "GMT"
assert pd.Timestamp(d_2038, tz=LON).tzname() != "GMT"

Even better, though, would be a repr:

>>> pd.Timestamp(d_2037, tz=LON).tzname()
'BST'
>>> pd.Timestamp(d_2038, tz=LON).tzname()
'GMT'

Contributor Author

@telferm57 May 5, 2020

I agree a summer date would be clearer, but I'm not so sure whether your examples are clearer, or whether using BST (the local zone name for DST) is clearer for a global audience... hmmm

Contributor

It doesn't really matter what offsets you use there, pick anything. The important thing is that it's obvious that they are different on the same date in different years, and that it's obvious that that's not due to fluctuations one way or the other in the date of the DST change.
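The point about the transition date moving around can be checked with a short stdlib sketch, assuming the current UK rule that summer time starts on the last Sunday of March (the `last_sunday_of_march` helper is hypothetical). That Sunday can fall anywhere from 25 to 31 March, so a 31 March test date sits a different number of days after the switch each year, while a mid-summer date such as 1 July is always months away from either transition.

```python
from datetime import date, timedelta


def last_sunday_of_march(year: int) -> date:
    """Date of the last Sunday in March, when UK summer time currently begins."""
    march_31 = date(year, 3, 31)
    # weekday(): Monday == 0 ... Sunday == 6; step back to the nearest Sunday.
    return march_31 - timedelta(days=(march_31.weekday() - 6) % 7)


for year in (2037, 2038):
    start = last_sunday_of_march(year)
    gap = (date(year, 3, 31) - start).days
    print(year, start, f"March 31 is {gap} day(s) after the switch")
```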

Contributor Author

Is it OK to submit further changes once the PR has been approved?

Member

Yes you can submit further changes. I think @pganssle's suggestion about using the repr would be clearer.

@jreback merged commit a5db643 into pandas-dev:master on May 10, 2020
@jreback
Contributor

jreback commented May 10, 2020

thanks @telferm57

Happy to have clarifications in another PR, but this looks good for now.

MarcoGorelli added a commit that referenced this pull request Aug 21, 2020
* DOC: timezone warning for dates beyond TODAY

introducing a suggestion discussed in PR #33863 :
Added a warning in the user guide that timezone conversion on future dates is inherently unreliable.

* shorter warning text

Co-authored-by: Marco Gorelli
Labels
Docs Timezones Timezone data dtype
Development

Successfully merging this pull request may close these issues.

DateTimeIndex.tz_convert() does not apply DST from 2038 onward
5 participants