DOC: timezone warning for DST beyond 2038-01-18 #33863
As mentioned in the issue, I do not think this wording is accurate. I think if you're going to warn against this, something more circumspect would be a good idea, like:
I'm vague about the "some time zone libraries" since I know that |
Thanks for the response. I think we need to get the 2038 issue in there first, because that is the symptom users will see. Also, I didn't want to go into any details of the cause, namely version 1 libraries storing transition dates in signed 32-bit fields, as I wanted to keep it quite short: there are already 3 other warnings in that section of the guide! I don't think people are that interested in where the cause of the issue is, as long as it will be fixed well before the epochalypse! (Love that term! It probably reveals my tz naivety that I haven't heard it before.)

Proposed new wording: If you are using dates beyond 2038-01-18, due to current deficiencies in the underlying libraries caused by the year 2038 problem, daylight saving time (DST) adjustments to timezone aware dates will not be applied. The underlying libraries will be fixed at some point, which will also correct this behaviour. It should be noted though, that time zone data for far future time zones are likely to be inaccurate, as they are simple extrapolations of the current set of (regularly revised) rules.
I think this guarantee is too strong. I think the phrasing should just imply that the fix is on the timezone library side.
If and when the underlying libraries are fixed, the DST transitions will be applied?
Sure. If you could push up the revised changes, it will be easier to review inline from the changes than here in the comments.
doc/source/user_guide/timeseries.rst
Outdated
@@ -2265,6 +2265,20 @@ you can use the ``tz_convert`` method.
Instead, the datetime needs to be localized using the ``localize`` method
on the ``pytz`` time zone object.

.. warning::

    If you are using dates beyond 18 Jan 2038, note that pandas does not apply daylight saving time adjustments to timezone aware dates. This is partly because the underlying libraries do not currently address the Year 2038 Problem, and partly because there is some discussion on how reliable any DST settings that far into the future will be.
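For reference, the 2038 cutoff in the warning text traces back to second counters stored in signed 32-bit fields, which is how the discussion above describes the transition dates being stored. A minimal standard-library sketch of where that limit falls; the names `EPOCH` and `INT32_MAX` are illustrative and are not taken from pandas or any time zone library:

```python
from datetime import datetime, timedelta, timezone

# Unix timestamps count seconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit field can hold.
INT32_MAX = 2**31 - 1

# Last instant that a signed 32-bit second counter can represent.
last_32bit_instant = EPOCH + timedelta(seconds=INT32_MAX)
print(last_32bit_instant.isoformat())  # 2038-01-19T03:14:07+00:00
```

The limit itself falls on 19 January 2038 in UTC. The date from which a particular library stops applying DST transitions may differ, since that depends on the last transition present in its data.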
The second part, "and partly because there is some discussion on how reliable any DST settings that far into the future will be", sounds vague. There are not just discussions. It seems arbitrary to start raising this concern in 2038. With so many jurisdictions in the world, some will change their time zones earlier. For instance, the EU is on track to abolish DST in the 2020s. I think this sentence should be left out because it's a separate issue.
Or that half-sentence could be put in a separate warning but I think that would be overkill since this issue is not Pandas-specific. Our docs will be very cluttered if we warn about every generic issue that might occur with date and time handling.
IIUC it is only DST transitions that are affected, and only after 2038. So any changes up to 2038 will be reflected correctly in construction of a tz aware time ... have I understood you OK? The exact sentence you quoted has been removed.
No, changes will not be reflected correctly until 2038 because you forgot to `import crystalball` (and even if you had imported it you'd find it can't be used to read the future). However, I think the warning should only address the 2038 issue. One could make an extra warning for the general issue of real-world timezones being unpredictable in the future but I don't think that's necessary because that's what people expect.
The new sentence "It should be noted though, that time zone data for far future time zones are likely to be inaccurate, as they are simple extrapolations of the current set of (regularly revised) rules." is misleading and confusing. One can't even predict timezone switches a day in advance, as we have seen in 2018 in Morocco, let alone 18 years.
Also, the example you give is only about the 2038 problem. Especially in Britain it's likely that they abolish daylight saving together with the EU before 2038. The quoted sentence just makes it a little bit harder for people to follow that example.
yes, it is quite possible that UK will stick to permanent DST some time in the next 5 years, but if that happens, the underlying libraries as they are will support that - the changes will be made in pre-2038 dates. I can't see what the problem is!
> yes, it is quite possible that UK will stick to permanent DST some time in the next 5 years, but if that happens, the underlying libraries as they are will support that - the changes will be made in pre-2038 dates.
The same argument can be made for post-2038 dates. Certainly the underlying libraries will be fixed before 2038, so the only time this would ever come up is if you're trying to convert a local time in the far future into a timestamp (either in UTC or find out what offset applies), which is basically something that cannot be known to be accurate.
This is my problem with the whole idea of adding this warning – it's saying, "If you are trying to do this thing you shouldn't do, the answer might be different from what you expect." It may be that the zone you're in has eliminated DST by then, in which case the answer is right and the "correct" rule is the one that's wrong. It's not a problem particular to `pandas`, and it's not easy to convey to end users what the problem is and what should be done about it, so a warning in the documentation doesn't seem like a good fit.
I still don't like the focus on 2038 as the cut-off point, because it makes it seem like adding version 2 support to the underlying libraries will fix the problem, but in reality the problem is that the users are trying to do this at all, and they just notice something out of the ordinary after 2038.
@pganssle you're essentially saying that people shouldn't use timezones on any far future dates that they deal with. (side note: I disagree with the word far, the future is unpredictable, far or near.) This is unrealistic. While many applications could rely on UT, UTC, or system time, I'm sure you can come up with many use cases where developers need to deal with time zones in future dates (e.g. when opening hours, TV schedules or flight schedules are involved).
Obviously, nobody can guarantee that future time zones will be correct. However, developers expect a predictable behaviour of the library. Now, the 2038 issue breaks that predictability. It's common sense that the future cannot be predicted, so there is no need to warn about potential future political changes. However, developers should be warned if code yields unexpected results.
That's why we should warn developers about the 2038 issue (as in the PR title). And the following sentence should be removed because it has nothing to do with the 2038 issue: It should be noted though, that time zone data for far future time zones are likely to be inaccurate, as they are simple extrapolations of the current set of (regularly revised) rules.
@pganssle if you think we need to warn developers about using timezones when dealing with future dates, this should go in a separate pull request.
> @pganssle you're essentially saying that people shouldn't use timezones on any far future dates that they deal with. (side note: I disagree with the word far, the future is unpredictable, far or near.) This is unrealistic. While many applications could rely on UT, UTC, or system time, I'm sure you can come up with many use cases where developers need to deal with time zones in future dates (e.g. when opening hours, TV schedules or flight schedules are involved).
No, I'm saying that you should only use "time zones" in future dates and that conversion to UTC is increasingly unreliable the further into the future you go. 18 years is a long way into the future, so this is like saying, "Don't forget to bring a bathing suit if you jump into shark-infested water!"
It's somewhat misleading to include a warning like this without explaining that yes it's not what you might expect but it's basically not a problem at the moment, because if you are relying on accuracy in this situation you have bigger problems. It also implies that dates in December 2037 can be used somewhat accurately.
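One way to read the point about using time zones for future dates is to store the local wall time together with the zone name, and to convert to UTC only when the value is needed. The sketch below assumes that reading; `FutureEvent`, `to_utc` and `demo_resolver` are hypothetical names, and the resolver uses a fixed offset purely so the example runs without any time zone database. Note that `replace(tzinfo=...)` is not appropriate for `pytz` zones, which need `localize` instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable


@dataclass(frozen=True)
class FutureEvent:
    wall_time: datetime  # naive local time, e.g. "09:00 on 15 July 2040"
    zone_name: str       # zone identifier, resolved only at conversion time


def to_utc(event: FutureEvent, resolve_zone: Callable[[str], tzinfo]) -> datetime:
    """Convert using whatever rules the resolver knows at call time."""
    tz = resolve_zone(event.zone_name)
    return event.wall_time.replace(tzinfo=tz).astimezone(timezone.utc)


def demo_resolver(name: str) -> tzinfo:
    # Stand-in for a real zone lookup: a single made-up zone at UTC+1.
    return {"Example/PlusOne": timezone(timedelta(hours=1))}[name]


event = FutureEvent(datetime(2040, 7, 15, 9, 0), "Example/PlusOne")
print(to_utc(event, demo_resolver))  # 2040-07-15 08:00:00+00:00
```

Because the conversion happens late, a rule change published after the event was stored would be picked up the next time `to_utc` runs, whereas a UTC value computed up front would not be.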
Okay. I'm convinced now. We should warn about converting (any) future time. I still think this warning shouldn't be an afterthought in the 2038 warning text. It should be its own warning box. That gives it the prominence it deserves and ensures the warning remains if the 2038 problem is resolved in the downstream libraries and that box is removed.
Something along the lines of:
Be aware that for times in the future, correct conversion between time zones (and UTC) cannot be guaranteed by any time zone library. Sometimes the rules governing a timezone's offset from UTC are changed. Authorities usually announce such changes many months in advance but there have been examples of much shorter lead times such as when Morocco announced just two days before the planned switch from summer time to winter time in 2018 that the country would stay on summer time permanently. Furthermore, the databases that Pandas relies on may need some time to record planned changes to timezone offsets.
I made a PR to add this text: #34100
d_2037 = '2037-03-31T010101'
d_2038 = '2038-03-31T010101'
DST = 'Europe/London'
Why is this called `DST` instead of `LON` or some other thing?
because I am focusing on DST transitions - just happen to have picked London as that's local
People will think it means "this is the Daylight Saving Time zone". You don't have to call it `LON`, but you should at least call it `ZONE` or something.
d_2037 = '2037-03-31T010101'
d_2038 = '2038-03-31T010101'
DST = 'Europe/London'
assert pd.Timestamp(d_2037, tz=DST) != pd.Timestamp(d_2037, tz='GMT')
This does not seem right to me.
If you're going to keep the example (and I'm not convinced you should, especially since it's right around when a DST transition happens - if anything you should move it deep into summer to guarantee that the fluctuations aren't just due to the transition moving around), it would be better to make assertions about the thing you care about.
assert pd.Timestamp(d_2037, tz=LON).tzname() != "GMT"
assert pd.Timestamp(d_2038, tz=LON).tzname() != "GMT"
Even better, though, would be a repr:
>>> pd.Timestamp(d_2037, tz=LON).tzname()
'BST'
>>> pd.Timestamp(d_2038, tz=LON).tzname()
'GMT'
I agree summer date would be clearer, but not so sure whether your examples are clearer, or whether using BST (local zone name for DST) is clearer for a global audience ... hmmm
It doesn't really matter what offsets you use there, pick anything. The important thing is that it's obvious that they are different on the same date in different years, and that it's obvious that that's not due to fluctuations one way or the other in the date of the DST change.
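The comparison described here, the same midsummer date in different years, well away from any transition, could be sketched as follows. This is only an illustration: `midsummer_offsets` is a hypothetical helper, it accepts any standard `tzinfo` object (so it is not suitable for `pytz` zones, which need `localize`), and the demonstration uses a fixed-offset zone so that it runs without time zone data.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, Iterable, Optional


def midsummer_offsets(tz: tzinfo, years: Iterable[int]) -> Dict[int, Optional[timedelta]]:
    """Return the UTC offset at noon on 15 July of each given year."""
    return {
        year: datetime(year, 7, 15, 12, tzinfo=tz).utcoffset()
        for year in years
    }


# A fixed-offset zone has no DST rules, so every year reports the same offset.
fixed = timezone(timedelta(hours=1))
print(midsummer_offsets(fixed, [2037, 2038]))
```

With a DST-aware zone object for `Europe/London`, differing values for 2037 and 2038 would point to the behaviour under discussion rather than to a shift in the transition date. Whether they differ depends on the time zone library and its data, so no output is shown for that case.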
Is it OK to submit further changes once the PR has been approved?
Yes you can submit further changes. I think @pganssle's suggestion about using the repr would be clearer.
Thanks @telferm57, happy to have clarifications in another PR, but this looks good for now.
* DOC: timezone warning for dates beyond TODAY

  Introducing a suggestion discussed in PR #33863: added a warning in the user guide that timezone conversion on future dates is inherently unreliable.
* shorter warning text

Co-authored-by: Marco Gorelli <[email protected]>
Add warning to time_series user guide that after 2038-01-18 DST will not be respected in tz aware dates
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff