REGR: Index.union should pick correct dtype for combinations of tstamps with different tzones #28034
Comments
I think that was a deliberate change: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html#incompatible-index-type-unions. cc @ArtinSarraf if you recall whether a mix of datetimetz was discussed. |
I've seen the "incompatible types" discussion but I feel like this is going too far. Admittedly this is an edge case (like in the int vs float discussion). |
The ints and floats case is a reasonable analogy. But with ints & floats it's not clear (to me) that the loss of precision for integers is the best outcome. You could make a similar argument for datetimetz and daylight savings issues, when combining multiple timezones.
|
I agree that this is a deliberate change in the right direction, meant to prevent the loss of original timezone information, akin to this change back in 0.24.0: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#parsing-datetime-strings-with-timezone-offsets You can always |
Timezone information, as far as I am concerned, is presentation information, a bit like the 'freq' attribute. Two timestamps in differing timezones are fundamentally compatible, and losing timezones is about dropping display information. Are there situations where this would be wrong? Another point: it has already cost me an hour to write code to paper over the issue. I have a reasonably decent test base, but real-world tests reveal that I have unhandled edge cases. So it's not just a matter of .tz_converting the offending index, and the cost of reimplementing myself what .union did previously is going to mount. In other words, this is a breaking change. |
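For readers following along, the workaround alluded to above (converting both indexes to a common timezone before calling `.union`) might look roughly like the sketch below. The helper name `union_in_utc` and the sample indexes are illustrative assumptions, not part of pandas or of the original report.

```python
import pandas as pd

# Hypothetical sample data: two tz-aware indexes in different timezones.
paris = pd.date_range("2019-08-20", periods=3, freq="D", tz="Europe/Paris")
new_york = pd.date_range("2019-08-20", periods=3, freq="D", tz="America/New_York")


def union_in_utc(left, right):
    """Union two tz-aware DatetimeIndex objects after converting both to UTC.

    Illustrative helper (not a pandas API). The explicit conversion makes the
    result independent of how a given pandas version resolves mixed timezones.
    """
    return left.tz_convert("UTC").union(right.tz_convert("UTC"))


combined = union_in_utc(paris, new_york)
print(combined.dtype)
```

The explicit `tz_convert` keeps the instants intact and only changes the timezone attached to the index, which is precisely the trade-off being debated in this thread.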
This is not the case at all in pandas: time zones are first class, and converting them to UTC loses information. We have deliberately tried not to do this. Generally, mixing time zones, while possible, is often a bug in user code; it is very likely not something you want library code to do. Rather, it becomes the responsibility of the user code.
Not really sure what your point is here. pandas has lots of users; API changes are a fact of life. |
@TomAugspurger - yes, this exact case was discussed explicitly.
|
Thanks for the reference. Maybe then a small deprecation period would have been gentler? @ArtinSarraf can you expand on your dst transition ambiguity example? For my personal education I'd like to understand the issue (if only because I currently believe tz-aware stamps don't have ambiguity issues). |
Ignore my comment, what I was thinking of wasn't an issue with already tz-aware date ranges, but rather the ambiguity that can occur from localizing a non tz-aware date range. |
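As an illustration of the distinction drawn here, the sketch below localizes a naive wall-clock time that occurs twice during a DST fall-back. The date and zone (27 October 2019, Europe/Paris) are chosen for the example and do not come from the thread.

```python
import pandas as pd

# 02:30 on 2019-10-27 happens twice in Paris: clocks go back from 03:00 to 02:00.
naive = pd.Timestamp("2019-10-27 02:30")

# `ambiguous=True` picks the DST (summer time) reading, `False` the standard one.
first = naive.tz_localize("Europe/Paris", ambiguous=True)    # UTC+02:00
second = naive.tz_localize("Europe/Paris", ambiguous=False)  # UTC+01:00

# Once localized, each stamp denotes exactly one instant.
gap = second - first
print(first.tz_convert("UTC"), second.tz_convert("UTC"), gap)
```

The ambiguity lives entirely in the naive-to-aware step; the two resulting tz-aware timestamps are distinct, well-defined instants one hour apart.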
The very fact that tz-aware tstamps are unambiguous is the reason I think this is a regression. The actual timezones are display information (just like |
@zogzog please read my comments again. There is loss of information, as pandas keeps the tz metadata as part of the dtype; so by definition you are losing information. |
I've read your comment. "By definition" doesn't cut it. Did you read mine and try to understand the argument and address it? |
@zogzog this is what @jreback is referring to:
|
@mroeschke thanks, but I had perfectly well understood. I stand by my position that this is a benign loss, in fact a complete non-issue. Apart from the principled position you take (which of course I'm not opposed to in the abstract), I would love to know where in practice this would cause issues. Did the previous design cause real-world issues? Do you have real-world data, like bug reports, indicating that it is wrong to switch the time zone? |
I imagine any piece of logic that is trying to detect the original timezone and suddenly getting UTC would be impacted by automatic coercion to UTC. As mentioned above, this was a deliberate API change and this is a move toward maintaining original data types. Going to close this issue as there's not much appetite to go the other direction |
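A minimal sketch of the kind of impact described above, using a single hypothetical timestamp: conversion to UTC keeps the instant but changes what timezone-inspecting code observes.

```python
import pandas as pd

# Hypothetical value; any tz-aware timestamp outside UTC behaves the same way.
local = pd.Timestamp("2019-08-20 12:00", tz="Europe/Paris")
as_utc = local.tz_convert("UTC")

same_instant = local == as_utc                # the point in time is unchanged
zone_kept = str(as_utc.tz) == str(local.tz)   # the attached zone is not

print(same_instant, zone_kept, local.hour, as_utc.hour)
```

Both sides of the thread are visible here: equality shows the instants are compatible, while the changed `tz` and wall-clock hour show what downstream code relying on the original zone would no longer see.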
see pandas-dev/pandas#28034 We switch every timezone-aware series into utc now.
Here's a pdb session showing the issue:
With pandas 0.24 I got back an index with dtype='datetime64[ns, UTC]'.
This looks a bit like a cousin of #26778