-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Index.astype does not localize to UTC first like Series.astype with datetime64[ns, tz] dtypes #33401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
cc @jorisvandenbossche @TomAugspurger fixing this inconsistency would move us a step closer to not-special-casing DatetimeTZBlock |
Cool. Agreed with the expected output. |
I am fully on board with making this consistent; but I am not fully sure I find the output of Series more expected than the output of DatetimeIndex' astype (it's rather subjective though). The docs say that astype can "localize and convert" tz-naive timestamps, but doesn't explicitly says "localize to UTC and convert". Basically the two options (expressed with an explicit localize):
In the Now, regardless of my comment about the expected result: is this something we want to simply change? (which is a breaking change) Or can we deprecate it first? (I think users can then avoid the warning by using tz_localize explicitly) |
On second thought, the "localize and convert" in the docs probably mean "localize to UTC and convert", indeed as @mroeschke suggested. Because otherwise if would just be a "localize" of course (there is nothing to convert anymore when localizing to the target timezone ..) |
So do we actually have arguments for either behaviours? (apart from "because Series is doing it that way") |
I think the Technically Correct answer may be to raise and require users to use tz_localize The semantics of |
I recall the initial implementation was pure convenience and in retrospect was a mistake. think we should actually deprecate this entirely. Being explicit about using |
For Series/DatetimeIndex/DatetimeArray there is an easy "do this instead". what about DataFrame.astype? |
I did a bit of history search with git blame to see where the type of "localize" behaviour was introduced. For Series.astype, it was in #11003. For Index.astype, it was in #18937 (so a bit later, but introducing a different behaviour as Series.astype). This issue was actually also opened in response to #33399, where someone reported the current Series.astype behaviour as confusing (expecting it to localize to the specified tz, and not localizing to UTC and then converting to the specified tz). As another example, compare the Series constructor vs astype:
I think it could also be reasonable to expect the same result. That actually reminds me of the long discussion we had in #24559 (although that was mostly about how to interpret integers, but it still re-confirmed the interpretation of tz-naive timestamps as "wall-time" in constructors) |
@jorisvandenbossche it sounds like you lean towards making [19] match [18]? Thoughts on the just-require-tz_localize approach? I haven't had any bright ideas on how to handle the |
I would at least make them consistent. And when looking at both options, I think I personally find the output of [18] more logical than [19] Simply disallowing would solve any ambiguity, but I think the practicality of having some way to do this with astype (eg to cover the DataFrame case you mention) is for me a good enough reason to keep it. |
Related: different DTI vs Series constructor behavior, based on test_construction_consistency
|
A point in favor of the Series behavior is that it round-trips correctly |
This was deprecated in #39258 which makes this bug a nonissue. Closing |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
Out[5]
is the expected output as documented in https://pandas.pydata.org/docs/user_guide/timeseries.html#time-zone-series-operations.Out[6]
should align therefore withOut[5]
Expected Output
The text was updated successfully, but these errors were encountered: