ENH: Allow storing timezone-aware datetimes in a series with a datetime64 dtype #46998
Comments
This would require a dedicated extension type, so it's certainly possible. A well-tested, community-provided PR would be reviewable by core. I don't see a huge clamor for this in any event.
This could indeed be done through an ExtensionArray, but for that very reason I would say it could just as well be done in an external package that provides this, instead of having it in pandas itself.
@jorisvandenbossche Wouldn't pandas benefit from having native support for this? I imagine 3 scenarios with datetimes:
Some examples of realistic datasets where multiple timezones in 1 column could show up:
I also mentioned this in https://issues.apache.org/jira/browse/ARROW-16540, and the same applies for pandas: a mixture of datetimes with or without DST is considered as the same timezone.
To be clear, I am not saying that there are no use cases for this, or that it can't be useful for users of pandas. But not everything that is useful needs to be included in pandas itself. It is always a trade-off between including something in pandas vs having a third party package that provides additional functionality on top of pandas.
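To illustrate the DST point above, here is a minimal sketch (the dates and the Europe/Brussels zone are chosen only for illustration). It suggests that a winter and a summer timestamp in the same IANA zone share one timezone-aware dtype even though their UTC offsets differ:

```python
import pandas as pd

# Illustrative data: one winter and one summer wall-clock time,
# both localized to the same IANA timezone.
s = pd.Series(
    pd.to_datetime(["2022-01-15 12:00", "2022-07-15 12:00"]).tz_localize(
        "Europe/Brussels"
    )
)

# The series has a single timezone-aware dtype.
print(s.dtype)
is_tz_dtype = isinstance(s.dtype, pd.DatetimeTZDtype)

# The .dt accessor works, and local hours are preserved.
hours = s.dt.hour.tolist()

# The UTC offsets differ between the two rows because of DST.
offsets = [ts.utcoffset() for ts in s]
print(offsets)
```

So "one timezone per column" already covers varying offsets within a zone; the request in this issue concerns values from genuinely different zones in one column.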
@jreback @jorisvandenbossche I was running the above example with Python 3.8.12 and pandas 1.2.2
And it returns
In pandas 1.4.3 it is now stored as an
The change appears to have occurred between Would you know why the |
Is your feature request related to a problem?
I wish I could use pandas to store a column of timezone-aware datetime values with different timezones in a series with a datetime64 dtype. In certain applications it is desirable to perform operations on all columns of a certain type. Currently, a column with mixed types gets stored as object, which makes it difficult to programmatically identify the column as containing datetime values based on the dtype. The object dtype also prevents doing things like accessing the day of the datetime with the .dt accessor.
Describe the solution you'd like
I would like to have the ability to store a series of timezone-aware values with mixed timezones and use the .dt accessor to access the underlying datetime components:
API breaking implications
None that I'm aware of.
Describe alternatives you've considered
Instead of using the .dt accessor on the series, one could use apply with a lambda function (or other function) to get at the underlying date components, but this does not address the fact that the series is not stored with a datetime dtype, making it more difficult to determine that the datetime operations could/should be applied to the column.