
ENH: Allow storing timezone-aware datetimes in a series with a datetime64 dtype #46998


Open
thehomebrewnerd opened this issue May 11, 2022 · 5 comments
Labels: Enhancement; ExtensionArray (Extending pandas with custom dtypes or arrays.); Needs Discussion (Requires discussion from core team before further action); Timezones (Timezone data dtype)

Comments

@thehomebrewnerd

Is your feature request related to a problem?

I wish I could use pandas to store a column of timezone-aware datetime values with different timezones in a series with a datetime64 dtype. In certain applications it is desirable to perform operations on all columns of a certain type. Currently, a column with mixed timezones is stored as object, which makes it difficult to programmatically identify the column as containing datetime values based on its dtype. The object dtype also prevents operations such as accessing the day of each datetime with the .dt accessor.

Describe the solution you'd like

I would like to have the ability to store a series of timezone aware values with mixed timezones and use the .dt accessor to access the underlying datetime components:

mixed_tz_series = pd.Series([
    pd.to_datetime("2018-03-01").tz_localize(tz="US/Pacific"),
    pd.to_datetime("2018-03-01").tz_localize(tz="US/Central"),
    pd.to_datetime("2018-03-01").tz_localize(tz="Europe/Vienna"),
], dtype="datetime64[ns]")

mixed_tz_series.dt.day

API breaking implications

None that I'm aware of.

Describe alternatives you've considered

Instead of using the .dt accessor on the series, one could use apply with a lambda function (or other function) to get at the underlying date components, but this does not address the fact that the series is not stored with a datetime dtype, making it more difficult to determine that the datetime operations could/should be applied to the column.
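A minimal sketch of the workaround described above, assuming a recent pandas release. The variable names are illustrative only. It also shows one further option that is not part of the original request: normalizing everything to UTC with `pd.to_datetime(..., utc=True)`. That yields a timezone-aware datetime dtype with a working `.dt` accessor, but it discards the original timezones, so the reported day can differ from the local day.

```python
import pandas as pd

# Timestamps in three different timezones; pandas infers object dtype here.
mixed = pd.Series([
    pd.Timestamp("2018-03-01", tz="US/Pacific"),
    pd.Timestamp("2018-03-01", tz="US/Central"),
    pd.Timestamp("2018-03-01", tz="Europe/Vienna"),
])
print(mixed.dtype)  # object

# Workaround from the text: element-wise access instead of the .dt accessor.
local_days = mixed.apply(lambda ts: ts.day)
print(local_days.tolist())  # local day of each value

# Assumed alternative (not proposed in the issue): convert everything to UTC.
# The result has a single timezone, so .dt works, but the original zones are lost.
as_utc = pd.to_datetime(mixed, utc=True)
utc_days = as_utc.dt.day
print(utc_days.tolist())  # midnight in Vienna falls on the previous day in UTC
```

Neither approach gives what the request asks for: the first keeps the timezones but not a datetime dtype, the second keeps a datetime dtype but not the timezones.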

@jreback
Contributor

jreback commented May 11, 2022

this would require a dedicated extension type - so it's possible

certainly a well tested community provided PR would be reviewable by core

i don't see a huge clamor for this in any event

@simonjayhawkins added the Timezones and ExtensionArray labels and removed the Needs Triage label May 12, 2022
@simonjayhawkins added this to the Contributions Welcome milestone May 12, 2022
@jorisvandenbossche
Member

This could indeed be done through an ExtensionArray, but for that very reason, couldn't it just as well be provided by an external package instead of living in pandas itself?

@mroeschke added the Needs Discussion label May 16, 2022
@gsheni
Contributor

gsheni commented May 16, 2022

@jorisvandenbossche Wouldn't pandas benefit from having native support for this? I imagine 3 scenarios with datetimes:

  1. datetime values with no timezone info (timezone naive)
  2. datetime values with timezone info (timezone aware)
    a. all in the same timezone
    b. in different timezones
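The three scenarios in the list above can be told apart by dtype. The following is a sketch assuming a recent pandas release; `describe` is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

# Scenario 1: timezone-naive values.
naive = pd.Series(pd.to_datetime(["2018-03-01 09:25", "2018-03-02 09:25"]))

# Scenario 2a: timezone-aware values, all in the same timezone.
same_tz = naive.dt.tz_localize("US/Pacific")

# Scenario 2b: timezone-aware values in different timezones.
mixed_tz = pd.Series([
    pd.Timestamp("2018-03-01 09:25", tz="US/Pacific"),
    pd.Timestamp("2018-03-01 09:25", tz="US/Eastern"),
])


def describe(series):
    """Hypothetical helper: classify a series by how pandas stores it."""
    if isinstance(series.dtype, pd.DatetimeTZDtype):
        return "tz-aware, single timezone"
    if pd.api.types.is_datetime64_dtype(series.dtype):
        return "tz-naive"
    if pd.api.types.is_object_dtype(series.dtype):
        return "object"
    return "other"


for name, series in [("naive", naive), ("same_tz", same_tz), ("mixed_tz", mixed_tz)]:
    print(name, "->", series.dtype, "|", describe(series))
```

Only the mixed-timezone case falls back to object, which is the case this issue is about.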

Some examples of realistic datasets where multiple timezones in 1 column could show up:

  1. A dataset with taxi trips in a city that includes a column of pickup_datetimes.
    a. A taxi rider is picked up in PDT, dropped off, and another taxi rider is picked up in EDT

  2. A dataset where you record business opening times for 1 year:
    a. You could have 1 column with EDT and EST datetimes

@jorisvandenbossche
Member

> a. You could have 1 column with EDT and EST datetimes

I also mentioned this in https://issues.apache.org/jira/browse/ARROW-16540, and the same applies to pandas: a column that mixes datetimes with and without DST is still considered to be in a single timezone (although in pandas we don't have a method like "is_dst" to know which values in the column are using DST, but that could be a feature request).
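To illustrate the point about DST, here is a small sketch assuming a recent pandas release with timezone data available. A column localized to "US/Eastern" keeps a single timezone-aware dtype even though it contains both EST and EDT values. The `in_dst` column is an element-wise stand-in for the missing "is_dst" method, built on `Timestamp.dst()`; it is an assumption of this example, not an existing pandas accessor.

```python
from datetime import timedelta

import pandas as pd

# One winter (EST) and one summer (EDT) opening time in the same timezone.
opening_times = pd.Series(
    pd.to_datetime(["2018-01-15 09:00", "2018-07-15 09:00"])
).dt.tz_localize("US/Eastern")

# Both values share one tz-aware dtype, so the .dt accessor works.
print(opening_times.dtype)
print(opening_times.dt.hour.tolist())

# Element-wise DST check: dst() returns a zero offset outside of DST.
in_dst = opening_times.apply(lambda ts: ts.dst() != timedelta(0))
print(in_dst.tolist())
```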

> Wouldn't pandas benefit from having native support for this?

To be clear, I am not saying that there are no use cases for this, or that it can't be useful for users of pandas. But not everything that is useful needs to be included in pandas itself. It is always a trade-off between including something in pandas vs having a third party package that provides additional functionality on top of pandas.

@cp2boston

@jreback @jorisvandenbossche I ran the above example with Python 3.8.12 and pandas 1.2.2:

mixed_tz_series = pd.Series([
    pd.to_datetime("2018-03-01 09:25:00").tz_localize(tz="US/Eastern"),
    pd.to_datetime("2018-03-01 09:25:00").tz_localize(tz="US/Pacific"),
    pd.to_datetime("2018-03-01 09:25:00").tz_localize(tz="US/Central"),
    pd.to_datetime("2018-03-01 09:25:00").tz_localize(tz="Europe/Vienna"),
], dtype="datetime64[ns]")

print(mixed_tz_series._data)

And it returns

SingleBlockManager
Items: RangeIndex(start=0, stop=4, step=1)
DatetimeBlock: 4 dtype: datetime64[ns]

In pandas 1.4.3 it is now stored with object dtype:

SingleBlockManager
Items: RangeIndex(start=0, stop=4, step=1)
ObjectBlock: 4 dtype: object

The change appears to have occurred between 1.2.5 (still datetime64[ns]) and 1.3.0 (now object).

Do you know why the dtype changed to object, even though the dtype is explicitly specified?
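Because the outcome of this construction evidently depends on the pandas version, a version-tolerant probe can help when comparing releases. The sketch below uses the public `.dtype` attribute rather than the private `_data` attribute; `probe_dtype` is a hypothetical helper for this illustration. It makes no claim about what any particular release returns, and it assumes that a release which rejects the input does so with a `TypeError` or `ValueError`.

```python
import pandas as pd

values = [
    pd.Timestamp("2018-03-01 09:25:00", tz="US/Eastern"),
    pd.Timestamp("2018-03-01 09:25:00", tz="Europe/Vienna"),
]


def probe_dtype(values):
    """Hypothetical helper: report the resulting dtype, or the error raised."""
    try:
        return str(pd.Series(values, dtype="datetime64[ns]").dtype)
    except (TypeError, ValueError) as exc:
        # Assumption: releases that reject the input raise one of these.
        return f"{type(exc).__name__}: {exc}"


# Explicit dtype requested, as in the report above.
print(pd.__version__, "->", probe_dtype(values))

# Without an explicit dtype, mixed timezones are inferred as object.
print(pd.Series(values).dtype)
```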

@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

7 participants