DataFrame[list[Series[datetime64ns, tz]]] drops timezone information #28552

Closed
Koojav opened this issue Sep 20, 2019 · 5 comments · Fixed by #33905
Labels
good first issue
Indexing: Related to indexing on series/frames, not to indexes themselves
Needs Tests: Unit test(s) needed to prevent regressions
Reshaping: Concat, Merge/Join, Stack/Unstack, Explode
Timezones: Timezone data dtype
Milestone

Comments

Koojav commented Sep 20, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
s = pd.Series([pd.to_datetime('2018-10-08 13:36:45+00:00')])
s


Output:
0   2018-10-08 13:36:45+00:00
dtype: datetime64[ns, UTC]

pd.DataFrame([s]).min()


Output:
0   2018-10-08 13:36:45
dtype: datetime64[ns]

Problem description

Calling DataFrame.min() drops the timezone information from the result. The timezone should be preserved, as it is with Series.combine(..., min). Example:

s.combine(s, min)


Output:
0   2018-10-08 13:36:45+00:00
dtype: datetime64[ns, UTC]

Expected Output

0   2018-10-08 13:36:45+00:00
dtype: datetime64[ns, UTC]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-64-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.1
setuptools : 40.6.3
Cython : 0.27.3
pytest : 3.0.7
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : None
pymysql : None
psycopg2 : 2.7.3.2 (dt dec pq3 ext lo64)
jinja2 : 2.10
IPython : 7.2.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.6
matplotlib : 2.0.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : 0.11.0
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.0.0
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

Contributor

randomstuff commented Sep 20, 2019

Actually, it is the DataFrame construction that drops the timezone, not min():

import pandas as pd
s = pd.Series([pd.to_datetime('2018-10-08 13:36:45+00:00')])
print(pd.DataFrame([s])[0])
# => 0   2018-10-08 13:36:45
# => Name: 0, dtype: datetime64[ns]
print(pd.DataFrame([s])[0].dt.tz)
# => None

@randomstuff
Contributor

But this works fine:

pd.DataFrame({"x":s}).min()
# => x   2018-10-08 13:36:45+00:00
# => dtype: datetime64[ns, UTC]
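
The two constructions above differ in layout as well as in dtype handling: a list of Series makes each Series a row, while a dict makes each Series a column. A minimal sketch for checking both on whatever pandas version is installed (`column_tz` is a hypothetical helper written for this illustration, not part of pandas):

```python
import pandas as pd

s = pd.Series([pd.to_datetime("2018-10-08 13:36:45+00:00")])

row_wise = pd.DataFrame([s])       # list of Series: each Series becomes a row
col_wise = pd.DataFrame({"x": s})  # dict of Series: each Series becomes a column


def column_tz(frame, column):
    """Hypothetical helper: return the column's tz, or None if the dtype has none."""
    return getattr(frame[column].dtype, "tz", None)


print(row_wise.shape, column_tz(row_wise, 0))
print(col_wise.shape, column_tz(col_wise, "x"))
```

On the pandas 0.25.0 reported in this issue, the first line would be expected to show no timezone, matching the output quoted earlier in the thread.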

@mroeschke mroeschke changed the title DataFrame().min() removes timezone DataFrame[list[Series[datetime64ns-tz]]] drops timezone information Sep 20, 2019
@mroeschke mroeschke added Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure Timezones Timezone data dtype Bug labels Sep 20, 2019
@mroeschke mroeschke changed the title DataFrame[list[Series[datetime64ns-tz]]] drops timezone information DataFrame[list[Series[datetime64ns, tz]]] drops timezone information Sep 20, 2019
@mroeschke
Member

This looks fixed on master and could use a test:

In [12]: print(pd.DataFrame([s])[0])
0   2018-10-08 13:36:45+00:00
Name: 0, dtype: datetime64[ns, UTC]

In [13]: print(pd.DataFrame([s])[0].dt.tz)
UTC

In [14]: pd.__version__
Out[14]: '1.1.0.dev0+1027.g767335719.dirty'
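
A regression test for this could look roughly like the sketch below. The test name is made up for illustration, and this is not necessarily the test that was eventually merged; it only encodes the behaviour shown in this thread (the constructor keeps the timezone, and so does min()).

```python
import pandas as pd


def test_frame_from_list_of_tz_series_keeps_tz():
    # Hypothetical test name; mirrors the example from the issue report.
    ts = pd.Timestamp("2018-10-08 13:36:45+00:00")
    s = pd.Series([ts])

    result = pd.DataFrame([s])

    # The constructed column should stay timezone-aware.
    assert isinstance(result[0].dtype, pd.DatetimeTZDtype)
    assert str(result[0].dt.tz) == "UTC"

    # The reduction should keep the timezone as well.
    reduced = result.min()
    assert reduced.loc[0] == ts
    assert str(reduced.dt.tz) == "UTC"
```

The assertions check the timezone rather than the full dtype string, so the sketch does not depend on the datetime resolution a given pandas version picks.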

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure Timezones Timezone data dtype labels Mar 30, 2020

ghost commented Apr 29, 2020

What timezone should be used for, let's say, mean if the series contains datetimes with different timezones?

Contributor

jreback commented Apr 29, 2020

You should convert to UTC when interoperating between time zones.
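
As a rough illustration of that advice, the sketch below normalizes timestamps with different UTC offsets to UTC before taking the mean. The sample values are invented for the example; `pd.to_datetime(..., utc=True)` is the existing pandas option for this conversion.

```python
import pandas as pd

# Two timestamps with different UTC offsets (made-up sample data).
raw = ["2018-10-08 13:00:00+02:00", "2018-10-08 13:00:00-04:00"]

# utc=True converts every value to UTC, giving one tz-aware dtype.
s = pd.Series(pd.to_datetime(raw, utc=True))

print(s.dt.tz)
print(s.mean())  # midpoint of 11:00 UTC and 17:00 UTC
```

With everything in UTC, the result of the reduction has an unambiguous timezone and can be converted afterwards with `tz_convert` if a local time is needed.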

@jreback jreback added this to the 1.1 milestone May 11, 2020
@jreback jreback added Timezones Timezone data dtype Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 11, 2020
4 participants