
Series (but not DataFrame) combine_first() loses timezone information #21469

Closed
Liam3851 opened this issue Jun 13, 2018 · 4 comments · Fixed by #21660
Labels: Bug; Indexing (Related to indexing on series/frames, not to indexes themselves); MultiIndex; Timezones (Timezone data dtype)

@Liam3851
Contributor

Code Sample, a copy-pastable example if possible

import pandas as pd

dts1 = pd.date_range('20150101', '20150105', tz='UTC')
df1 = pd.DataFrame({'DATE': dts1})
dts2 = pd.date_range('20150103', '20150105', tz='UTC')
df2 = pd.DataFrame({'DATE': dts2})
df = df1.combine_first(df2)
df.DATE[0].tz  # returns <UTC>; the DataFrame case was fixed by #10567

ser = df1['DATE'].combine_first(df2['DATE'])
ser[0].tz  # returns None, should be <UTC> as above

Problem description

Calling Series.combine_first on two tz-localized datetime Series returns a non-localized Series.

#10567 handled the case of running DataFrame.combine_first on DataFrames with tz-aware datetime columns. Oddly, the same call does not preserve the timezone for Series. The behavior is the same on both 0.19.2 and the latest master, so it appears the Series case was never fixed by #10567.
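Until the Series case is fixed, one possible workaround is to re-attach the timezone after combining. The sketch below is only an illustration: restore_tz and combine_first_keep_tz are hypothetical helper names, not pandas APIs, and the naive branch assumes that the values returned by an affected version are UTC instants.

```python
import pandas as pd


def restore_tz(result, tz):
    """Return `result` as a tz-aware Series in `tz` (hypothetical helper)."""
    if isinstance(result.dtype, pd.DatetimeTZDtype):
        # Already tz-aware (e.g. on a fixed pandas): only convert the zone.
        return result.dt.tz_convert(tz)
    # Assumption: the naive values are UTC instants, so localize them as UTC
    # first and then convert to the requested zone.
    return pd.to_datetime(result, utc=True).dt.tz_convert(tz)


def combine_first_keep_tz(left, right):
    """combine_first for tz-aware Series that keeps the timezone of `left`."""
    return restore_tz(left.combine_first(right), left.dt.tz)


left = pd.Series(pd.date_range('20150101', '20150105', tz='UTC'))
right = pd.Series(pd.date_range('20150103', '20150105', tz='UTC'))
combined = combine_first_keep_tz(left, right)
print(combined.dtype)
```

The helper is written so that it is a no-op on versions where combine_first already preserves the timezone.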

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 576d5c6
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+103.g576d5c6b7
pytest: 3.6.0
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.6
IPython: 6.4.0
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.8
pymysql: 0.8.1
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

@gfyoung added the Bug, Indexing, Timezones, and MultiIndex labels on Jun 13, 2018
@gfyoung
Member

gfyoung commented Jun 13, 2018

I'm +1 for consistency. Investigation and PR are welcome!

@Liam3851
Contributor Author

Hmm, it looks like this might be the result of a more general issue with Series.where when the other argument is a Series: we're losing type info. combine_first appears to delegate its implementation to where's internals.

dts1 = pd.date_range('20150101', '20150105', tz='UTC')
df1 = pd.DataFrame({'date': dts1})
dts2 = pd.date_range('20150103', '20150107', tz='UTC')
df2 = pd.DataFrame({'date': dts2})
df1.date.where(df1.date < df1.date[3], df2.date)

Out[42]:
0    2015-01-01 00:00:00+00:00
1    2015-01-02 00:00:00+00:00
2    2015-01-03 00:00:00+00:00
3          1420502400000000000
4          1420588800000000000
Name: date, dtype: object
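The integers in rows 3 and 4 look like raw epoch nanoseconds rather than garbage. As a quick check (a sketch only; the variable names are illustrative), decoding them as UTC nanosecond timestamps gives the values one would expect from df2.date at positions 3 and 4:

```python
import pandas as pd

# The two integers printed in rows 3 and 4 of the output above.
raw = [1420502400000000000, 1420588800000000000]

# Interpret each integer as nanoseconds since the Unix epoch, in UTC.
decoded = [pd.Timestamp(value, unit='ns', tz='UTC') for value in raw]

# The values that df2.date holds at positions 3 and 4.
expected = list(pd.date_range('20150103', '20150107', tz='UTC')[3:5])

print(decoded)
print(decoded == expected)
```

So the data itself seems to survive; what is lost is the dtype, which is consistent with the values having passed through an untyped integer representation.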

@Liam3851
Contributor Author

Actually, it looks like the where issue might be only tangentially related. Series.combine_first refers to pd.core.common._where_compat, but despite its name, _where_compat is not referenced in where.

@mroeschke
Member

If I were to guess, this may be a problem in the where method defined in the SingleBlockManager:

def where(self, other, cond, align=True, errors='raise',

In general, data is operated on as NumPy arrays, so tz information is discarded (and not appropriately considered when the data is re-merged).
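The general mechanism can be seen outside the internals. The following is a minimal sketch, not the actual code path inside where: converting a tz-aware Series to a plain NumPy datetime64 array drops the timezone, and it has to be re-attached explicitly when the data is wrapped in a Series again.

```python
import numpy as np
import pandas as pd

aware = pd.Series(pd.date_range('20150101', periods=3, tz='UTC'))

# NumPy's datetime64 has no timezone concept: the values are converted to
# UTC and the tz information is dropped.
as_numpy = aware.to_numpy(dtype='datetime64[ns]')

# Wrapping the array in a Series again yields a naive result...
naive = pd.Series(as_numpy)
print(naive.dt.tz)

# ...unless the timezone is re-attached explicitly.
roundtrip = naive.dt.tz_localize('UTC')
print(roundtrip.dt.tz)
```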
