Skip to content

Timezone dropped when dataframes are joined after resampling #13783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
giogix2 opened this issue Jul 25, 2016 · 10 comments · Fixed by #21674
Closed

Timezone dropped when dataframes are joined after resampling #13783

giogix2 opened this issue Jul 25, 2016 · 10 comments · Fixed by #21674
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Milestone

Comments

@giogix2
Copy link

giogix2 commented Jul 25, 2016

I've already explained the problem link.
When I merge 2 dataframes, using join() or concat() (I didn't try with merge) and one of these has been previously resampled, averaging the values each 15 minutes, i loose the timezone

Code Sample, a copy-pastable example if possible

df1
2016-07-05 11:30:00+01:00    -100.81
2016-07-05 11:31:00+01:00    -99.34
2016-07-05 11:32:00+01:00    -95.09
..............
..............
2016-07-05 11:45:00+01:00    -83.62
2016-07-05 11:46:00+01:00    -1.57
2016-07-05 11:47:00+01:00    21.01
..............
df2['column1']
2016-07-05 11:30:00+01:00    -79,45
2016-07-05 11:45:00+01:00    -51.11
2016-07-05 12:00:00+01:00    -12.67
2016-07-05 12:15:00+01:00    15.21
..........
print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:31:00+01:00',
               '2016-07-05 11:32:00+01:00', '2016-07-05 11:33:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00', '2016-07-19 14:31:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq=None)

print df2.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00', '2016-07-19 14:45:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq=None)

df1 = df1.resample('15Min').mean()

print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
           '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
            ...
           '2016-07-19 14:30:00+01:00'],
           dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')

df = pd.concat([df1, df2], axis=1)

print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, UTC]', length=1358, freq='15T')

Expected Output

print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')

The output looses the timezone and it comes back to 'UTC'.
I think the problem is related to this issue, where the same problem comes out when aggregating.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 24.0.1
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Jul 25, 2016

pls show a copy-paste reproducible example, simple examples work just fine in 0.18.1 (and master).

In [4]: df1 = pd.DataFrame({'A' : [1,2,3]}, index=pd.date_range('20160101',periods=3,tz='Europe/London', freq='15T'))

In [5]: df2 = pd.DataFrame({'B' : [4,5,6]}, index=pd.date_range('20160102',periods=3,tz='Europe/London', freq='15T'))

In [6]: pd.concat([df1, df2], axis=1).index
Out[6]: DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:15:00+00:00', '2016-01-01 00:30:00+00:00', '2016-01-02 00:00:00+00:00', '2016-01-02 00:15:00+00:00', '2016-01-02 00:30:00+00:00'], dtype='datetime64[ns, Europe/London]', freq=None)

In [7]: pd.concat([df1, df2], axis=1)
Out[7]: 
                             A    B
2016-01-01 00:00:00+00:00  1.0  NaN
2016-01-01 00:15:00+00:00  2.0  NaN
2016-01-01 00:30:00+00:00  3.0  NaN
2016-01-02 00:00:00+00:00  NaN  4.0
2016-01-02 00:15:00+00:00  NaN  5.0
2016-01-02 00:30:00+00:00  NaN  6.0

@jreback
Copy link
Contributor

jreback commented Jul 25, 2016

might be related to #7795 but only tangentially

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype Can't Repro labels Jul 25, 2016
@giogix2
Copy link
Author

giogix2 commented Jul 25, 2016

Here's a copy paste reproducible example:

import pandas as pd

df1 = pd.DataFrame({'A' : [1,2,3,4,5,6,7,8,9,10]}, index=pd.date_range('20160101', periods=10, freq='1T'))
df2 = pd.DataFrame({'B' : [1,2,3,4,5]}, index=pd.date_range('20160101', periods=5, freq='2T'))

times = df1.index.tz_localize('UTC').tz_convert('Europe/London')
df1 = df1.set_index(times)
times = df2.index.tz_localize('UTC').tz_convert('Europe/London')
df2 = df2.set_index(times)

df1 = df1.resample('2T').mean()
df = df2.join(df1)
print df1.index

The output is:

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
               '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
               '2016-01-01 00:08:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='2T')

Expected Output

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
               '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
               '2016-01-01 00:08:00+00:00'],
              dtype='datetime64[ns, Europe/London]', freq='2T')

Should I directly edit the main post I wrote?

@giogix2
Copy link
Author

giogix2 commented Jul 25, 2016

Anyway it looks to me exactly the same problem you've linked (#7795).

@sinhrks
Copy link
Member

sinhrks commented Jul 25, 2016

This looks work on current master? Though tests are needed... (I'm going to cover it on #13660)

@sinhrks
Copy link
Member

sinhrks commented Jul 25, 2016

ah example is incorrect. maybe:

df1 = df1.resample('2T').mean()
df = df2.join(df1)
print df.index
# DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
#                '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
#                '2016-01-01 00:08:00+00:00'],
#               dtype='datetime64[ns, UTC]', freq='2T')

@sinhrks
Copy link
Member

sinhrks commented Jul 25, 2016

This is not fixed #13660 and looks different from #7795. This is for join, not concat. Fixing the issue title.

@sinhrks sinhrks changed the title Timezone dropped when dataframes are concatenated after resampling Timezone dropped when dataframes are joined after resampling Jul 25, 2016
@giogix2
Copy link
Author

giogix2 commented Jul 26, 2016

The point is, as I said in the main post, that this problem happens also with the concat().
Here's the code with the concat():

import pandas as pd

df1 = pd.DataFrame({'A' : [1,2,3,4,5,6,7,8,9,10]}, index=pd.date_range('20160101', periods=10, freq='1T'))
df2 = pd.DataFrame({'B' : [1,2,3,4,5]}, index=pd.date_range('20160101', periods=5, freq='2T'))

times = df1.index.tz_localize('UTC').tz_convert('Europe/London')
df1 = df1.set_index(times)
times = df2.index.tz_localize('UTC').tz_convert('Europe/London')
df2 = df2.set_index(times)

df1 = df1.resample('2T').mean()
df = pd.concat([df1, df2])
print df.index

The output is the same as with the join():

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:02:00+00:00',
               '2016-01-01 00:04:00+00:00', '2016-01-01 00:06:00+00:00',
               '2016-01-01 00:08:00+00:00', '2016-01-01 00:00:00+00:00',
               '2016-01-01 00:02:00+00:00', '2016-01-01 00:04:00+00:00',
               '2016-01-01 00:06:00+00:00', '2016-01-01 00:08:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

Should this be reported in another "issue"?

@sinhrks
Copy link
Member

sinhrks commented Jul 26, 2016

No new issue is needed, as concat is being fixed by #13660. Nice if you can test the branch.

@jreback
Copy link
Contributor

jreback commented Nov 29, 2017

I believe this issue is fixed, someone want to do a PR with a validation test?

@jreback jreback added this to the 0.22.0 milestone Nov 29, 2017
@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
@jreback jreback modified the milestones: Next Major Release, 0.24.0 Jul 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants