When aggregating a column with dtype datetime64[ns, timezone], pandas does not handle the timezone information properly. It looks as if pandas drops the timezone info (i.e. does its calculation on the UTC values) and then localizes the dates to the original timezone, i.e. does something like
column_with_UTC_dates_in_naive_format.tz_localize(timezone)
instead of
column_with_UTC_dates_in_naive_format.tz_localize("UTC").tz_convert(timezone)
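To make the two candidate operations concrete, here is a minimal sketch (the variable names are illustrative only). It starts from tz-aware Brussels timestamps, turns them into naive UTC values with `tz_convert(None)`, and then re-attaches the timezone both ways. Only the second path round-trips; the first shifts every value by the UTC offset (one hour in January). `freq="60min"` is used instead of `"H"` so the snippet runs on recent pandas versions.

```python
import pandas as pd

tz = "Europe/Brussels"
# Three hourly timestamps in Brussels local time (CET = UTC+1 in January)
idx = pd.date_range("2016-01-01", periods=3, freq="60min", tz=tz)

# tz_convert(None) converts to UTC and drops the tz: naive UTC wall-clock values
naive_utc = idx.tz_convert(None)

# Suspected buggy path: the UTC wall clock is reinterpreted as local time
wrong = naive_utc.tz_localize(tz)
# Correct path: declare the values as UTC first, then convert to the target tz
right = naive_utc.tz_localize("UTC").tz_convert(tz)

print(list(right) == list(idx))  # True
print((idx - wrong).unique())    # a single offset of one hour
```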
This buggy behavior is even more striking when running the sample with
idx = pandas.date_range("2016-03-27", "2016-03-29", freq="H", closed="left", tz="Europe/Brussels")
(which covers the DST change), because the naive UTC date "2016-03-27 03:00" cannot be localized to the timezone, as it is not recognized. In this case, the dtype of the column is no longer datetime64[ns, timezone] but just int64.
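The DST case can be sketched the same way. In Europe/Brussels on 2016-03-27 the clocks jump from 02:00 CET to 03:00 CEST, so local times from 02:00 to 02:59 do not exist. Reading naive UTC values as local time therefore hits that gap, while converting from UTC never does. By default `tz_localize` raises for such a value; the sketch below passes `nonexistent="NaT"` (available in recent pandas, not in the 0.18 release used here) so the gap shows up as a missing value instead. This only illustrates the suspected mechanism; it does not reproduce the int64 dtype described above.

```python
import pandas as pd

tz = "Europe/Brussels"
# Naive UTC values around the 2016-03-27 spring-forward in Brussels
naive_utc = pd.DatetimeIndex(["2016-03-27 00:00", "2016-03-27 01:00", "2016-03-27 02:00"])

# Correct path: every UTC instant has a local representation
right = naive_utc.tz_localize("UTC").tz_convert(tz)

# Suspected buggy path: read the UTC wall clock as local time.
# Local 02:00 does not exist on that day, so it becomes NaT instead of raising.
wrong = naive_utc.tz_localize(tz, nonexistent="NaT")

print(right)
print(wrong)
```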
Code Sample, a copy-pastable example if possible
import pandas

# create a DataFrame with a "time" column filled with localized datetime64 values on an hourly basis
# (closed="left" and freq="H" are the pandas 0.18 spellings; newer versions use inclusive= and "h")
idx = pandas.date_range("2016-01-01", "2016-01-03", freq="H", closed="left", tz="Europe/Brussels")
df = pandas.Series(idx, index=idx, name="time").to_frame()
# calculate the min and max of the "time" column per hour of the day
df_agg = df.groupby(idx.hour).aggregate(["min", "max"])
df_agg.info()
print(df_agg)
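One way to check the aggregation against the behavior the report expects is to build a reference by hand: aggregate the naive UTC values, then re-attach the timezone with `tz_localize("UTC").tz_convert(tz)`. The helper below is hypothetical; it uses `freq="60min"` and `periods=` in place of `"H"` and `closed="left"` so it runs on recent pandas (47 periods from 2016-03-27 matches the DST range above). On a pandas version without this bug, both calls should print `True`.

```python
import numpy as np
import pandas as pd

def groupby_min_max_is_tz_correct(start, periods, tz="Europe/Brussels"):
    """Hypothetical helper: compare groupby min/max on a tz-aware column
    with a reference computed on UTC values and converted back correctly."""
    idx = pd.date_range(start, periods=periods, freq="60min", tz=tz)
    df = pd.Series(idx, index=idx, name="time").to_frame()
    hours = np.asarray(idx.hour)  # local hour of day, used as the group key

    # Aggregation under test
    result = df.groupby(hours)["time"].agg(["min", "max"])

    # Reference: aggregate naive UTC values, then tz_localize("UTC").tz_convert(tz)
    naive_utc = df["time"].dt.tz_convert(None)
    ref = naive_utc.groupby(hours).agg(["min", "max"])

    for col in ["min", "max"]:
        if not isinstance(result[col].dtype, pd.DatetimeTZDtype):
            return False  # timezone was lost (e.g. the int64 case in the report)
        expected = ref[col].dt.tz_localize("UTC").dt.tz_convert(tz)
        if result[col].tolist() != expected.tolist():
            return False
    return True

print(groupby_min_max_is_tz_correct("2016-01-01", 48))
print(groupby_min_max_is_tz_correct("2016-03-27", 47))  # spans the DST change
```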
So your example is confusing, because the expected output is what you are getting, not what you EXPECT to get. Can you update it? I would also like a much simpler example that illustrates the problem.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.6.7
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None