groupby UTC timestamp aggregation #11616

alexandreyc · 2015-11-16T16:32:52Z

Hi all,

I've found an inconsistency between pandas 0.17 and 0.16.2 when aggregating on UTC timestamps. Here is a snippet to reproduce the problem:

import numpy as np
import pandas as pd

np.random.seed(42)

data = pd.DataFrame({
    'factor': np.random.randint(0, 3, size=60),
    'time': pd.date_range('01/01/2000 00:00', periods=60, freq='s', tz='UTC')
})

gp = data.groupby('factor')

print(gp['time'].min())
print(gp['time'].max())

On 0.16.2 the output seems correct, i.e. it returns timestamps:

In [1]: %run bug_pandas.py
factor
0    2000-01-01 00:00:01+00:00
1    2000-01-01 00:00:07+00:00
2    2000-01-01 00:00:00+00:00
Name: time, dtype: object
factor
0    2000-01-01 00:00:57+00:00
1    2000-01-01 00:00:54+00:00
2    2000-01-01 00:00:59+00:00
Name: time, dtype: object

However on 0.17 it returns timestamps as integers:

In [1]: %run bug_pandas.py
factor
0    946684801000000000
1    946684807000000000
2    946684800000000000
Name: time, dtype: int64
factor
0    946684857000000000
1    946684854000000000
2    946684859000000000
Name: time, dtype: int64

It should be noted that the problem doesn't appear with tz=None.
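
A minimal sketch for comparing the two cases side by side, assuming the same data as in the snippet above. The helper name `grouped_min` is hypothetical and only used for illustration. On an affected version the second dtype would be expected to print as int64, while on a version with the fix both should be datetime dtypes.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
factor = np.random.randint(0, 3, size=60)


def grouped_min(tz):
    # Hypothetical helper: build the frame from the report with the given
    # timezone and return the per-group minimum of the selected column.
    frame = pd.DataFrame({
        'factor': factor,
        'time': pd.date_range('01/01/2000 00:00', periods=60, freq='s', tz=tz),
    })
    return frame.groupby('factor')['time'].min()


naive_min = grouped_min(None)   # tz=None, reported as working
aware_min = grouped_min('UTC')  # tz='UTC', reported as returning integers on 0.17

# Only the dtypes are printed; which dtype appears depends on the pandas version.
print('tz=None:', naive_min.dtype)
print('tz=UTC: ', aware_min.dtype)
```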

Thanks for your help,

Alexandre

jreback · 2015-11-16T20:45:42Z

So this works without the column selection, but returns integers when selecting ['time'] first. I'll mark it, and if you'd like to dig in, pull requests are welcome!

In [16]: data.groupby('factor').max()
Out[16]: 
                            time
factor                          
0      2000-01-01 00:00:59+00:00
1      2000-01-01 00:00:54+00:00
2      2000-01-01 00:00:58+00:00

In [17]: data.groupby('factor')['time'].max()
Out[17]: 
factor
0    946684859000000000
1    946684854000000000
2    946684858000000000
Name: time, dtype: int64
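
As a possible stopgap on an affected version, the integers can be read as nanoseconds since the Unix epoch and converted back. This is only a sketch, not the fix that was merged: the values are copied from Out[17] above, and the variable names `raw` and `restored` are assumptions made for illustration.

```python
import pandas as pd

# Integer results copied from Out[17] above (nanoseconds since the Unix epoch).
raw = pd.Series(
    [946684859000000000, 946684854000000000, 946684858000000000],
    index=pd.Index([0, 1, 2], name='factor'),
    name='time',
)

# Interpret the integers as nanoseconds and attach the UTC timezone again.
restored = pd.to_datetime(raw, unit='ns', utc=True)
print(restored)
```

The restored values should match the timestamps shown in Out[16].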

jreback · 2015-11-27T13:40:10Z

closed by #11672

jreback added the Bug, Groupby, Difficulty Novice, Regression (functionality that used to work in a prior pandas version), and Timezones (timezone data dtype) labels Nov 16, 2015

jreback added this to the Next Major Release milestone Nov 16, 2015

varunkumar-dev mentioned this issue Nov 23, 2015

BUG: GH11616 fixes timezone selection error #11672

Closed

jreback modified the milestones: 0.18.0, Next Major Release Nov 24, 2015

jreback pushed a commit that referenced this issue Nov 27, 2015

BUG: fixes timezone selection error, #11616 & timezone info lost, #11682

e838266

jreback closed this as completed Nov 27, 2015

jorisvandenbossche mentioned this issue Feb 19, 2016

datetime gets converted to int64 in groupby agg #12394

Closed
