groupby UTC timestamp aggregation #11616

Closed
alexandreyc opened this issue Nov 16, 2015 · 2 comments
Labels: Bug, Groupby, Regression (functionality that used to work in a prior pandas version), Timezones (timezone data dtype)

@alexandreyc

Hi all,

I've found an inconsistency between pandas 0.17 and 0.16.2 when aggregating on UTC timestamps. Here is a snippet to reproduce the problem:

import numpy as np
import pandas as pd

np.random.seed(42)

data = pd.DataFrame({
    'factor': np.random.randint(0, 3, size=60),
    'time': pd.date_range('01/01/2000 00:00', periods=60, freq='s', tz='UTC')
})

gp = data.groupby('factor')

print(gp['time'].min())
print(gp['time'].max())

On 0.16.2 the output seems correct, i.e. it returns timestamps:

In [1]: %run bug_pandas.py
factor
0    2000-01-01 00:00:01+00:00
1    2000-01-01 00:00:07+00:00
2    2000-01-01 00:00:00+00:00
Name: time, dtype: object
factor
0    2000-01-01 00:00:57+00:00
1    2000-01-01 00:00:54+00:00
2    2000-01-01 00:00:59+00:00
Name: time, dtype: object

However on 0.17 it returns timestamps as integers:

In [1]: %run bug_pandas.py
factor
0    946684801000000000
1    946684807000000000
2    946684800000000000
Name: time, dtype: int64
factor
0    946684857000000000
1    946684854000000000
2    946684859000000000
Name: time, dtype: int64

It should be noted that the problem doesn't appear with tz=None.

Thanks for your help,

Alexandre
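
The integers in the 0.17 output are the same instants expressed as nanoseconds since the Unix epoch, so the values are not lost, only their dtype. As a minimal sketch of a stopgap (not something proposed in this thread), the result can be converted back with `pd.to_datetime`. The series `raw_min` below is hand-built from the 0.17 output above and is only an illustration:

```python
import pandas as pd

# Integer results copied from the 0.17 output of gp['time'].min() above
# (nanoseconds since the Unix epoch). Hand-built here for illustration.
raw_min = pd.Series(
    [946684801000000000, 946684807000000000, 946684800000000000],
    index=pd.Index([0, 1, 2], name="factor"),
    name="time",
)

# Reinterpret the integers as timezone-aware UTC timestamps.
decoded_min = pd.to_datetime(raw_min, unit="ns", utc=True)
print(decoded_min)
```

The decoded values match the timestamps shown in the 0.16.2 output for `min()`.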

@jreback
Contributor

jreback commented Nov 16, 2015

So this works without the column selection. I'll mark it as a bug, and if you'd like to dig in, pull requests are welcome!

In [16]: data.groupby('factor').max()
Out[16]: 
                            time
factor                          
0      2000-01-01 00:00:59+00:00
1      2000-01-01 00:00:54+00:00
2      2000-01-01 00:00:58+00:00

In [17]: data.groupby('factor')['time'].max()
Out[17]: 
factor
0    946684859000000000
1    946684854000000000
2    946684858000000000
Name: time, dtype: int64
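
Based on the two outputs above, one possible workaround on an affected version is to aggregate the whole frame first and select the column afterwards. This is a sketch under that assumption; it reuses the frame from the original report, and the variable name `latest` is only illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Same frame as in the original report.
data = pd.DataFrame({
    "factor": np.random.randint(0, 3, size=60),
    "time": pd.date_range("2000-01-01 00:00", periods=60, freq="s", tz="UTC"),
})

# Aggregate the whole frame, then pick the column, instead of
# selecting the column on the groupby object before aggregating.
latest = data.groupby("factor").max()["time"]
print(latest)
```

For a frame with many columns this computes more than is needed, so it is a stopgap rather than a fix.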

jreback added the Bug, Groupby, Difficulty Novice, Regression, and Timezones labels on Nov 16, 2015
jreback added this to the Next Major Release milestone on Nov 16, 2015
jreback modified the milestones: 0.18.0, Next Major Release on Nov 24, 2015
@jreback
Contributor

jreback commented Nov 27, 2015

closed by #11672
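
To check whether a given pandas installation shows the behaviour reported in this thread, one could compare the two aggregation paths on a small frame. This is a minimal sketch and not the test from the referenced pull request; the frame and the names `via_series`, `via_frame`, `tz_preserved`, and `paths_agree` are made up for illustration:

```python
import pandas as pd

# Small tz-aware frame, hand-built for this check.
data = pd.DataFrame({
    "factor": [0, 1, 0, 1],
    "time": pd.date_range("2000-01-01", periods=4, freq="s", tz="UTC"),
})

# The path reported as broken: select the column, then aggregate.
via_series = data.groupby("factor")["time"].max()
# The path reported as working: aggregate the frame, then select.
via_frame = data.groupby("factor").max()["time"]

# A plain int64 dtype has no `tz` attribute, so this is False
# when the timestamps have been returned as integers.
tz_preserved = getattr(via_series.dtype, "tz", None) is not None
paths_agree = via_series.equals(via_frame)
print(tz_preserved, paths_agree)
```

If either flag is `False`, the installed version likely has the problem described in this issue.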
