datetime gets converted to int64 in groupby agg #12394

Closed
twiecki opened this issue Feb 19, 2016 · 4 comments

Labels: Bug, Duplicate Report (Duplicate issue or pull request), Groupby, Timezones (Timezone data dtype)
Milestone: 0.18.0

Comments

@twiecki (Contributor)

twiecki commented Feb 19, 2016

I tried to produce a stand-alone example but haven't managed to so far. Maybe the answer is already apparent, though. This started happening only recently (possibly after updating pandas to 0.17.1); it worked well before. I'm doing a multi-column agg in a groupby. One of the columns is a datetime, of which I want the first element:

(Pdb) t.groupby(['block_dir']).first()
                                 dt    sid  amount     price symbol  \
block_dir                                                             
1         2003-01-02 15:56:00+00:00  21719     -62  0.963811   AIRN   

          order_sign  block_time  
block_dir                         
1              False           0  

Works fine, however:

(Pdb) t.groupby(['block_dir']).agg({'dt': 'first'})
                            dt
block_dir                     
1          1041522960000000000
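A minimal stand-alone sketch of the reporter's case, assuming (as the reply below establishes) that the tz-aware dt column is the trigger. The frame t and all of its values are hypothetical, shaped after the columns in the .first() output; the sum on amount is only there to make the agg genuinely multi-column. On 0.17.1 the dt column of the agg result would be expected to come back as int64, while on a current pandas both lines should print a tz-aware dtype.

```python
import pandas as pd

# Hypothetical frame shaped like the reporter's `t`: a tz-aware `dt` column
# plus a couple of the numeric columns shown in the .first() output.
t = pd.DataFrame({
    "block_dir": [1, 1, 2],
    "dt": pd.to_datetime(
        ["2003-01-02 15:56", "2003-01-02 15:57", "2003-01-03 09:30"], utc=True
    ),
    "sid": [21719, 21719, 24],
    "amount": [-62, -10, 5],
})

# .first() on the whole frame: this path kept the timezone even on 0.17.1.
via_first = t.groupby("block_dir").first()

# Dict-style multi-column agg: on 0.17.1 `dt` came back as int64 nanoseconds.
via_agg = t.groupby("block_dir").agg({"dt": "first", "amount": "sum"})

print(via_first["dt"].dtype)
print(via_agg["dt"].dtype)
```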

Expected Output: the same tz-aware datetime that .first() returns (2003-01-02 15:56:00+00:00), not an int64.

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
Jinja2: None

@jorisvandenbossche (Member)

This reproduces it (it is the timezone that is the trigger):

In [1]: pd.__version__
Out[1]: u'0.17.1'

In [2]: df = pd.DataFrame({'col':['a','a','b','b'], 'date':pd.date_range('2015-01-01', periods=4, utc=True)})

In [3]: df.groupby('col').first()
Out[3]:
          date
col
a   2015-01-01
b   2015-01-03

In [4]: df.groupby('col').agg({'date': 'first'})
Out[4]:
          date
col
a   2015-01-01
b   2015-01-03

In [5]: df['date'] = df['date'].dt.tz_localize('utc')

In [6]: df.groupby('col').first()
Out[6]:
                         date
col
a   2015-01-01 00:00:00+00:00
b   2015-01-03 00:00:00+00:00

In [7]: df.groupby('col').agg({'date': 'first'})
Out[7]:
                    date
col
a    1420070400000000000
b    1420243200000000000
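For code pinned to 0.17.1, one possible workaround (an assumption, not something suggested in this thread) is to turn the nanosecond integers back into timestamps with pd.to_datetime(..., utc=True). The sketch applies it first to the exact int64 values from Out[7], then to a fresh agg result. It builds the tz-aware column with tz="UTC" directly instead of the date_range plus tz_localize pair used in the transcript. On releases where agg already preserves the timezone, the conversion leaves the column as a UTC datetime, so the same line should be safe either way.

```python
import pandas as pd

# The int64 values printed in Out[7] (nanoseconds since the Unix epoch).
buggy = pd.Series(
    [1420070400000000000, 1420243200000000000],
    index=pd.Index(["a", "b"], name="col"),
    name="date",
)

# pd.to_datetime reads integers as nanoseconds by default; utc=True makes the
# result tz-aware in UTC, which recovers the original timestamps.
restored = pd.to_datetime(buggy, utc=True)
print(restored)

# The same conversion applied to a real agg result. If the installed pandas
# already keeps the timezone, the column is tz-aware UTC and stays that way.
df = pd.DataFrame({
    "col": ["a", "a", "b", "b"],
    "date": pd.date_range("2015-01-01", periods=4, tz="UTC"),
})
result = df.groupby("col").agg({"date": "first"})
result["date"] = pd.to_datetime(result["date"], utc=True)
print(result)
```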

@twiecki (Contributor, Author)

twiecki commented Feb 19, 2016

Thanks @jorisvandenbossche.

@jorisvandenbossche (Member)

This also appears to be a duplicate of #11616, which was fixed in master by #11672.

jorisvandenbossche added this to the 0.18.0 milestone Feb 19, 2016
jorisvandenbossche added the Duplicate Report (Duplicate issue or pull request) label Feb 19, 2016

@twiecki (Contributor, Author)

twiecki commented Feb 19, 2016

I'll close it then.

twiecki closed this as completed Feb 19, 2016
2 participants