
BUG: groupby.first() corrupts timezone #12716

Closed
jbandlow opened this issue Mar 25, 2016 · 3 comments
Labels
Bug Groupby Timezones Timezone data dtype

Comments

@jbandlow
Contributor

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame([{'ts': pd.Timestamp('2016-01-01', tz='America/Los_Angeles'), 'a': 1}])
df.groupby('a').first()

The output is

                         ts
a                          
1 2016-01-01 08:00:00-08:00

Expected Output

                         ts
a                          
1 2016-01-01 00:00:00-08:00
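The reported value is exactly 8 hours off, and 08:00 is the UTC wall time of midnight in Los Angeles. That pattern is consistent with (though this thread does not confirm it) first() taking the internally stored UTC instant and re-attaching the zone with a localize step instead of a convert step. The sketch below reproduces both the wrong and the expected output with public Timestamp methods only; it illustrates the hypothesis and is not the code path pandas actually uses.

```python
import pandas as pd

ts = pd.Timestamp('2016-01-01', tz='America/Los_Angeles')

# The same instant expressed as a naive UTC wall time: 2016-01-01 08:00:00
utc_wall = ts.tz_convert('UTC').tz_localize(None)

# Wrong: treating the UTC wall time as local time shifts the instant by 8 hours
wrong = utc_wall.tz_localize('America/Los_Angeles')

# Right: mark the wall time as UTC, then convert to the original zone
right = utc_wall.tz_localize('UTC').tz_convert('America/Los_Angeles')

print(wrong)  # 2016-01-01 08:00:00-08:00  (matches the buggy output)
print(right)  # 2016-01-01 00:00:00-08:00  (matches the expected output)
```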

Note that the issue is with first() and not groupby:

for _, group in df.groupby('a'):
    print(group.iloc[0])  # first row of each group, selected by position (.ix is removed in modern pandas)

gives the correct output of

a                             1
ts    2016-01-01 00:00:00-08:00
Name: 0, dtype: object
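A possible workaround on affected versions, not taken from this thread, is to strip the time zone before calling first() and restore it afterwards. This sketch assumes, as the report suggests, that only tz-aware columns are mangled and that naive datetime64 columns pass through first() unchanged.

```python
import pandas as pd

df = pd.DataFrame([{'ts': pd.Timestamp('2016-01-01', tz='America/Los_Angeles'), 'a': 1}])

# Remember the original zone, then give first() plain UTC wall times with no tz attached
tz = df['ts'].dt.tz
naive = df.assign(ts=df['ts'].dt.tz_convert('UTC').dt.tz_localize(None))

result = naive.groupby('a').first()

# Mark the values as UTC again and convert (not localize) back to the original zone
result['ts'] = result['ts'].dt.tz_localize('UTC').dt.tz_convert(tz)
print(result)
```

Converting through UTC rather than simply dropping the zone keeps the instants correct even when the column mixes times on both sides of a DST transition.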

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-60-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: None
numpy: 1.10.4
scipy: 0.13.3
statsmodels: 0.5.0
xarray: None
IPython: 4.0.0
sphinx: None
patsy: 0.2.1
dateutil: 2.5.1
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
@jreback jreback added this to the 0.18.1 milestone Mar 25, 2016
@jreback
Contributor

jreback commented Mar 25, 2016

So this is straightforward to fix, though a bit down in the weeds:

Right now we pass block.values to self.grouper.aggregate here. If we instead pass the block itself, then in the actual implementation here we can call block._try_coerce_args(...), which will return something that we can operate on.

The reason for this is that we are already calling block._try_coerce_results(...) on the returned data to coerce it back to the original dtype.

This may cause other breakages, so things may need to be modified a bit.

Pull requests welcome!
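The coerce, aggregate, coerce-back pattern described above can be sketched with public pandas and NumPy calls. This is not the pandas implementation: Block._try_coerce_args and Block._try_coerce_results were internal methods at the time and are not called here. The names coerce_args, group_first and coerce_results are hypothetical stand-ins, and group_first ignores NaT for brevity.

```python
import numpy as np
import pandas as pd


def coerce_args(values):
    """Hypothetical coerce-args step: tz-aware Series -> (int64 UTC nanoseconds, tz)."""
    utc_naive = values.dt.tz_convert('UTC').dt.tz_localize(None)
    return utc_naive.to_numpy(dtype='datetime64[ns]').view('i8'), values.dt.tz


def group_first(i8, codes, ngroups):
    """First value per group code, in order of appearance (NaT handling omitted)."""
    out = np.empty(ngroups, dtype='i8')
    seen = np.zeros(ngroups, dtype=bool)
    for value, code in zip(i8, codes):
        if not seen[code]:
            out[code] = value
            seen[code] = True
    return out


def coerce_results(i8, tz):
    """Hypothetical coerce-results step: int64 UTC nanoseconds -> tz-aware DatetimeIndex."""
    utc = pd.DatetimeIndex(i8.view('datetime64[ns]')).tz_localize('UTC')
    return utc.tz_convert(tz)  # convert, do not re-localize, so each instant is preserved


df = pd.DataFrame({
    'a': [1, 1, 2],
    'ts': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']).tz_localize('America/Los_Angeles'),
})
codes, uniques = pd.factorize(df['a'])
i8, tz = coerce_args(df['ts'])
firsts = coerce_results(group_first(i8, codes, len(uniques)), tz)
print(firsts)
```

The aggregation itself only ever sees int64 values; the time zone is carried separately and reapplied with a conversion on the way out, which is the symmetry between the two coercion steps that the comment relies on.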

@sinhrks
Member

sinhrks commented Apr 3, 2016

Maybe the same as #12619.

@jreback
Contributor

jreback commented Apr 6, 2016

Closing as a duplicate of #10668.

@jreback jreback closed this as completed Apr 6, 2016