df.groupby(col).resample('x') returning both unstacked or multiindex result #13255

Closed
Tracked by #13056
mikepqr opened this issue May 23, 2016 · 3 comments

@mikepqr
mikepqr commented May 23, 2016

import pandas as pd
from itertools import cycle, islice
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))
print(df)

returns

           col1 col2
2000-01-01    A    a
2000-01-02    B    b
2000-01-03    A    c
2000-01-04    B    a

If I then do the following groupbys, lines 2 and 4 return the same result (a multiindex), but line 3 returns the result of running .unstack on line 1.

print(df.groupby(['col1', pd.TimeGrouper('W')]).size())  # 1. multiindex
print(df.groupby(['col2', pd.TimeGrouper('W')]).size())  # 2. multiindex
print(df.groupby('col1').resample('W').size())  # 3. unstacked
print(df.groupby('col2').resample('W').size())  # 4. multiindex

returns

col1
A     2000-01-02    1
      2000-01-09    1
B     2000-01-02    1
      2000-01-09    1
dtype: int64
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
      2000-01-02  2000-01-09
col1
A              1           1
B              1           1
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64

This seems like an inconsistency to me: if 1 and 3 do not return the same result, then neither should 2 and 4. Is this a bug, or is df.groupby(col).resample('x') supposed to behave like this, and is it supposed to behave differently from df.groupby([col, pd.TimeGrouper(x)])?
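
If a uniform shape is needed regardless of which form comes back, one possible workaround is to stack any wide result back into a multiindex Series. This is only a sketch: the helper name `as_multiindex_series` is hypothetical, and the `wide` frame below is hand-built to mimic the unstacked output of line 3 rather than produced by pandas 0.18.1.

```python
import pandas as pd


def as_multiindex_series(result):
    """Return `result` as a Series; stack a wide DataFrame into long form."""
    if isinstance(result, pd.DataFrame):
        # Move the column labels (the weekly bins) into the inner index level.
        return result.stack()
    return result


# Hand-built stand-in for the unstacked result of line 3 (an assumption).
wide = pd.DataFrame(
    [[1, 1], [1, 1]],
    index=pd.Index(["A", "B"], name="col1"),
    columns=pd.to_datetime(["2000-01-02", "2000-01-09"]),
)
long = as_multiindex_series(wide)
print(long)
```

A result that is already a Series passes through unchanged, so the helper can wrap all four expressions.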

Note that in df, col1 has two groups of identical size, while col2 has unequally sized groups. I'm not sure if that's the problem, but this inconsistency goes away for some choices of N (e.g. all four lines above return a multiindex for N=3 and N=24?!)

print(df.groupby('col1').size())
print(df.groupby('col2').size())

returns

col1
A    2
B    2
dtype: int64
col2
a    2
b    1
c    1
dtype: int64
@jreback
Contributor

jreback commented May 23, 2016

show pd.show_versions()

@mikepqr
Author

mikepqr commented May 23, 2016

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

@jreback
Contributor

jreback commented May 23, 2016

this is a manifestation of #13056

I think we will have to fix this as it IS a bit inconsistent.

but will be tracked in that issue. thanks for the report (and test cases)

@jreback jreback added Groupby Resample resample method labels May 23, 2016
@jreback jreback added this to the 0.19.0 milestone May 23, 2016
@jreback jreback closed this as completed May 23, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, 0.20.0 Jul 8, 2016