df.groupby(col).resample('x') returning both unstacked or multiindex result #13255

mikepqr · 2016-05-23T00:13:47Z

import pandas as pd
from itertools import cycle, islice
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))
print(df)

returns

           col1 col2
2000-01-01    A    a
2000-01-02    B    b
2000-01-03    A    c
2000-01-04    B    a

If I then do the following groupbys, lines 2 and 4 return the same result (a multiindex), but line 3 returns the result of running .unstack on line 1.

print(df.groupby(['col1', pd.TimeGrouper('W')]).size())  # 1. multiindex
print(df.groupby(['col2', pd.TimeGrouper('W')]).size())  # 2. multiindex
print(df.groupby('col1').resample('W').size())  # 3. unstacked
print(df.groupby('col2').resample('W').size())  # 4. multiindex

col1
A     2000-01-02    1
      2000-01-09    1
B     2000-01-02    1
      2000-01-09    1
dtype: int64
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
      2000-01-02  2000-01-09
col1
A              1           1
B              1           1
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64

This seems like an inconsistency to me, in the sense that, if 1 and 3 do not return the same result, neither should 2 and 4. Is this a bug, or is df.groupby(col).resample('x') supposed to behave like this, and is it supposed to behave differently to df.groupby([col, pd.TimeGrouper(x)])?

Note that in df col1 has two groups of identical size, while col2 has unequal sized groups. I'm not sure if that's the problem, but this inconsistency goes away for some choices of N (e.g. all four lines above return multiindex for N=3 and N=24?!)

print(df.groupby('col1').size())
print(df.groupby('col2').size())

col1
A    2
B    2
dtype: int64
col2
a    2
b    1
c    1
dtype: int64
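To see how the result shape depends on N, the four calls can be swept over a few sizes. This is only a sketch: make_frame and result_kinds are hypothetical helper names, and pd.Grouper(freq='W') is used as a stand-in for pd.TimeGrouper('W'), which is an assumption about what newer pandas releases accept. The types reported for the resample form may differ between pandas versions, so no expected output is given.

```python
import pandas as pd
from itertools import cycle, islice


def make_frame(n):
    """Rebuild the example frame from the report for a given number of rows."""
    df = pd.DataFrame(index=pd.date_range('2000', periods=n))
    df['col1'] = list(islice(cycle(['A', 'B']), n))
    df['col2'] = list(islice(cycle(['a', 'b', 'c']), n))
    return df


def result_kinds(n, freq='W'):
    """Return the result type name of each of the four calls for n rows."""
    df = make_frame(n)
    kinds = {}
    for col in ('col1', 'col2'):
        # Assumption: pd.Grouper(freq=...) replaces the pd.TimeGrouper of the report.
        by_grouper = df.groupby([col, pd.Grouper(freq=freq)]).size()
        by_resample = df.groupby(col).resample(freq).size()
        kinds[(col, 'grouper')] = type(by_grouper).__name__
        kinds[(col, 'resample')] = type(by_resample).__name__
    return kinds


# The report mentions N=4 (inconsistent) and N=3, N=24 (consistent).
report = {n: result_kinds(n) for n in (3, 4, 24)}
for n, kinds in report.items():
    print(n, kinds)
```

A 'DataFrame' entry for a resample call marks the unstacked case; 'Series' marks the multiindex case.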


jreback · 2016-05-23T00:24:14Z

show pd.show_versions()

mikepqr · 2016-05-23T00:38:06Z

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

jreback · 2016-05-23T12:19:36Z

this is a manifestation of #13056

I think we will have to fix this as it IS a bit inconsistent.

but will be tracked in that issue. thanks for the report (and test cases)
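Until the return format is made consistent, one possible workaround is to normalise whatever comes back into the long (multiindex) form. The sketch below is not part of pandas: as_long_series is a hypothetical helper, and the wide frame is built by hand to mimic the unstacked output of line 3 in the report. Depending on the pandas version, stack may or may not drop missing cells, so this assumes the wide result has none.

```python
import pandas as pd


def as_long_series(result):
    """Return a Series; stack a wide (unstacked) DataFrame into long form."""
    if isinstance(result, pd.DataFrame):
        # Moves the column labels into an inner index level.
        return result.stack()
    return result


# Hand-built stand-in for the unstacked result of line 3 in the report.
wide = pd.DataFrame(
    1,
    index=pd.Index(['A', 'B'], name='col1'),
    columns=pd.to_datetime(['2000-01-02', '2000-01-09']),
)

long = as_long_series(wide)
print(long)
```

A result that is already a Series passes through unchanged, so the helper can be applied to all four calls.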

jreback added Groupby Resample resample method labels May 23, 2016

jreback added this to the 0.19.0 milestone May 23, 2016

edfall mentioned this issue May 23, 2016

API: inconsistent return format of groupby apply #13056

Closed

6 tasks

jreback closed this as completed May 23, 2016

jorisvandenbossche modified the milestones: No action, 0.19.0 Jul 8, 2016
