Skip to content

as_index = False not working for applying groupby rolling agg to DataFrame #31007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hsorsky opened this issue Jan 14, 2020 · 6 comments
Open
Labels
Bug Groupby Window rolling, ewma, expanding

Comments

@hsorsky
Copy link

hsorsky commented Jan 14, 2020

Problem description

When using a rolling agg function on a groupby object, we cannot ommit the groupby columns from the resulting dataframe's index using as_index = False, like we can when applying a non-rolling agg function.

Code Sample

import pandas as pd

data = {
    'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', ],
    'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data)
df.groupby(['groupby_col'], as_index=False).rolling(4).agg({'agg_col': 'mean'})

Expected Output

pd.DataFrame({'agg_col': {
  0: np.nan,
  1: np.nan,
  2: np.nan,
  3: 0.75,
  4: 0.5,
  5: np.nan,
  6: np.nan,
  7: np.nan,
  8: 0.25,
  9: 0.25}})
agg_col
0 NaN
1 NaN
2 NaN
3 0.75
4 0.50
5 NaN
6 NaN
7 NaN
8 0.25
9 0.25

Actual Output

pd.DataFrame({'agg_col': {
  (0, 0): np.nan,
  (0, 1): np.nan,
  (0, 2): np.nan,
  (0, 3): 0.75,
  (0, 4): 0.5,
  (1, 5): np.nan,
  (1, 6): np.nan,
  (1, 7): np.nan,
  (1, 8): 0.25,
  (1, 9): 0.25}})
agg_col
0 0 NaN
1 NaN
2 NaN
3 0.75
4 0.50
1 5 NaN
6 NaN
7 NaN
8 0.25
9 0.25

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 40.8.0
Cython : 0.29.14
pytest : 5.3.2
hypothesis : None
sphinx : 2.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@hsorsky hsorsky changed the title as_index = False not working for applying groupby rolling agg to DataFrame as_index = False not working for applying groupby rolling agg to DataFrame Jan 14, 2020
@simonjayhawkins
Copy link
Member

@hsorsky thanks for the report.

This looks similar to #32240

AFAICT, the returned index is probably correct, but the original groupby_col should not be dropped from the results with as_index=False

@phofl
Copy link
Member

phofl commented May 27, 2020

@simonjayhawkins This has actually a different origin. The rolling function transforms the DataFrameGroupBy into a RollingGroupBy object, which has a completly different aggregate function. The RollingGroupBy object stores the as_index flag under obj._groupby.as_index. The flag is never accessed. So the code execution is independent of this flag when using rolling with aggregate.

Edit: That was a bit misleading. The actual aggregeation is done the same way. But the process steps after the aggregation are completly different. The rolling part runs into pandas/core/window/rolling.py line 603

@rhshadrach
Copy link
Member

I'm not sure what the right output here is. On the one hand, doing .groupby(...).rolling(...).agg(...) is a transformation (always has exactly one row per input row). Should we adhere to the semantics of a transformation? That would mean this operation should ignore as_index and the current index is incorrect (it should have the same index as the input DataFrame).

On other other hand, users often seem confused that as_index is ignored for transforms. I personally think we should expand on the options where the groups are in the result (or not). This is #49543.

@ihsansecer
Copy link
Contributor

I agree. It is more sensible to ignore as_index and return index as is for rolling, ewm and expanding transformations. At least until a decision is made for #49543 I guess? But right now only agg(dict_like) and agg(list_like) behave as such. .agg(string), .agg(callable) or functions such as .mean, .sum alter index even though all are doing a transformation. #54973 is just fixing this inconsistency

@rhshadrach
Copy link
Member

I agree. It is more sensible to ignore as_index and return index as is for rolling, ewm and expanding transformations.

#54973 is just fixing this inconsistency

Agreed that fixing the inconsistency is positive, but I think we want to avoid changing behavior on users as a bugfix and then changing it again as a bugfix if we can just do it all at once.

@ihsansecer
Copy link
Contributor

Fair enough. I can make changes so that all rolling, ewm and expanding behaves like other transformations does or shall I left this to be implemented along with changes proposed in #49543?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants