as_index = False not working for applying groupby rolling agg to DataFrame #31007

hsorsky · 2020-01-14T15:51:18Z

Problem description

When using a rolling agg function on a groupby object, we cannot ommit the groupby columns from the resulting dataframe's index using as_index = False, like we can when applying a non-rolling agg function.

Code Sample

import pandas as pd

data = {
    'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', ],
    'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data)
df.groupby(['groupby_col'], as_index=False).rolling(4).agg({'agg_col': 'mean'})

Expected Output

pd.DataFrame({'agg_col': {
  0: np.nan,
  1: np.nan,
  2: np.nan,
  3: 0.75,
  4: 0.5,
  5: np.nan,
  6: np.nan,
  7: np.nan,
  8: 0.25,
  9: 0.25}})

	agg_col
0	NaN
1	NaN
2	NaN
3	0.75
4	0.50
5	NaN
6	NaN
7	NaN
8	0.25
9	0.25

Actual Output

pd.DataFrame({'agg_col': {
  (0, 0): np.nan,
  (0, 1): np.nan,
  (0, 2): np.nan,
  (0, 3): 0.75,
  (0, 4): 0.5,
  (1, 5): np.nan,
  (1, 6): np.nan,
  (1, 7): np.nan,
  (1, 8): 0.25,
  (1, 9): 0.25}})

		agg_col
0	0	NaN
	1	NaN
	2	NaN
	3	0.75
	4	0.50
1	5	NaN
	6	NaN
	7	NaN
	8	0.25
	9	0.25

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 40.8.0
Cython : 0.29.14
pytest : 5.3.2
hypothesis : None
sphinx : 2.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

The text was updated successfully, but these errors were encountered:

simonjayhawkins · 2020-04-24T15:14:47Z

@hsorsky thanks for the report.

This looks similar to #32240

AFAICT, the returned index is probably correct, but the original groupby_col should not be dropped from the results with as_index=False

phofl · 2020-05-27T19:12:25Z

@simonjayhawkins This has actually a different origin. The rolling function transforms the DataFrameGroupBy into a RollingGroupBy object, which has a completly different aggregate function. The RollingGroupBy object stores the as_index flag under obj._groupby.as_index. The flag is never accessed. So the code execution is independent of this flag when using rolling with aggregate.

Edit: That was a bit misleading. The actual aggregeation is done the same way. But the process steps after the aggregation are completly different. The rolling part runs into pandas/core/window/rolling.py line 603

rhshadrach · 2023-09-06T01:36:06Z

I'm not sure what the right output here is. On the one hand, doing .groupby(...).rolling(...).agg(...) is a transformation (always has exactly one row per input row). Should we adhere to the semantics of a transformation? That would mean this operation should ignore as_index and the current index is incorrect (it should have the same index as the input DataFrame).

On other other hand, users often seem confused that as_index is ignored for transforms. I personally think we should expand on the options where the groups are in the result (or not). This is #49543.

ihsansecer · 2023-09-06T22:28:23Z

I agree. It is more sensible to ignore as_index and return index as is for rolling, ewm and expanding transformations. At least until a decision is made for #49543 I guess? But right now only agg(dict_like) and agg(list_like) behave as such. .agg(string), .agg(callable) or functions such as .mean, .sum alter index even though all are doing a transformation. #54973 is just fixing this inconsistency

rhshadrach · 2023-09-07T20:51:34Z

I agree. It is more sensible to ignore as_index and return index as is for rolling, ewm and expanding transformations.

#54973 is just fixing this inconsistency

Agreed that fixing the inconsistency is positive, but I think we want to avoid changing behavior on users as a bugfix and then changing it again as a bugfix if we can just do it all at once.

ihsansecer · 2023-09-10T12:24:43Z

Fair enough. I can make changes so that all rolling, ewm and expanding behaves like other transformations does or shall I left this to be implemented along with changes proposed in #49543?

hsorsky changed the title ~~as_index = False not working for applying groupby rolling agg to DataFrame~~ as_index = False not working for applying groupby rolling agg to DataFrame Jan 14, 2020

simonjayhawkins added Bug Groupby labels Apr 24, 2020

phofl mentioned this issue Apr 6, 2021

BUG: RollingGroupby ignored as_index=False #40789

Merged

4 tasks

phofl added the Window rolling, ewma, expanding label Apr 6, 2021

ihsansecer mentioned this issue Sep 2, 2023

BUG: Fix BaseWindowGroupby.aggregate where as_index is ignored #54973

Closed

5 tasks

rhshadrach mentioned this issue Oct 25, 2023

QST: best practices to combine groupby rolling and apply #55681

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

as_index = False not working for applying groupby rolling agg to DataFrame #31007

as_index = False not working for applying groupby rolling agg to DataFrame #31007

hsorsky commented Jan 14, 2020

INSTALLED VERSIONS

simonjayhawkins commented Apr 24, 2020

phofl commented May 27, 2020 •

edited

Loading

rhshadrach commented Sep 6, 2023

ihsansecer commented Sep 6, 2023

rhshadrach commented Sep 7, 2023

ihsansecer commented Sep 10, 2023

as_index = False not working for applying groupby rolling agg to DataFrame #31007

as_index = False not working for applying groupby rolling agg to DataFrame #31007

Comments

hsorsky commented Jan 14, 2020

Problem description

Code Sample

Expected Output

Actual Output

Output of pd.show_versions()

INSTALLED VERSIONS

simonjayhawkins commented Apr 24, 2020

phofl commented May 27, 2020 • edited Loading

rhshadrach commented Sep 6, 2023

ihsansecer commented Sep 6, 2023

rhshadrach commented Sep 7, 2023

ihsansecer commented Sep 10, 2023

Output of `pd.show_versions()`

phofl commented May 27, 2020 •

edited

Loading