BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52397

1112114641 · 2023-04-04T04:16:01Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# works
pd.DataFrame(
    {"a":np.repeat([0,1,2,3,4], 10),"b":range(50,100)}, 
    index=pd.date_range(start=pd.Timestamp.now(),freq="1min",periods=50)
).groupby("a",as_index=False).resample("2min").min()

# works
pd.DataFrame(
    {"a":np.repeat([0,1,2,3,4], 10),"b":range(50,100)}, 
    index=pd.date_range(start=pd.Timestamp.now(),freq="1min",periods=50)
).groupby("a",as_index=False).resample("2min").agg(min)

# works
pd.DataFrame(
    {"a":np.repeat([0,1,2,3,4], 10),"b":range(50,100)}, 
    index=pd.date_range(start=pd.Timestamp.now(),freq="1min",periods=50)
).groupby("a",as_index=True).resample("2min").agg({"b":"min"})

# does not work
pd.DataFrame(
    {"a":np.repeat([0,1,2,3,4], 10),"b":range(50,100)}, 
    index=pd.date_range(start=pd.Timestamp.now(),freq="1min",periods=50)
).groupby("a",as_index=False).resample("2min").agg({"b":"min"})

Issue Description

Using pandas 2.0.0 / python 3.9.13 running the above code yields different behaviour based on whether the as_index=False flag is set in the .groupby() method or not.

The resulting error message ValueError: Length of values (5) does not match length of index (30) is misleading, uninformative, and does not explain the difference in behaviour between .agg({"b":"min"}), and .agg(min). The same holds true for other standard .groupby aggregations (max, first, ...).

Could you please look into this?

Expected Behavior

.agg({"b":"min"}) and .agg(min) should yield the same result, irrespective of as_index=True/False.

Installed Versions

INSTALLED VERSIONS

commit : 478d340
python : 3.9.13.final.0
python-bits : 64
OS : Darwin
OS-release : 22.4.0
Version : Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.0.0
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 63.3.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.9.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : 3.1.1
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

The text was updated successfully, but these errors were encountered:

szczekulskij · 2023-04-08T10:25:16Z

Can I pick this up and get two weeks to dig into the codebase? Thanks

luke396 · 2023-04-09T05:31:33Z

@szczekulskij Just wanted to kindly remind you that commenting on 'take' will automatically assign you to this issue. No pressure though!

1112114641 · 2023-04-13T00:15:38Z

as an interesting aside, I discovered a new method of defining aggregations, and this code works also:

pd.DataFrame(
    {"a":np.repeat([0,1,2,3,4], 10),"b":range(50,100)}, 
    index=pd.date_range(start=pd.Timestamp.now(),freq="1min",periods=50)
).groupby("a",as_index=False).resample("2min").agg(b=("b","min"))

szczekulskij · 2023-04-15T02:13:05Z

take

szczekulskij · 2023-04-18T23:43:25Z

To keep this thread updated - I've identified the problem is in pandas/core/groupby/generic.py.
Line 421: if not self.as_index and not_indexed_same: - if this line doesn't result in True, everything works as expected

I'll now work on solving this

szczekulskij · 2023-04-21T01:08:47Z

Hey, could I ask for review on the attached PR ?
It definitely solves the issue, but I'm open for feedback - I'm still learning more about the repo

rhshadrach · 2023-04-21T03:27:31Z

The last example in the OP doesn't raise on 1.5.x, marking as a regression.

rhshadrach · 2023-04-23T12:50:13Z

It's not clear to me what the expected behavior of using as_index=False should be. I'm not finding any tests with groupby(...).resample(...) and as_index=False. In the docs, we say as_index has no impact on filtrations or transformations, but a resample doesn't fall into these categories.

cc @jbrockmendel @mroeschke @topper-123

rhshadrach · 2023-04-23T12:55:10Z

A closer look at behavior, it appears to me as_index has no impact to resampler across methods. I think we stick with that for this issue, if we want to change that then we can take it up in a dedicated issue.

szczekulskij · 2023-05-07T16:38:26Z

Hey @rhshadrach, sorry for delay - updated now.
Happy to create a seperate ticket to implement a different functionality dependent on whether as_index=False (if I understood correctely, this is what you've discussed in the previous point)

Thanks for feedback and going w. me through this ticket

1112114641 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 4, 2023

DeaMariaLeon added Groupby Apply Apply, Aggregate, Transform, Map Timestamp pd.Timestamp and associated methods and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 4, 2023

github-actions bot assigned szczekulskij Apr 15, 2023

szczekulskij mentioned this issue Apr 21, 2023

BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52819

Closed

5 tasks

rhshadrach added Regression Functionality that used to work in a prior pandas version Apply Apply, Aggregate, Transform, Map and removed Apply Apply, Aggregate, Transform, Map labels Apr 21, 2023

rhshadrach added this to the 2.0.1 milestone Apr 21, 2023

datapythonista modified the milestones: 2.0.1, 2.0.2 Apr 23, 2023

szczekulskij mentioned this issue Apr 24, 2023

BUG: DataFrameGroupBy.nunique fails with categorical, multiple groupings, and as_index=False #52848

Closed

datapythonista modified the milestones: 2.0.2, 2.0.3 May 26, 2023

lithomas1 modified the milestones: 2.0.3, 2.0.4 Jun 27, 2023

lithomas1 removed this from the 2.0.4 milestone Aug 30, 2023

lithomas1 added this to the 2.1.1 milestone Aug 30, 2023

lithomas1 modified the milestones: 2.1.1, 2.1.2 Sep 21, 2023

lithomas1 modified the milestones: 2.1.2, 2.1.3 Oct 26, 2023

jorisvandenbossche modified the milestones: 2.1.3, 2.1.4 Nov 13, 2023

lithomas1 modified the milestones: 2.1.4, 2.2 Dec 8, 2023

lithomas1 modified the milestones: 2.2, 2.2.1 Jan 20, 2024

lithomas1 modified the milestones: 2.2.1, 2.2.2 Feb 23, 2024

lithomas1 modified the milestones: 2.2.2, 2.2.3 Apr 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52397

BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52397

1112114641 commented Apr 4, 2023 •

edited

Loading

INSTALLED VERSIONS

szczekulskij commented Apr 8, 2023

luke396 commented Apr 9, 2023

1112114641 commented Apr 13, 2023

szczekulskij commented Apr 15, 2023

szczekulskij commented Apr 18, 2023 •

edited

Loading

szczekulskij commented Apr 21, 2023

rhshadrach commented Apr 21, 2023

rhshadrach commented Apr 23, 2023 •

edited

Loading

rhshadrach commented Apr 23, 2023

szczekulskij commented May 7, 2023

BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52397

BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52397

Comments

1112114641 commented Apr 4, 2023 • edited Loading

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

szczekulskij commented Apr 8, 2023

luke396 commented Apr 9, 2023

1112114641 commented Apr 13, 2023

szczekulskij commented Apr 15, 2023

szczekulskij commented Apr 18, 2023 • edited Loading

szczekulskij commented Apr 21, 2023

rhshadrach commented Apr 21, 2023

rhshadrach commented Apr 23, 2023 • edited Loading

rhshadrach commented Apr 23, 2023

szczekulskij commented May 7, 2023

1112114641 commented Apr 4, 2023 •

edited

Loading

szczekulskij commented Apr 18, 2023 •

edited

Loading

rhshadrach commented Apr 23, 2023 •

edited

Loading