BUG: DatetimeIndexResamplerGroupby as_index interaction with .agg() #52397
Comments
Can I pick this up and get two weeks to dig into the codebase? Thanks
@szczekulskij Just wanted to kindly remind you that commenting 'take' will automatically assign you to this issue. No pressure though!
As an interesting aside, I discovered a new way of defining aggregations, and this code also works:
pd.DataFrame(
    {"a": np.repeat([0, 1, 2, 3, 4], 10), "b": range(50, 100)},
    index=pd.date_range(start=pd.Timestamp.now(), freq="1min", periods=50),
).groupby("a", as_index=False).resample("2min").agg(b=("b", "min"))
take
To keep this thread updated: I've identified that the problem is in pandas/core/groupby/generic.py. I'll now work on solving it.
Hey, could I ask for a review of the attached PR?
The last example in the OP doesn't raise on 1.5.x, marking as a regression.
It's not clear to me what the expected behavior of using |
A closer look at behavior, it appears to me |
Hey @rhshadrach, sorry for the delay, updated now. Thanks for the feedback and for going through this ticket with me.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Using pandas 2.0.0 / python 3.9.13, running the above code yields different behaviour depending on whether the as_index=False flag is set in the .groupby() method or not.
The resulting error message ValueError: Length of values (5) does not match length of index (30) is misleading, uninformative, and does not explain the difference in behaviour between .agg({"b":"min"}) and .agg(min). The same holds true for other standard .groupby aggregations (max, first, ...).
Could you please look into this?
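A minimal sketch of the comparison described above, not the original reproducer. The frame mirrors the one posted in the comments, except that it uses a fixed start timestamp (an assumption, so the 2-minute bins are the same on every run) and passes "min" as a string rather than the builtin. With the default as_index=True the dict and string spellings should agree on column b; the as_index=False call is wrapped in try/except because whether it raises depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Fixed start so the 2-minute bins line up identically on every run
# (assumption: the snippet in the comments uses pd.Timestamp.now()).
index = pd.date_range(start="2023-01-01 00:00", freq="1min", periods=50)
df = pd.DataFrame(
    {"a": np.repeat([0, 1, 2, 3, 4], 10), "b": range(50, 100)},
    index=index,
)

# Default as_index=True: dict spelling vs. string spelling.
by_dict = df.groupby("a").resample("2min").agg({"b": "min"})
by_name = df.groupby("a").resample("2min").agg("min")

# Compare only column "b"; whether the grouping column "a" also shows up
# in by_name differs between pandas versions.
b_from_dict = by_dict["b"].tolist()
b_from_name = by_name["b"].tolist()
print(b_from_dict == b_from_name)

# as_index=False: the case this issue is about. Record the outcome
# instead of assuming it, since it is version-dependent.
try:
    flat = df.groupby("a", as_index=False).resample("2min").agg({"b": "min"})
    outcome = f"ok, shape={flat.shape}"
except Exception as exc:
    outcome = f"{type(exc).__name__}: {exc}"
print(outcome)
```

On an affected version the last print should show the ValueError quoted above (possibly with different lengths, since the bin count depends on the start time); on a version without the bug it prints the shape of the result.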
Expected Behavior
.agg({"b":"min"}) and .agg(min) should yield the same result, irrespective of as_index=True/False.
Installed Versions
INSTALLED VERSIONS
commit : 478d340
python : 3.9.13.final.0
python-bits : 64
OS : Darwin
OS-release : 22.4.0
Version : Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 2.0.0
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 63.3.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.9.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : 3.1.1
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None