I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
    import pandas as pd
    pd.__version__
    df = pd.DataFrame({"x": [1, 2, 2, 3, 3], "y": [10, 20, 30, 40, 50]})
    gb = df.groupby("x")["y"]
    gb.nth(0)
With the development version, you get:
    '2.0.0.dev0+643.ge41b6d7827'
    0    10
    1    20
    3    40
    Name: y, dtype: int64
With pandas 1.5.1, you get:
    x
    1    10
    2    20
    3    40
    Name: y, dtype: int64
Should be the 1.5.1 behavior.
Note that the index name is missing, and the index values shown don't correspond to the group values (i.e., the values of 'x').
commit : e41b6d7
python : 3.8.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.0.0.dev0+643.ge41b6d7827
numpy : 1.23.3
pytz : 2022.4
dateutil : 2.8.2
setuptools : 65.4.1
pip : 22.2.2
Cython : 0.29.32
pytest : 7.1.3
hypothesis : 6.56.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.5.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 0.8.3
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.0
numba : 0.56.2
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.8
pyarrow : 9.0.0
pyreadstat : 1.1.9
pyxlsb : 1.0.9
s3fs : 2021.11.0
scipy : 1.9.1
snappy :
sqlalchemy : 1.4.41
tables : 3.7.0
tabulate : 0.8.10
xarray : 2022.9.0
xlrd : 2.0.1
zstandard : 0.18.0
tzdata : None
qtpy : None
pyqt5 : None
Nice - out of interest, how did you spot this?
Am running git bisect on this https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=110720326
This changed in #49262, and it looks like it was intentional:
in particular, the result index no longer contains the groupers but rather is filtered from the original index of the input
should be OK to close then?
This is now the same as
    In [8]: df.groupby('x')['y'].nth(0)
    Out[8]:
    0    10
    1    20
    3    40
    Name: y, dtype: int64

    In [9]: df.groupby('x')['y'].head(1)
    Out[9]:
    0    10
    1    20
    3    40
    Name: y, dtype: int64
Closing for now then, but cc @rhshadrach in case this is an issue
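For anyone who relied on the 1.5.1-style output, here is a minimal sketch of one way to get the group values back as the index. It is not taken from #49262; it simply takes the first row per group with `head(1)` (which keeps the original row labels) and then re-keys by the grouping column. The variable names `first_rows` and `by_group` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 3, 3], "y": [10, 20, 30, 40, 50]})

# head(1) acts as a filter: it keeps the first row of each group and
# preserves the original row labels (0, 1, 3) along with the "x" column.
first_rows = df.groupby("x").head(1)

# Re-key by the grouping column so the index holds the group values and
# carries the name "x", as in the 1.5.1 output shown in the report.
by_group = first_rows.set_index("x")["y"]

print(by_group)
```

This assumes the grouping column is still present in the filtered rows, which is the case when grouping by a column name as in the example above.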