
BUG: groupby.nth() providing incorrect results in development code #49644


Closed
2 of 3 tasks
Dr-Irv opened this issue Nov 11, 2022 · 3 comments
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@Dr-Irv
Contributor

Dr-Irv commented Nov 11, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.__version__  # display the installed pandas version
df = pd.DataFrame({"x": [1, 2, 2, 3, 3], "y": [10, 20, 30, 40, 50]})
gb = df.groupby("x")["y"]
gb.nth(0)  # first row of each group

Issue Description

With the development version, you get:

'2.0.0.dev0+643.ge41b6d7827'
0    10
1    20
3    40
Name: y, dtype: int64

With pandas 1.5.1, you get:

x
1    10
2    20
3    40
Name: y, dtype: int64

Expected Behavior

Should be the 1.5.1 behavior.

Note that the index name is missing, and the index values do not correspond to the group keys (i.e., the values of 'x').
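If you need the 1.5.1-style result (group keys as the index) regardless of pandas version, one possible workaround is to filter with head(1), which keeps the grouping column, and then promote "x" to the index. This is a sketch under that assumption, not an official migration recipe; the name first_by_key is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 3, 3], "y": [10, 20, 30, 40, 50]})

# head(1) filters df down to the first row of each group (row labels 0, 1, 3)
# and keeps the "x" column, so "x" can then be promoted to the index to get
# a Series keyed by group value, as nth(0) returned in 1.5.1.
first_by_key = df.groupby("x").head(1).set_index("x")["y"]
print(first_by_key)
```

head(1) preserves the original row order, whereas the old nth(0) result was ordered by group key because groupby sorts keys by default. For this data the two orders coincide. In general, appending .sort_index() restores key order. GroupBy.first() is not a drop-in substitute because it returns the first non-null value in each group.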

Installed Versions

INSTALLED VERSIONS

commit : e41b6d7
python : 3.8.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.0.0.dev0+643.ge41b6d7827
numpy : 1.23.3
pytz : 2022.4
dateutil : 2.8.2
setuptools : 65.4.1
pip : 22.2.2
Cython : 0.29.32
pytest : 7.1.3
hypothesis : 6.56.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.5.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 0.8.3
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.0
numba : 0.56.2
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.8
pyarrow : 9.0.0
pyreadstat : 1.1.9
pyxlsb : 1.0.9
s3fs : 2021.11.0
scipy : 1.9.1
snappy :
sqlalchemy : 1.4.41
tables : 3.7.0
tabulate : 0.8.10
xarray : 2022.9.0
xlrd : 2.0.1
zstandard : 0.18.0
tzdata : None
qtpy : None
pyqt5 : None

Dr-Irv added the Bug and Needs Triage labels Nov 11, 2022
@MarcoGorelli
Member

Nice - out of interest, how did you spot this?

I'm running git bisect on this: https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=110720326

@MarcoGorelli
Member

This changed in #49262, and it looks like it was intentional:

in particular, the result index no longer contains the groupers but rather is filtered from the original index of the input

Should be OK to close, then?
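Because the result is now a filter over the input, its index holds the original row labels. The group key of each returned row can therefore still be looked up in the input frame. The following is a minimal sketch that assumes the filter behaviour applies from pandas 2.0 onward. The version check and the names result and keyed are illustrative and not from the thread.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 3, 3], "y": [10, 20, 30, 40, 50]})
result = df.groupby("x")["y"].nth(0)

major = int(pd.__version__.split(".")[0])
if major >= 2:
    # Assumed filter semantics: result.index is a subset of df.index
    # (row labels 0, 1, 3), so each row's group key can be read back from df.
    keys = df.loc[result.index, "x"]
    keyed = pd.Series(
        result.to_numpy(),
        index=pd.Index(keys.to_numpy(), name="x"),
        name="y",
    )
else:
    # pandas 1.x: nth already returns a Series indexed by the group key.
    keyed = result
print(keyed)
```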

@MarcoGorelli
Member

nth(0) now gives the same result as head(1):

In [8]: df.groupby('x')['y'].nth(0)
Out[8]: 
0    10
1    20
3    40
Name: y, dtype: int64

In [9]: df.groupby('x')['y'].head(1)
Out[9]: 
0    10
1    20
3    40
Name: y, dtype: int64

Closing for now then, but cc @rhshadrach in case this is an issue
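The equivalence between nth(0) and head(1) can also be checked programmatically. This sketch assumes the filter semantics from #49262 are present, which shipped in pandas 2.0. It therefore only expects the two results to match on pandas 2.0 or later.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 3, 3], "y": [10, 20, 30, 40, 50]})
gb = df.groupby("x")["y"]

nth_result = gb.nth(0)
head_result = gb.head(1)

# With filter semantics, both calls keep the first row of each group and
# retain the original row labels, so the two Series should compare equal.
same = nth_result.equals(head_result)
print(nth_result.index.tolist(), head_result.index.tolist(), same)
```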


2 participants