
BUG: Pandas groupby indices behaving differently with 2 and 3 rows #18451

Closed
mcdallas opened this issue Nov 23, 2017 · 3 comments · Fixed by #35494
Labels
Groupby Needs Tests Unit test(s) needed to prevent regressions

Comments

@mcdallas

Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame({
    'Company': ['Foo Inc.', 'Foo Inc.', 'Foo Inc.'],
    'ID': ['123456', '123456', '123456'],
    'Employee': ['John Doe', 'Richard Roe', 'Jane Doe'],
    'Position': ['Executive Director', 'Director', 'Company Secretary']
})
    
df2 = pd.DataFrame({
    'Company': ['Bar Inc.', 'Bar Inc.'],
    'ID': ['56789', '56789'],
    'Employee': ['Mark Moe', 'Larry Loe'],
    'Position': ['Tax Consultant', 'Company Secretary']
})

print(df1)
    Company     Employee      ID            Position
0  Foo Inc.     John Doe  123456  Executive Director
1  Foo Inc.  Richard Roe  123456            Director
2  Foo Inc.     Jane Doe  123456   Company Secretary

print(df2)
    Company   Employee     ID           Position
0  Bar Inc.   Mark Moe  56789     Tax Consultant
1  Bar Inc.  Larry Loe  56789  Company Secretary

gb1 = df1.set_index(['Company', 'ID', 'Employee']).groupby(['Company', 'ID'])
gb2 = df2.set_index(['Company', 'ID', 'Employee']).groupby(['Company', 'ID'])
    
for (name, id), new_df in gb1:
    print(name)
    print(id)
    
for (name, id), new_df in gb2:
    print(name)
    print(id)

Foo Inc.
123456

      3     print(id)
      4
----> 5 for (name, id), new_df in gb2:
      6     print(name)
      7     print(id)

ValueError: too many values to unpack (expected 2)

Problem description

I have two DataFrames, df1 and df2. They have the same format; the only difference is that the first has three rows and the second has two.

When I group them and run the loop above, it works for the first but raises a ValueError for the second.
This is because their groupby indices differ:

gb1.indices
>>> {('Foo Inc.', '123456'): array([0, 1, 2], dtype=int64)}

gb2.indices
>>> {'Company': array([0], dtype=int64), 'ID': array([1], dtype=int64)}
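The `gb2.indices` output is consistent with the following reading: in this pandas version, when the list of keys was exactly as long as the frame and the keys were not all column names (after `set_index`, 'Company' and 'ID' are index levels, not columns), the list itself was taken as a single array of per-row labels. That would label row 0 as 'Company' and row 1 as 'ID'. The toy model below illustrates this hypothesis in plain Python; the function names are hypothetical and this is not pandas' actual implementation.

```python
from collections import defaultdict


def is_read_as_row_labels(keys, columns, n_rows):
    """Hypothetical model (not pandas source) of the suspected heuristic.

    Returns True when a list of keys would be treated as one array of
    per-row labels: the keys are not all column names, and the list
    happens to be exactly as long as the frame.
    """
    all_in_columns = all(k in columns for k in keys)
    return not all_in_columns and len(keys) == n_rows


def indices_from_row_labels(row_labels):
    """Map each label to the row positions carrying it (lists, not arrays)."""
    out = defaultdict(list)
    for pos, label in enumerate(row_labels):
        out[label].append(pos)
    return dict(out)


keys = ["Company", "ID"]
# After set_index, only 'Position' remains a column; 'Company' and 'ID'
# are index levels, so they are not found among the columns.
columns = ["Position"]

print(is_read_as_row_labels(keys, columns, n_rows=3))  # df1: False
print(is_read_as_row_labels(keys, columns, n_rows=2))  # df2: True
# For df2 the key list doubles as row labels: row 0 -> 'Company', row 1 -> 'ID'
print(indices_from_row_labels(keys))  # {'Company': [0], 'ID': [1]}
```

Under this model df1 is unaffected because a three-row frame never matches a two-item key list, which would explain why only the two-row frame failed.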

The code above works if I replace the groupby line with:

gb2 = df2.set_index(['Company', 'ID', 'Employee']).groupby(level=['Company', 'ID'])
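The `level=` workaround names the index levels explicitly, so the key list cannot be mistaken for per-row labels. A minimal sketch applying it to both frames, assuming a pandas version where iterating a multi-level groupby yields tuple keys (as current releases do); `group_keys` is a hypothetical helper:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Company": ["Foo Inc.", "Foo Inc.", "Foo Inc."],
    "ID": ["123456", "123456", "123456"],
    "Employee": ["John Doe", "Richard Roe", "Jane Doe"],
    "Position": ["Executive Director", "Director", "Company Secretary"],
})
df2 = pd.DataFrame({
    "Company": ["Bar Inc.", "Bar Inc."],
    "ID": ["56789", "56789"],
    "Employee": ["Mark Moe", "Larry Loe"],
    "Position": ["Tax Consultant", "Company Secretary"],
})


def group_keys(df):
    """Group by the 'Company' and 'ID' index levels and collect the keys."""
    gb = df.set_index(["Company", "ID", "Employee"]).groupby(level=["Company", "ID"])
    keys = []
    for (name, id_), _ in gb:  # each key unpacks into (company, id)
        keys.append((name, id_))
    return keys


print(group_keys(df1))  # [('Foo Inc.', '123456')]
print(group_keys(df2))  # [('Bar Inc.', '56789')]
```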

Expected Output

The output should be consistent in both cases.

Output of pd.show_versions()

pandas: 0.20.1
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@smithto1
Member

This is fixed as of 0.25.3 and works in 1.1.0. I'm not sure what the relevant fix was, but I think this issue can be closed, @jreback.

@jreback
Contributor

jreback commented Jul 30, 2020

Thanks @smithto1. Would you see if we have sufficient tests, and if not, please do a PR with the OP as tests.
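A regression test along the lines requested might look like the sketch below. It is hypothetical (the test name and data are illustrative, and it is not necessarily the test added in #35494); it checks that grouping by two index-level names gives the same tuple-keyed result for a two-row frame, where the key list length equals the row count, and for a three-row frame.

```python
import numpy as np
import pandas as pd


def test_groupby_index_level_names_when_nrows_equals_nkeys():
    # Hypothetical regression test based on the original report: grouping by
    # two index-level names must behave the same with 2 rows (len(keys) ==
    # len(df)) as with 3 rows.
    for n_rows in (2, 3):
        df = pd.DataFrame({
            "Company": ["Foo Inc."] * n_rows,
            "ID": ["123456"] * n_rows,
            "Employee": [f"employee {i}" for i in range(n_rows)],
            "Position": [f"position {i}" for i in range(n_rows)],
        })
        gb = df.set_index(["Company", "ID", "Employee"]).groupby(["Company", "ID"])

        # One group, keyed by a (Company, ID) tuple, covering every row.
        assert list(gb.indices) == [("Foo Inc.", "123456")]
        np.testing.assert_array_equal(
            gb.indices[("Foo Inc.", "123456")], np.arange(n_rows)
        )

        # Iteration unpacks into (name, id) exactly as in the report.
        for (name, id_), sub in gb:
            assert (name, id_) == ("Foo Inc.", "123456")
            assert len(sub) == n_rows
```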

smithto1 added a commit to smithto1/pandas that referenced this issue Jul 31, 2020
@simonjayhawkins simonjayhawkins added the Needs Tests Unit test(s) needed to prevent regressions label Jul 31, 2020
@simonjayhawkins simonjayhawkins added this to the 1.2 milestone Jul 31, 2020
@smithto1
Member

smithto1 commented Aug 1, 2020

take


5 participants