Code Sample, a copy-pastable example if possible

```python
import pandas as pd

df1 = pd.DataFrame({
    'Company': ['Foo Inc.', 'Foo Inc.', 'Foo Inc.'],
    'ID': ['123456', '123456', '123456'],
    'Employee': ['John Doe', 'Richard Roe', 'Jane Doe'],
    'Position': ['Executive Director', 'Director', 'Company Secretary']
})
df2 = pd.DataFrame({
    'Company': ['Bar Inc.', 'Bar Inc.'],
    'ID': ['56789', '56789'],
    'Employee': ['Mark Moe', 'Larry Loe'],
    'Position': ['Tax Consultant', 'Company Secretary']
})

print(df1)
#     Company     Employee      ID            Position
# 0  Foo Inc.     John Doe  123456  Executive Director
# 1  Foo Inc.  Richard Roe  123456            Director
# 2  Foo Inc.     Jane Doe  123456   Company Secretary

print(df2)
#     Company   Employee     ID           Position
# 0  Bar Inc.   Mark Moe  56789     Tax Consultant
# 1  Bar Inc.  Larry Loe  56789  Company Secretary

gb1 = df1.set_index(['Company', 'ID', 'Employee']).groupby(['Company', 'ID'])
gb2 = df2.set_index(['Company', 'ID', 'Employee']).groupby(['Company', 'ID'])

for (name, id), new_df in gb1:
    print(name)
    print(id)
# Foo Inc.
# 123456

for (name, id), new_df in gb2:
    print(name)
    print(id)
# ValueError: too many values to unpack (expected 2)
```
Problem description

I have two dataframes, df1 and df2. Their format is the same; the only difference is that the first has 3 rows and the second has 2. When I try to groupby and run the loop above, it works for the first but not for the second. This is because their indices come out differently:
```python
gb1.indices
# {('Foo Inc.', '123456'): array([0, 1, 2], dtype=int64)}

gb2.indices
# {'Company': array([0], dtype=int64), 'ID': array([1], dtype=int64)}
```
The code above works if I replace the groupby line with:

```python
gb2 = df2.set_index(['Company', 'ID', 'Employee']).groupby(level=['Company', 'ID'])
```
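For completeness, here is the workaround as a self-contained snippet (the loop variable names `company`/`company_id` are mine, chosen to avoid shadowing the built-in `id`):

```python
import pandas as pd

df2 = pd.DataFrame({
    'Company': ['Bar Inc.', 'Bar Inc.'],
    'ID': ['56789', '56789'],
    'Employee': ['Mark Moe', 'Larry Loe'],
    'Position': ['Tax Consultant', 'Company Secretary']
})

# Passing level= makes pandas group on the named index levels,
# so each group key is the expected (Company, ID) tuple.
gb2 = df2.set_index(['Company', 'ID', 'Employee']).groupby(level=['Company', 'ID'])

for (company, company_id), new_df in gb2:
    print(company)
    print(company_id)
# Bar Inc.
# 56789
```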
Expected Output

The output should be consistent in both cases.
Output of pd.show_versions()

```
pandas: 0.20.1
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
```
This is fixed as of 0.25.3 and works in 1.1.0. I'm not sure what the relevant fix was, but I think this issue can be marked as closed, @jreback.
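A minimal sketch of the check, assuming a post-0.25.3 pandas as described above: both spellings should now resolve to the same index-level grouping and produce identical tuple keys.

```python
import pandas as pd

df2 = pd.DataFrame({
    'Company': ['Bar Inc.', 'Bar Inc.'],
    'ID': ['56789', '56789'],
    'Employee': ['Mark Moe', 'Larry Loe'],
    'Position': ['Tax Consultant', 'Company Secretary']
})
indexed = df2.set_index(['Company', 'ID', 'Employee'])

# On fixed versions, grouping by the level names and grouping with
# level= both yield (Company, ID) tuple keys.
keys_by_name = [key for key, _ in indexed.groupby(['Company', 'ID'])]
keys_by_level = [key for key, _ in indexed.groupby(level=['Company', 'ID'])]
print(keys_by_name)
print(keys_by_level)
```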
Thanks @smithto1. Would you see if we have sufficient tests, and if not, please do a PR with the OP as tests?
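Something along these lines could serve as the regression test; the function name and the use of `gb.indices` are my own sketch, not the test that was actually merged:

```python
import pandas as pd

def test_groupby_multiindex_level_names_two_rows():
    # Regression test for GH#18451: grouping a two-row frame by two
    # MultiIndex level names must produce tuple group keys, not the
    # level names themselves.
    df = pd.DataFrame({
        'Company': ['Bar Inc.', 'Bar Inc.'],
        'ID': ['56789', '56789'],
        'Employee': ['Mark Moe', 'Larry Loe'],
        'Position': ['Tax Consultant', 'Company Secretary']
    }).set_index(['Company', 'ID', 'Employee'])

    gb = df.groupby(['Company', 'ID'])
    assert list(gb.indices) == [('Bar Inc.', '56789')]

    # Iteration must unpack cleanly into ((company, id), subframe).
    for (company, company_id), new_df in gb:
        assert (company, company_id) == ('Bar Inc.', '56789')

test_groupby_multiindex_level_names_two_rows()
```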
adding test for pandas-dev#18451 (commit 5416aa3)
take
adding test for #18451 (#35494) (commit 87c1761)