Groupby on a column with tuples creates a multiindex, yielding unexpected behaviour. #21340
Comments
IINM, I think we have been moving about from tuples to
How are tuples and multiindex normally differentiated? Maybe make it explicitly multiindex?
What I was referring to was the fact that we generally convert tuples in an index to cc @jreback
Has this bug been fixed? I came across the same situation.
This appears to be fixed on main. Test needs to be added.
take
take
…ndex (#58630)
* Implement test for GH #21340
* minor fixup
* Lint contribution
* Make spacing consistent
* Lint
* Remove duplicate column construction
* Avoid DeprecationWarning by setting include_groups=False in apply
---------
Co-authored-by: Jason Mok <[email protected]>
Code Sample, a copy-pastable example if possible
Problem description
I get this error:
ValueError: Names should be list-like for a MultiIndex
This behaviour changed somewhere between 0.19.0 and 0.19.1. This can be sidestepped by casting the tuples to a string before doing the groupby, but the behaviour is definitely unexpected.
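The workaround mentioned above (casting the tuples to strings before grouping) might look roughly like the sketch below. The column names category_tuple and category_string follow the issue text; the value column and the sample data are made up for illustration and are not the original code sample.

```python
import pandas as pd

# Hypothetical data: a column of tuples plus a numeric column to aggregate.
df = pd.DataFrame(
    {
        "category_tuple": [(1, 2), (1, 2), (3, 4)],
        "value": [10, 20, 30],
    }
)

# Cast each tuple to its string form, e.g. (1, 2) -> "(1, 2)".
df["category_string"] = df["category_tuple"].astype(str)

# Grouping on the string column yields a flat index rather than a MultiIndex.
result = df.groupby("category_string")["value"].sum()
print(result)
```

The string keys are only a stand-in for the tuples, so anything downstream that needs the original tuple values has to map them back.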
category_string works, but category_tuple does not. Other operations also break in the same way (e.g., describe()). I'd expect them both to work in the same way, and not create a multiindex unless I request it.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.22.0
pytest: None
pip: 10.0.0
setuptools: 28.8.0
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None