BUG: DataFrame.groupby() on tuple column works only when column name is "key" #14848
Comments
Tuples in columns are not that well supported or tested, but this does look like a bug. You're welcome to look into it!
It was working perfectly in 0.19.0. Updating to 0.19.1 broke my code, and this turned out to be the cause. I usually don't work with tuples in columns, but my current project called for it. I could have used three columns to store the three values in each tuple, but since tuples are immutable, I chose to use them. I have workarounds, so this is not a high priority for me.
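The alternative mentioned above, storing the three tuple values in three separate columns, might look roughly like the sketch below. The frame, the column names k0, k1 and k2, and the values are all hypothetical; the commenter's actual data and workaround are not shown in the thread.

```python
import pandas as pd

# Hypothetical frame with a column of 3-tuples (not the commenter's data).
df = pd.DataFrame({
    "k": [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 2,
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Expand each tuple into three ordinary columns, keeping the row index aligned.
parts = pd.DataFrame(df["k"].tolist(), columns=["k0", "k1", "k2"], index=df.index)
wide = pd.concat([parts, df[["x"]]], axis=1)

# Group by the three scalar columns instead of the single tuple column.
summary = wide.groupby(["k0", "k1", "k2"])["x"].describe()
print(summary)
```

Grouping on scalar columns avoids any ambiguity about whether a tuple is one key or several index levels, at the cost of a wider frame.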
There's a correlation between the length of the column name and the number of items in the tuples. So
… as the Index
closes pandas-dev#14848
Author: Dr-Irv <[email protected]>
Closes pandas-dev#15110 from Dr-Irv/Issue14848 and squashes the following commits:
c18c6cb [Dr-Irv] Undo change to merge.py and make whatsnew a 2 line comment.
db13c3b [Dr-Irv] Use not is_list_like
fbd20f5 [Dr-Irv] Raise error when creating index of tuples with name parameter a string
f3a7a21 [Dr-Irv] Changes per jreback requests
9489cb2 [Dr-Irv] BUG: Fix issue pandas-dev#14848 groupby().describe() on indices containing all tuples
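The correlation noted in the comments, between the length of the column name and the number of items in each tuple, fits how pandas handles names when a list of tuples becomes a MultiIndex: each level needs its own name, and a string such as "key" is itself a sequence of three characters. The snippet below only illustrates that coincidence using the public MultiIndex.from_tuples constructor. It is an assumption, not something the thread confirms, that the groupby code path failed for a related reason, and the sample tuples are made up.

```python
import pandas as pd

tuples = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]  # made-up 3-item tuples

# Three names for three levels: accepted.
mi = pd.MultiIndex.from_tuples(tuples, names=list("key"))
print(list(mi.names))

# One name for three levels: pandas rejects the mismatch.
try:
    pd.MultiIndex.from_tuples(tuples, names=list("k"))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```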
This is the weirdest bug I have seen in Pandas. But I am guessing (hoping) the fix will not be too difficult.
Code Sample
Consider the following two code blocks:
Block 1: key column is called "k"
Block 2: key column is called "key"
Note that both blocks use the same static data, so the column name is the only difference between them and hence the only possible culprit.
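The two code blocks themselves did not survive on this page. As a rough stand-in, the sketch below builds two frames that differ only in the name of the tuple column and compares what describe() does on each. The data values and the helper describe_outcome are made up for illustration and are not taken from the report. Depending on the pandas version, the call may return a table or raise an exception; the point is only that both frames should behave the same way.

```python
import pandas as pd

# Illustrative data only; the values from the original report are not shown here.
tuples = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 2
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

df1 = pd.DataFrame({"k": tuples, "x": values})    # key column called "k"
df2 = pd.DataFrame({"key": tuples, "x": values})  # key column called "key"


def describe_outcome(df, key):
    """Hypothetical helper: summarise what groupby(key).describe() does.

    Returns ("ok", nested list of statistics) on success, or
    ("error", exception class name) if pandas raises.
    """
    try:
        result = df.groupby(key).describe()
    except Exception as exc:
        return ("error", type(exc).__name__)
    return ("ok", result.to_numpy().tolist())


outcome1 = describe_outcome(df1, "k")
outcome2 = describe_outcome(df2, "key")

# The column name should not matter, so both outcomes should match.
print(outcome1[0], outcome2[0], outcome1 == outcome2)
```

On an affected version such as 0.19.1, the report implies the two outcomes would differ.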
Problem description
Running a simple .groupby().describe() operation produces the following results:
Note that groupby().mean(), sum(), and a few others work fine. describe() is the only one I think is causing the problem.
Expected Output
Obviously, the expected output for df1.groupby('k').describe() should be the same as df2.groupby('key').describe().
Output of pd.show_versions()
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.4
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None