BUG: DataFrame.describe() breaks with a column index of object type and numeric entries #13288
Comments
simple enough to stringify column names.
https://github.com/pydata/pandas/blob/master/pandas/core/generic.py#L4957 should be
Separately, this violates our guarantees on Index creation. I think we should assert that the dtype of a create
Not sure if I understand. Don't we want
The trouble is the columns are split up by dtype, so the sub-indexes need to be constructed similarly. Actually, you don't even need to specify the dtype, I think `d.columns = d.columns.copy()`
Oh, you mean `d.columns = data.columns.copy()  # (1)` and earlier `d.columns = Index(data.columns, dtype='object')  # (2)`.
(1) does work. Or at least it passes tests from the repository plus some others I tried. (Actually, I ran nosetests with
On the other hand, (2) fails with
My understanding is that since
Another advantage of (1) is that we can skip the next line
Yeah, (2) is not what we want; we don't need to coerce, so use (1). This was using some internal code which it shouldn't have. (Index has a public API and we try to use it whenever possible, except when deeply needed.) The reason is that there are certain guarantees, which in this case were violated (the separate issue I opened).
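For readers following along, here is a minimal sketch of why option (1) keeps the column structure intact. The index `cols` is a made-up example (it is not taken from the issue), and the snippet only relies on the public `Index.copy()` method, which returns a new index of the same dtype without re-inferring it from the values.

```python
import pandas as pd

# Hypothetical column index for illustration: object dtype, numeric entries.
cols = pd.Index([0, 1], dtype=object)

# Index.copy() is public API; the copy keeps the original dtype.
copied = cols.copy()

print(cols.dtype, copied.dtype)
```

Because nothing is re-inferred from the values, the copied index should describe the columns exactly as the source frame did.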
@jreback Thanks for clarifying.
…s-dev#13288)
BUG pandas-dev#13104:
- Percentile identifiers are now rounded to the least precision that keeps them unique.
- Supplying duplicates in percentiles will raise ValueError.
BUG pandas-dev#13288:
- Fixed a column index of the output data frame. Previously, if a data frame had a column index of object type and the index contained numeric values, the output column index could be corrupt. It led to ValueError if the output was displayed.
- describe() will raise ValueError with an informative message on DataFrame without columns.
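The percentile part of the commit message can be illustrated with a small pure-Python sketch. This is not the pandas implementation: `format_percentile_labels` and its `max_decimals` parameter are hypothetical names chosen for illustration, and the approach (try 0, 1, 2, ... decimals until the labels stop colliding) is just one simple way to get "the least precision that keeps them unique".

```python
def format_percentile_labels(percentiles, max_decimals=10):
    """Hypothetical helper: label percentiles with the fewest decimals
    that still keep every label distinct."""
    # Duplicates can never be given unique labels, so reject them up front.
    if len(set(percentiles)) != len(percentiles):
        raise ValueError("percentiles cannot contain duplicates")

    scaled = [100 * p for p in percentiles]
    for decimals in range(max_decimals + 1):
        labels = [f"{round(p, decimals):.{decimals}f}%" for p in scaled]
        if len(set(labels)) == len(labels):
            return labels
    raise ValueError("could not make percentile labels unique")


print(format_percentile_labels([0.25, 0.5, 0.75]))
print(format_percentile_labels([0.5, 0.501]))
```

Percentiles that are far apart keep their short integer labels, while close ones gain only as many decimals as are needed to tell them apart.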
Preparing a commit for another issue in `.describe()`, I encountered this puzzling bug, surprisingly easy to trigger.
Symptoms
However:
Same issue happens with a simpler data frame:
Current version (but the bug is also present in pandas release 0.18.1):
Reason
Some internal function gets confused by dtypes of a column index, I guess. But the faulty index is created in `.describe()`. `._shallow_copy()` in the marked line changes `d.columns`:
Possible solutions
Lines 4957-4958 are actually used to fix issues that `pd.concat` brings about. They try to pass the column structure from `self` to `d`.
I think a simpler solution is replacing these lines with:
or
`data` is a subframe of `self` and retains the same column structure. `pd.concat` has some parameters that help pass a hierarchical index but can't do anything on its own with a categorical one.
I'm going to submit a pull request with this fix together with some others related to `describe()`. I hope I haven't overlooked anything obvious. But if so, any comments are very welcome.
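A quick way to check the behaviour described in this issue is sketched below. The frame `df` is an assumption made for illustration (the original reproduction code did not survive in this copy of the thread): it mixes a string label and an integer label so that the column index has object dtype with a numeric entry. On a pandas release that includes the fix, the summary's column index should stay object-typed and the result should display without error. The snippet was not checked against 0.18.1.

```python
import pandas as pd

# Assumed example frame: column labels "A" and 0 give an object-dtype index.
df = pd.DataFrame({"A": list("BCDE"), 0: [1, 2, 3, 4]})

# By default describe() summarises only the numeric column, labelled 0.
summary = df.describe()

print(df.columns.dtype)       # dtype of the input column index
print(summary.columns.dtype)  # dtype of the output column index
print(summary)
```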