BUG: sort_index/sortlevel fails MultiIndex after columns are added. #8017
Comments
just make the example ONLY the muli_float and just run it on one. simpler/shorter is much better |
Sure. I'll make the change above. I had thought it was important to show that it worked fine with a float index that was not a multiindex. But no problem. (Done.) |
@8one6 thanks, your title says it all though |
@jreback Just one note on your title update. The problem is still there if all of the |
yeh I think has to do with the adding, will look |
I think the fix for this will be much deeper in the bowels of Pandas indices than I'm able to handle. Is there any other way I could help toward a patch for this bug? |
np. would appreciate a pull-request on any other issue. thanks! |
In an attempt to get around this issue, I started sticking a character at the end of some of my column names (to turn them from numbers into strings). While the
which looks fine. and sorts fine:
But when I add a new column and try to sort, this still goes wrong:
(I.e. I think that after the |
@8one6 pls take a look at #8282. This was actually a very strange bug. In essence, when you add the column, it is inserted at the end of the multi-index. This makes the index no longer lexsorted (in fact, the lexsort_depth goes from 2 to 1). The only way to then lexsort it is to reconstruct it in its entirety. I believe it was designed this way to avoid a complete re-factorization any time anything is inserted into a multi-index. Secondarily, there was a display bug when using FloatIndexes, e.g. setup
master
this PR
|
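To make the lexsort_depth remark above concrete, here is a minimal pure-Python sketch of the idea. It is an illustration only, not pandas' implementation (pandas computes this on the integer codes of the MultiIndex rather than on label tuples), and the helper name `lexsort_depth` and the sample keys are assumptions chosen to mirror the scenario in this issue.

```python
def lexsort_depth(keys):
    """Return how many leading levels of `keys` are in lexicographic order.

    `keys` is a list of equal-length tuples, standing in for MultiIndex labels.
    """
    if not keys:
        return 0
    nlevels = len(keys[0])
    # Try the deepest prefix first, then fall back to shallower ones.
    for depth in range(nlevels, 0, -1):
        prefixes = [k[:depth] for k in keys]
        if all(a <= b for a, b in zip(prefixes, prefixes[1:])):
            return depth
    return 0


# Hypothetical column labels: strings on level 0, floats on level 1.
keys = [("a", 1.0), ("a", 2.0), ("b", 1.0), ("b", 2.0)]
depth_before = lexsort_depth(keys)                 # fully sorted: 2

# Appending a label at the end breaks the ordering of the second level only.
appended = keys + [("b", 1.5)]
depth_after = lexsort_depth(appended)              # sorted on level 0 only: 1

# Rebuilding the whole sequence in sorted order restores full depth.
depth_rebuilt = lexsort_depth(sorted(appended))    # 2
```

In this sketch the only way back to depth 2 is to rebuild the full sequence, which matches the behaviour described in the comment above.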
First off, thank you so much for your time on this. I'll try to give this a test ASAP. Do you think your PR also addresses the version of this issue that I highlighted in the post I put up yesterday (the one immediately before your last post)? |
@8one6 yes, it's the same issue (the printing issue is only with a Float64Index among the levels). |
@jreback Are you sure it is only with a FloatIndex? If I change it to integers in the example above, I have the exact same behaviour |
@jorisvandenbossche you are talking about the printing or sorting issue? |
@jreback both. With int columns:
|
@jorisvandenbossche this is fixed/tested with all dtypes |
I think there is still some lingering issue here. I still need to get a MWE up and running, but in the meantime, here is a screenshot showing the issue. I would say that I.e. the thing to notice here is that the first columns in the first two display cells have Just to make sure I'm not going nuts here, can you guys confirm that this looks like a bug and that it's worth the effort to put together a MWE to demonstrate from scratch? |
That seems like a possible bug, as this sorts differently/correctly without a multi-index. Can you try to show a small reproducible example showing the issue? |
I'll have a shot at it. It's odd because I have two DataFrames whose generation is very similar but which don't exhibit parallel behavior in this case. I.e. DF1 comes out of its process sorting just fine, but DF2 (which is the one up above) comes out sorted incorrectly, even though they have very similar structures. One key difference is that DF1 has many more columns than DF2. Not sure if that could be related. Either way, I'll have a look. |
I have a `DataFrame` with a `MultiIndex` on the columns. The first level of the MultiIndex contains `str`ings. The second, `float`s (though the problem persists if the second level is `int`s). I add a column to the `DataFrame` (which should not come last if the columns are sorted). I try to sort the `DataFrame`. The result does not seem to be sorted. The behavior is fine if the columns are simply an `Index` (even after adding columns). And the sort works fine in the `MultiIndex` case as long as no columns have been added since the `DataFrame` was created.
MWE:
This sorts just fine as it is now:
But if I add a column to this `DataFrame` and then show it sorted, I get what looks to be a wrong result (the new column remains last, rather than being placed second-to-last as it should be):
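The scenario described here can be sketched roughly as follows. This is not the original MWE from the report; the column labels and values are hypothetical, chosen so that the added column belongs second-to-last. On the affected versions (0.14.x) the sorted result reportedly kept the new column last, while on a pandas release that includes the fix the sort should place it correctly.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: strings on the first column level, floats on the second.
cols = pd.MultiIndex.from_product([["a", "b"], [1.0, 2.0]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

# Add a column that belongs between ("b", 1.0) and ("b", 2.0).
# pandas appends it at the end of the existing columns.
df[("b", 1.5)] = [100, 200]
unsorted_cols = list(df.columns)

# Sort the columns; on a fixed pandas the new column moves into place.
sorted_df = df.sort_index(axis=1)
sorted_cols = list(sorted_df.columns)

print(unsorted_cols)
print(sorted_cols)
```

Comparing `sorted_cols` against Python's own `sorted(unsorted_cols)` is a quick way to check whether a given pandas version shows the problem.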
I'm able to produce this behavior on two systems. The first runs Pandas 0.14.0 and Numpy 1.8.1 and the second runs Pandas 0.14.1 and Numpy 1.8.2. This issue is described here: http://stackoverflow.com/questions/25287130/pandas-sort-index-fails-with-multiindex-containing-floats-as-one-level-when-col?noredirect=1#comment39408150_25287130