BUG: different behaviors of sort_index() and sort_index(level=0) #13431
Might be a bug; see the docs here, which have been updated since 0.18.1: these should be the same. But you will have to step through it and have a look.
@jreback Right, I have seen that. I think the updated doc is good, without all those complicated descriptions of lexicographic sortedness. But maybe the current implementation is inconsistent with the updated doc... Anyway, I think some workarounds for all MultiIndex-related issues are
Currently my workflow is compatible with the above three, so this bug is not a problem for me right now, but I think it is a big issue that needs some clarification.
I think the problem is due to a potential re-initialization of the factor levels in https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3245-L3259. According to the code, if … However, if … This may explain #9212, I believe. Basically, whether …
Could very well be. Do you want to put in place tests for this issue (and others) and make a change? See if you can fix it without breaking anything else.
@jreback I'd like to, though I may not have time this month, and I'm new to pandas (I only started using it yesterday). But maybe this is easy enough for me. My main question is
Actually, I found the comments at https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3253-L3254 funny. If
Regarding sort_index() and sort_index(level=0), it looks to me like simply removing https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3255-L3256 should do. But I have one question: does
There can be
It looks like removing those lines would undo #8017. Actually, the current version of pandas (0.18.1) won't have the same behavior on … I think the fundamental problem is that when using a MultiIndex, each sub-index in it implicitly becomes a categorical variable (labels and levels), including things that are not categorical in nature, such as floats.
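A minimal sketch of that levels/codes layout, assuming pandas 0.24 or later (where the MultiIndex attribute labels was renamed codes). It shows that a freshly built MultiIndex gets a sorted float level, but inserting a new label appends the new value to the end of the level, so the level order stops matching the value order. This is illustrative only; it is not necessarily the code path 0.18.1 takes in the report.

```python
import pandas as pd

# Float sub-labels deliberately out of order, as in the #8017 example.
mi = pd.MultiIndex.from_tuples([("red", 1.0), ("red", 3.0), ("red", 2.0), ("red", 5.0)])

# Each level is stored as unique values ("levels") plus integer codes into them.
# from_tuples factorizes the values, so the float level comes out sorted.
print(mi.levels[1].tolist())  # [1.0, 2.0, 3.0, 5.0]
print(mi.codes[1].tolist())   # [0, 2, 1, 3]

# Inserting a label whose value is not yet in the level appends that value
# to the END of the level instead of re-sorting the level.
mi2 = mi.insert(len(mi), ("red", 4.0))
print(mi2.levels[1].tolist())  # [1.0, 2.0, 3.0, 5.0, 4.0]
print(mi2.codes[1].tolist())   # [0, 2, 1, 3, 4]
```

The level [1.0, 2.0, 3.0, 5.0, 4.0] is the same shape as the one printed for a.columns in the #8017 example.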
Basically, I don't think that, given the current implementation, one can tell the difference between "true"
It's possible that issue was confused because it exposed a printing issue. I don't think there is any difference between truly lexsorted and accidentally lexsorted, except that the accidental case might just not be recorded as such (so it's a bug in keeping state).
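To make the "values in order" vs "codes in order" distinction concrete, here is a sketch assuming a recent pandas. is_lexsorted() is not available in recent releases, so the hypothetical helper codes_lexsorted below re-implements roughly the codes-based check. The index is built by hand with the same levels and labels as the #8017 example: its values are in order (which is what is_monotonic_increasing looks at), but its codes are not.

```python
import pandas as pd

# Same levels/labels as the #8017 output ("labels" are called "codes" now).
mi = pd.MultiIndex(
    levels=[["red"], [1.0, 2.0, 3.0, 5.0, 4.0]],
    codes=[[0, 0, 0, 0, 0], [0, 1, 2, 4, 3]],
)


def codes_lexsorted(index: pd.MultiIndex) -> bool:
    """Hypothetical helper: True if the integer code tuples are in
    non-decreasing lexicographic order, roughly what 0.18's
    is_lexsorted() checked."""
    code_tuples = list(zip(*(c.tolist() for c in index.codes)))
    return code_tuples == sorted(code_tuples)


# The values are in order, so a value-based check says "sorted" ...
print(mi.get_level_values(1).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(mi.is_monotonic_increasing)       # True
# ... but the codes are not, because 4.0 sits at the end of the level.
print(codes_lexsorted(mi))              # False
```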
As you can see from the example in #8017, although …
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: np.random.seed(0)
In [4]: data = np.random.randn(3,4)
In [5]: df_multi_float = pd.DataFrame(data, index=list('def'), columns=pd.MultiIndex.from_tuples([('red', i) for i in [1., 3., 2., 5.]]))
In [6]: df_multi_float[('red', 4.0)] = 'world'
In [7]: a=df_multi_float.sort_index(axis=1)
In [8]: a
Out[8]:
        red
        1.0       2.0       3.0    4.0       5.0
d  1.764052  0.978738  0.400157  world  2.240893
e  1.867558  0.950088 -0.977278  world -0.151357
f -0.103219  0.144044  0.410599  world  1.454274
In [9]: a.columns
Out[9]:
MultiIndex(levels=[[u'red'], [1.0, 2.0, 3.0, 5.0, 4.0]],
           labels=[[0, 0, 0, 0, 0], [0, 1, 2, 4, 3]])
In [10]: pd.__version__
Out[10]: u'0.18.1'
In [11]: a.columns.is_lexsorted()
Out[11]: False
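One possible workaround, sketched against a recent pandas rather than 0.18.1: after sorting, rebuild the column MultiIndex from its tuples so every level is re-factorized in sorted order and the codes follow the value order. The rebuild step is my own suggestion, not something proposed in this thread or a dedicated pandas API; recent releases may already leave the levels sorted after sort_index, in which case the rebuild is harmless.

```python
import numpy as np
import pandas as pd

# Same setup as In [3]-[7] above.
np.random.seed(0)
data = np.random.randn(3, 4)
df_multi_float = pd.DataFrame(
    data,
    index=list("def"),
    columns=pd.MultiIndex.from_tuples([("red", i) for i in [1.0, 3.0, 2.0, 5.0]]),
)
df_multi_float[("red", 4.0)] = "world"
a = df_multi_float.sort_index(axis=1)

# Hypothetical workaround: re-factorize the columns from their tuples so each
# level is rebuilt in sorted order and the codes line up with the values.
a.columns = pd.MultiIndex.from_tuples(list(a.columns), names=a.columns.names)

print(a.columns.levels[1].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(a.columns.codes[1].tolist())   # [0, 1, 2, 3, 4]
```

Rebuilding from tuples costs a full re-factorization of the index, so it is best done once after the column set stops changing rather than after every insert.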
Inspired by some bug reports around MultiIndex sortedness (http://stackoverflow.com/questions/31427466/ensuring-lexicographical-sort-in-pandas-multiindex, #10651, #9212), I found that sort_index() sometimes can't make a MultiIndex ready for slicing, but sort_index(level=0) can (and so can sortlevel()).
While df2.sort_index() does give visually lexicographically sorted output, it DOES NOT support slicing.
So I have two questions.
To me, level=0 and level=None are synonyms, but they are not. Looking at the code at https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3245-L3247, there is indeed special processing when level is not None.
make_index(level=0) is correct, yet make_index() is not.
Thanks.
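For reference, a minimal sketch of the slicing failure the report is about, on a recent pandas; it does not reproduce the 0.18.1 behavior, and the report's df2 isn't included in this thread. The frame is modeled on the dfm example in the pandas user guide's section on sorting a MultiIndex: slicing with keys longer than the index's lexsort depth raises pandas.errors.UnsortedIndexError. The last part checks whether sort_index() and sort_index(level=0) both leave a monotonic, sliceable index, which is the consistency the report asks about.

```python
import numpy as np
import pandas as pd

# Level "joe" is out of order within jim == 1, so the lexsort depth is only 1.
dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": np.arange(4.0)}
).set_index(["jim", "joe"])

# Slicing with 2-element keys needs the index sorted on both levels.
try:
    dfm.loc[(0, "y"):(1, "z")]
    raised = False
except pd.errors.UnsortedIndexError as err:
    raised = True
    print("UnsortedIndexError:", err)

# Compare the two calls from the report: level=None (default) vs level=0.
by_default = dfm.sort_index()
by_level0 = dfm.sort_index(level=0)
print(by_default.index.is_monotonic_increasing)  # True
print(by_level0.index.is_monotonic_increasing)   # True
print(by_default.index.equals(by_level0.index))  # True

# Once sorted, the same slice works.
sliced = by_default.loc[(0, "y"):(1, "z")]
print(sliced.index.tolist())
```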