loc() does not swap two rows in multi-index pandas dataframe #22797
Comments
I don't think we make guarantees about the order of returned values from a |
For a single-indexed table, we can reindex it using .loc[] instead of .reindex().
In his book "Python for Data Analysis" (2nd edition), Wes McKinney says "you can reindex more succinctly by label-indexing with loc, and many users prefer to use it", although he only used a single-index table. This may not be a bug; however, it is natural to expect the same effect for a multi-index table.
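The single-index behaviour referred to above can be sketched as follows. The Series and its labels are made up for illustration; they are not taken from the book or from this issue.

```python
import pandas as pd

# Hypothetical single-index Series, used only to illustrate the point.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# With a flat index, a list of labels passed to .loc comes back in list order.
by_loc = s.loc[["c", "a"]]

# .reindex with the same labels gives the same rows in the same order.
by_reindex = s.reindex(["c", "a"])

print(by_loc.index.tolist())
print(by_loc.equals(by_reindex))
```

Note that the two are only interchangeable when every requested label exists: .reindex fills missing labels with NaN, while recent pandas versions raise a KeyError from .loc.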
I personally think that this inconsistency between single- and multi-index dataframes is dangerous. In Python, a list not only represents a collection of objects but also sets an ordering of these objects. Not respecting this ordering under certain conditions (single- vs. multi-index) is non-intuitive. The inconsistency is certainly not emphasized enough in the documentation. I didn't come across a corresponding warning in the article on advanced indexing, nor is there any note in the documentation of
By the way: SO brought me here; see my related SO question.
I too got mixed up, by using
Surely it is strange that it preserves the order for a single index but not for a multi-index. Even if it doesn't qualify as a bug, is it possible to add a warning? I tried to locate where in the code the warning should be raised, without success. :(
It's a bug. Not sure where though, so investigation would be welcome.
So, after investigation (thanks, pdb), what happens is that we obtain the indexes to return in the order the keys are given, but the way the indexes are stitched together reorders them. More precisely, the culprit seems to be the or operator in pandas/core/indexes/multi.py:3043, which sorts the results. I succeeded in getting the expected result for the given example by using
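The mechanism described in the comment above can be illustrated in plain Python. This is only a sketch of the idea (a sorted union versus order-preserving stitching); it is neither the pandas source nor the change the commenter applied, and both function names are hypothetical.

```python
def stitch_sorted(position_groups):
    """Union the groups as a set, then sort: the caller's key order is lost."""
    merged = set()
    for group in position_groups:
        merged |= set(group)
    return sorted(merged)


def stitch_in_order(position_groups):
    """Append the groups in the order given, skipping repeated positions."""
    seen = set()
    ordered = []
    for group in position_groups:
        for pos in group:
            if pos not in seen:
                seen.add(pos)
                ordered.append(pos)
    return ordered


# Assume rows labelled 'b' sit at positions 2 and 3, rows labelled 'a' at
# positions 0 and 1, and the key is ['b', 'a'].
groups = [[2, 3], [0, 1]]
sorted_result = stitch_sorted(groups)     # [0, 1, 2, 3]: index order wins
ordered_result = stitch_in_order(groups)  # [2, 3, 0, 1]: key order is kept
print(sorted_result, ordered_result)
```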
Testing return order of MultiIndex.loc: MultiIndex.loc tries to return the result in the same order as the given key.
From issue pandas-dev#22797. loc did not respect the key order for a MultiIndex DataFrame, although it does for a single index. When possible, the returned rows now respect the order given for the keys.
Testing return order of MultiIndex.loc: MultiIndex.loc tries to return the result in the same order as the given key.
From issue pandas-dev#22797. When given a list-like object as indexer, the returned result did not respect the order of the indexer, but the order of the MultiIndex levels.
So, while working on this issue I ran into the following problem: if we want to maintain the order given for each level, how should we treat slice(None)? An example follows:
Ideas are welcome.
Problem description
df.loc[['b','a'],:] does not swap the rows 'a' and 'b', nor does it for a Series.
Expected Output
The output should be the same as the one obtained via df.reindex(index=['b', 'a'], level=0)
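A minimal sketch for comparing the two calls on your own pandas version. The frame below is a hypothetical stand-in, since the issue's original data is not reproduced in this text. The printed order for .loc depends on the pandas version, so no expected output is given here.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level frame; level names and column names are assumptions.
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["outer", "inner"])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

# The call reported in this issue as not swapping 'a' and 'b'.
by_loc = df.loc[["b", "a"], :]

# The call whose output the reporter expected .loc to match.
by_reindex = df.reindex(index=["b", "a"], level=0)

# Compare the order of the outer level produced by each call.
print(by_loc.index.get_level_values(0).tolist())
print(by_reindex.index.get_level_values(0).tolist())
```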
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: 0.7.0