Commit 2a9e994

Merge pull request #6301 from jreback/mi_indexing
ENH: per axis and per level indexing (orig GH6134)
2 parents 8452caf + 7d70710 commit 2a9e994

7 files changed

+830
-81
lines changed

doc/source/indexing.rst

+136-26
Original file line number | Diff line number | Diff line change
@@ -426,14 +426,14 @@ python/numpy allow slicing past the end of an array without an associated error.
426426
values. A single indexer that is out-of-bounds and drops the dimensions of the object will still raise
427427
``IndexError`` (:issue:`6296`). This could result in an empty axis (e.g. an empty DataFrame being returned)
428428

429-
.. ipython:: python
429+
.. ipython:: python
430430
431-
df = DataFrame(np.random.randn(5,2),columns=list('AB'))
432-
df
433-
df.iloc[[4,5,6]]
434-
df.iloc[4:6]
435-
df.iloc[:,2:3]
436-
df.iloc[:,1:3]
431+
dfl = DataFrame(np.random.randn(5,2),columns=list('AB'))
432+
dfl
433+
dfl.iloc[[4,5,6]]
434+
dfl.iloc[4:6]
435+
dfl.iloc[:,2:3]
436+
dfl.iloc[:,1:3]
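
The out-of-bounds behaviour described in this hunk can be checked with a small sketch. It assumes a current pandas release and uses the ``pd.`` namespace rather than the bare ``DataFrame`` name used in the docs; the deterministic ``np.arange`` data is a stand-in for the random values.

```python
import numpy as np
import pandas as pd

# Stand-in for the 5x2 frame used in the documentation example.
dfl = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("AB"))

# A slice that runs past the end is truncated instead of raising.
tail = dfl.iloc[4:6]

# A slice that lies entirely out of bounds yields an empty axis.
no_columns = dfl.iloc[:, 2:3]

# A slice that is partly in bounds keeps the part that exists.
only_b = dfl.iloc[:, 1:3]

# A single out-of-bounds integer still raises IndexError.
try:
    dfl.iloc[:, 5]
    raised = False
except IndexError:
    raised = True
```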
437437
438438
.. _indexing.basics.partial_setting:
439439

@@ -1684,47 +1684,143 @@ of tuples:
16841684
Advanced indexing with hierarchical index
16851685
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16861686

1687-
Syntactically integrating ``MultiIndex`` in advanced indexing with ``.ix`` is a
1687+
Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a
16881688
bit challenging, but we've made every effort to do so. For example, the
16891689
following works as you would expect:
16901690

16911691
.. ipython:: python
16921692
16931693
df = df.T
16941694
df
1695-
df.ix['bar']
1696-
df.ix['bar', 'two']
1695+
df.loc['bar']
1696+
df.loc['bar', 'two']
16971697
1698-
"Partial" slicing also works quite nicely for the topmost level:
1698+
"Partial" slicing also works quite nicely.
16991699

17001700
.. ipython:: python
17011701
1702-
df.ix['baz':'foo']
1702+
df.loc['baz':'foo']
17031703
1704-
But lower levels cannot be sliced in this way, because the MultiIndex uses
1705-
its multiple index dimensions to slice along one dimension of your object:
1704+
You can slice with a 'range' of values, by providing a slice of tuples.
17061705

17071706
.. ipython:: python
17081707
1709-
df.ix[('baz', 'two'):('qux', 'one')]
1710-
df.ix[('baz', 'two'):'foo']
1708+
df.loc[('baz', 'two'):('qux', 'one')]
1709+
df.loc[('baz', 'two'):'foo']
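
A self-contained sketch of the partial and tuple-based slicing shown above. The frame ``df`` used by the documentation is defined outside this diff, so the frame built here (level names ``first`` and ``second``, columns ``A`` to ``C``) is a hypothetical stand-in with the same label layout.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the documentation's frame: a sorted two-level index.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(24).reshape(8, 3), index=index, columns=list("ABC"))

# Selecting on the first level only drops that level from the result.
bar_rows = df.loc["bar"]

# A full key selects a single row; the explicit ``, :`` avoids axis ambiguity.
bar_two = df.loc[("bar", "two"), :]

# Label slices include both endpoints.
baz_to_foo = df.loc["baz":"foo"]

# Slices of tuples (or a tuple and a label) select a "range" of rows.
tuple_range = df.loc[("baz", "two"):("qux", "one")]
mixed_range = df.loc[("baz", "two"):"foo"]
```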
17111710
17121711
Passing a list of labels or tuples works similarly to reindexing:
17131712

17141713
.. ipython:: python
17151714
17161715
df.ix[[('bar', 'two'), ('qux', 'one')]]
17171716
1718-
The following does not work, and it's not clear if it should or not:
1717+
.. _indexing.mi_slicers:
17191718

1720-
::
1719+
Multiindexing using slicers
1720+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1721+
1722+
.. versionadded:: 0.14.0
1723+
1724+
In 0.14.0 we added a new way to slice multi-indexed objects.
1725+
You can slice a multi-index by providing multiple indexers.
1726+
1727+
You can provide any of the selectors as if you are indexing by label, see :ref:`Selection by Label <indexing.label>`,
1728+
including slices, lists of labels, labels, and boolean indexers.
1729+
1730+
You can use ``slice(None)`` to select all the contents of *that* level. You do not need to specify all the
1731+
*deeper* levels; they will be implied as ``slice(None)``.
1732+
1733+
As usual, **both sides** of the slicers are included as this is label indexing.
1734+
1735+
.. warning::
17211736

1722-
>>> df.ix[['bar', 'qux']]
1737+
You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and
1738+
for the **columns**. There are some ambiguous cases where the passed indexer could be misinterpreted
1739+
as indexing *both* axes, rather than into, say, the MultiIndex for the rows.
1740+
1741+
You should do this:
1742+
1743+
.. code-block:: python
1744+
1745+
df.loc[(slice('A1','A3'),.....),:]
1746+
1747+
rather than this:
1748+
1749+
.. code-block:: python
1750+
1751+
df.loc[(slice('A1','A3'),.....)]
1752+
1753+
.. warning::
1754+
1755+
You will need to make sure that the selection axes are fully lexsorted!
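
One way to meet the lexsort requirement is sketched below. It assumes a current pandas release, where ``sort_index()`` replaces the ``sortlevel()`` call used in this commit; the small frame and its ``value`` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is deliberately out of order.
unsorted = pd.DataFrame(
    {"value": np.arange(4)},
    index=pd.MultiIndex.from_tuples(
        [("A1", "B0"), ("A0", "B1"), ("A2", "B0"), ("A0", "B0")]
    ),
)

# The index is not sorted, so it is not safe to slice yet.
sorted_before = unsorted.index.is_monotonic_increasing

# Sorting the index makes label slicing on its levels well defined.
sorted_df = unsorted.sort_index()
sorted_after = sorted_df.index.is_monotonic_increasing

# Both endpoints of the label slice are included.
selected = sorted_df.loc[(slice("A0", "A1"), slice(None)), :]
```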
1756+
1757+
.. ipython:: python
1758+
1759+
def mklbl(prefix,n):
1760+
    return ["%s%s" % (prefix,i) for i in range(n)]
1761+
1762+
miindex = MultiIndex.from_product([mklbl('A',4),
1763+
mklbl('B',2),
1764+
mklbl('C',4),
1765+
mklbl('D',2)])
1766+
micolumns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
1767+
('b','foo'),('b','bah')],
1768+
names=['lvl0', 'lvl1'])
1769+
dfmi = DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
1770+
index=miindex,
1771+
columns=micolumns).sortlevel().sortlevel(axis=1)
1772+
dfmi
1773+
1774+
Basic multi-index slicing using slices, lists, and labels.
1775+
1776+
.. ipython:: python
1777+
1778+
dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
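
The selection above can be verified with a self-contained sketch. It rebuilds ``dfmi`` under the assumption of a current pandas release, so it uses the ``pd.`` namespace and ``sort_index()`` in place of ``sortlevel()``.

```python
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
values = np.arange(len(miindex) * len(micolumns)).reshape(len(miindex), len(micolumns))
# sort_index() is assumed here as the modern equivalent of sortlevel().
dfmi = pd.DataFrame(values, index=miindex, columns=micolumns).sort_index().sort_index(axis=1)

# Level 0: inclusive label slice; level 1: everything; level 2: a list of labels.
# The fourth level is not given, so it is implied as slice(None).
result = dfmi.loc[(slice("A1", "A3"), slice(None), ["C1", "C3"]), :]

level0 = result.index.get_level_values(0).unique().tolist()
level2 = result.index.get_level_values(2).unique().tolist()
```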
1779+
1780+
You can use ``pd.IndexSlice`` as a shortcut for creating these slices:
1781+
1782+
.. ipython:: python
1783+
1784+
idx = pd.IndexSlice
1785+
dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
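
As a sketch of what ``pd.IndexSlice`` buys you, the following compares it with the equivalent explicit ``slice`` objects. The frame is a smaller hypothetical stand-in for ``dfmi`` (three row levels instead of four), built with ``sort_index()`` on the assumption of a current pandas release.

```python
import numpy as np
import pandas as pd

# Smaller hypothetical stand-in for dfmi.
rows = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]])
cols = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
)
frame = pd.DataFrame(
    np.arange(len(rows) * len(cols)).reshape(len(rows), len(cols)),
    index=rows,
    columns=cols,
).sort_index().sort_index(axis=1)

idx = pd.IndexSlice

# IndexSlice turns the familiar colon syntax into slice objects.
with_idx = frame.loc[idx[:, :, ["C1", "C3"]], idx[:, "foo"]]

# The same selection written out by hand.
by_hand = frame.loc[(slice(None), slice(None), ["C1", "C3"]), (slice(None), "foo")]
```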
1786+
1787+
It is possible to perform quite complicated selections using this method on multiple
1788+
axes at the same time.
1789+
1790+
.. ipython:: python
17231791
1724-
The code for implementing ``.ix`` makes every attempt to "do the right thing"
1725-
but as you use it you may uncover corner cases or unintuitive behavior. If you
1726-
do find something like this, do not hesitate to report the issue or ask on the
1727-
mailing list.
1792+
dfmi.loc['A1',(slice(None),'foo')]
1793+
dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
1794+
1795+
Using a boolean indexer you can provide selection related to the *values*.
1796+
1797+
.. ipython:: python
1798+
1799+
mask = dfmi[('a','foo')]>200
1800+
dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']]
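
A self-contained version of the boolean-mask selection, for checking that the mask and the label list combine as an intersection. It rebuilds ``dfmi`` with ``sort_index()``, assuming a current pandas release.

```python
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
values = np.arange(len(miindex) * len(micolumns)).reshape(len(miindex), len(micolumns))
dfmi = pd.DataFrame(values, index=miindex, columns=micolumns).sort_index().sort_index(axis=1)

idx = pd.IndexSlice

# Row-wise condition on the values of one column.
mask = dfmi[("a", "foo")] > 200

# Rows must satisfy the mask AND carry C1 or C3 on the third level.
result = dfmi.loc[idx[mask, :, ["C1", "C3"]], idx[:, "foo"]]
third_level = set(result.index.get_level_values(2))
```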
1801+
1802+
You can also specify the ``axis`` argument to ``.loc`` to interpret the passed
1803+
slicers on a single axis.
1804+
1805+
.. ipython:: python
1806+
1807+
dfmi.loc(axis=0)[:,:,['C1','C3']]
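
The ``axis`` form can be read as shorthand for a two-axis indexer whose other axis is ``:``. The sketch below checks that equivalence on a small hypothetical frame (not the ``dfmi`` from the docs).

```python
import numpy as np
import pandas as pd

# Hypothetical three-level frame, already in sorted order.
rows = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]])
frame = pd.DataFrame(np.arange(len(rows) * 2).reshape(len(rows), 2), index=rows, columns=["x", "y"])

idx = pd.IndexSlice

# With axis=0, the whole key is interpreted as slicers for the row index.
on_axis = frame.loc(axis=0)[:, :, ["C1", "C3"]]

# Equivalent spelling that names both axes explicitly.
explicit = frame.loc[idx[:, :, ["C1", "C3"]], :]
```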
1808+
1809+
Furthermore, you can *set* values using these methods:
1810+
1811+
.. ipython:: python
1812+
1813+
df2 = dfmi.copy()
1814+
df2.loc(axis=0)[:,:,['C1','C3']] = -10
1815+
df2
1816+
1817+
You can also use an alignable object as the right-hand side.
1818+
1819+
.. ipython:: python
1820+
1821+
df2 = dfmi.copy()
1822+
df2.loc[idx[:,:,['C1','C3']],:] = df2*1000
1823+
df2
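
Both kinds of assignment can be checked with the sketch below, which uses a small hypothetical frame and the names ``scalar_set`` and ``aligned_set`` for illustration. It assumes that the right-hand-side frame is aligned on the selected labels, as the text above describes.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]])
base = pd.DataFrame(np.arange(len(rows) * 2).reshape(len(rows), 2), index=rows, columns=["x", "y"])
idx = pd.IndexSlice

# Scalar assignment: every selected cell becomes -10.
scalar_set = base.copy()
scalar_set.loc(axis=0)[:, :, ["C1", "C3"]] = -10

# Alignable right-hand side: only the selected rows pick up the scaled values.
aligned_set = base.copy()
aligned_set.loc[idx[:, :, ["C1", "C3"]], :] = aligned_set * 1000

touched = idx[:, :, ["C1", "C3"]]
untouched = idx[:, :, ["C0", "C2"]]
```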
17281824
17291825
.. _indexing.xs:
17301826

@@ -1738,6 +1834,11 @@ selecting data at a particular level of a MultiIndex easier.
17381834
17391835
df.xs('one', level='second')
17401836
1837+
.. ipython:: python
1838+
1839+
# using the slicers (new in 0.14.0)
1840+
df.loc[(slice(None),'one'),:]
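
The relationship between ``xs`` and the slicer form can be sketched as follows. The frame is a hypothetical stand-in for the documentation's ``df``, which is defined outside this diff. One difference worth noting: ``xs`` drops the selected level by default, while the slicer keeps it.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(24).reshape(8, 3), index=index, columns=list("ABC"))

# Cross-section on the second level; that level is dropped from the result.
via_xs = df.xs("one", level="second")

# Slicer form: same rows, but both index levels are retained.
via_loc = df.loc[(slice(None), "one"), :]
```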
1841+
17411842
You can also select on the columns with :meth:`~pandas.MultiIndex.xs`, by
17421843
providing the axis argument
17431844

@@ -1746,29 +1847,38 @@ providing the axis argument
17461847
df = df.T
17471848
df.xs('one', level='second', axis=1)
17481849
1850+
.. ipython:: python
1851+
1852+
# using the slicers (new in 0.14.0)
1853+
df.loc[:,(slice(None),'one')]
1854+
17491855
:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys
17501856

17511857
.. ipython:: python
17521858
17531859
df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
17541860
1861+
.. ipython:: python
1862+
1863+
# using the slicers (new in 0.14.0)
1864+
df.loc[:,('bar','one')]
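
A sketch of the multi-key case on the column axis. The transposed frame is a hypothetical stand-in for the documentation's ``df``. Only the selected values are compared here, since the two spellings need not return the same container type.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(24).reshape(3, 8), index=list("ABC"), columns=columns)

# Keys are matched to the levels named in ``level``, in the same order.
via_xs = df.xs(("one", "bar"), level=("second", "first"), axis=1)

# With .loc, the tuple follows the order of the column levels instead.
via_loc = df.loc[:, ("bar", "one")]
```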
17551865
17561866
.. versionadded:: 0.13.0
17571867

17581868
You can pass ``drop_level=False`` to :meth:`~pandas.MultiIndex.xs` to retain
17591869
the level that was selected
17601870

1761-
.. ipython::
1871+
.. ipython:: python
17621872
17631873
df.xs('one', level='second', axis=1, drop_level=False)
17641874
17651875
versus the result with ``drop_level=True`` (the default value)
17661876

1767-
.. ipython::
1877+
.. ipython:: python
17681878
17691879
df.xs('one', level='second', axis=1, drop_level=True)
17701880
1771-
.. ipython::
1881+
.. ipython:: python
17721882
:suppress:
17731883
17741884
df = df.T

doc/source/release.rst

+2
@@ -74,6 +74,7 @@ Improvements to existing features
7474
the func (:issue:`6289`)
7575
- ``plot(legend='reverse')`` will now reverse the order of legend labels for most plot kinds.
7676
(:issue:`6014`)
77+
- Allow multi-index slicers (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`)
7778

7879
.. _release.bug_fixes-0.14.0:
7980

@@ -94,6 +95,7 @@ Bug Fixes
9495
- Issue with groupby ``agg`` with a single function and a mixed-type frame (:issue:`6337`)
9596
- Bug in ``DataFrame.replace()`` when passing a non- ``bool``
9697
``to_replace`` argument (:issue:`6332`)
98+
- Raise when trying to align on different levels of a multi-index assignment (:issue:`3738`)
9799

98100
pandas 0.13.1
99101
-------------

doc/source/v0.14.0.txt

+107
@@ -29,6 +29,113 @@ API changes
2929
df.iloc[:,2:3]
3030
df.iloc[:,1:3]
3131

32+
MultiIndexing Using Slicers
33+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
34+
35+
In 0.14.0 we added a new way to slice multi-indexed objects.
36+
You can slice a multi-index by providing multiple indexers.
37+
38+
You can provide any of the selectors as if you are indexing by label, see :ref:`Selection by Label <indexing.label>`,
39+
including slices, lists of labels, labels, and boolean indexers.
40+
41+
You can use ``slice(None)`` to select all the contents of *that* level. You do not need to specify all the
42+
*deeper* levels; they will be implied as ``slice(None)``.
43+
44+
As usual, **both sides** of the slicers are included as this is label indexing.
45+
46+
See :ref:`the docs<indexing.mi_slicers>`
47+
See also issues (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`)
48+
49+
.. warning::
50+
51+
You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and
52+
for the **columns**. There are some ambiguous cases where the passed indexer could be misinterpreted
53+
as indexing *both* axes, rather than into, say, the MultiIndex for the rows.
54+
55+
You should do this:
56+
57+
.. code-block:: python
58+
59+
df.loc[(slice('A1','A3'),.....),:]
60+
61+
rather than this:
62+
63+
.. code-block:: python
64+
65+
df.loc[(slice('A1','A3'),.....)]
66+
67+
.. warning::
68+
69+
You will need to make sure that the selection axes are fully lexsorted!
70+
71+
.. ipython:: python
72+
73+
def mklbl(prefix,n):
74+
    return ["%s%s" % (prefix,i) for i in range(n)]
75+
76+
index = MultiIndex.from_product([mklbl('A',4),
77+
mklbl('B',2),
78+
mklbl('C',4),
79+
mklbl('D',2)])
80+
columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
81+
('b','foo'),('b','bah')],
82+
names=['lvl0', 'lvl1'])
83+
df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
84+
index=index,
85+
columns=columns).sortlevel().sortlevel(axis=1)
86+
df
87+
88+
Basic multi-index slicing using slices, lists, and labels.
89+
90+
.. ipython:: python
91+
92+
df.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
93+
94+
You can use ``pd.IndexSlice`` as a shortcut for creating these slices:
95+
96+
.. ipython:: python
97+
98+
idx = pd.IndexSlice
99+
df.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
100+
101+
It is possible to perform quite complicated selections using this method on multiple
102+
axes at the same time.
103+
104+
.. ipython:: python
105+
106+
df.loc['A1',(slice(None),'foo')]
107+
df.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
108+
109+
Using a boolean indexer you can provide selection related to the *values*.
110+
111+
.. ipython:: python
112+
113+
mask = df[('a','foo')]>200
114+
df.loc[idx[mask,:,['C1','C3']],idx[:,'foo']]
115+
116+
You can also specify the ``axis`` argument to ``.loc`` to interpret the passed
117+
slicers on a single axis.
118+
119+
.. ipython:: python
120+
121+
df.loc(axis=0)[:,:,['C1','C3']]
122+
123+
Furthermore, you can *set* values using these methods:
124+
125+
.. ipython:: python
126+
127+
df2 = df.copy()
128+
df2.loc(axis=0)[:,:,['C1','C3']] = -10
129+
df2
130+
131+
You can also use an alignable object as the right-hand side.
132+
133+
.. ipython:: python
134+
135+
df2 = df.copy()
136+
df2.loc[idx[:,:,['C1','C3']],:] = df2*1000
137+
df2
138+
32139
Prior Version Deprecations/Changes
33140
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
34141

pandas/core/api.py

+1
@@ -19,6 +19,7 @@
1919

2020
WidePanel = Panel
2121

22+
from pandas.core.indexing import IndexSlice
2223
from pandas.tseries.offsets import DateOffset
2324
from pandas.tseries.tools import to_datetime
2425
from pandas.tseries.index import (DatetimeIndex, Timestamp,
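
With this import in place, ``IndexSlice`` is reachable from the top-level ``pandas`` namespace. A minimal sketch of what the object returns when indexed, assuming a current pandas release:

```python
import pandas as pd

idx = pd.IndexSlice

# Indexing IndexSlice hands back the key itself, with colons turned into slices.
multi_key = idx[:, ["C1", "C3"]]
single_key = idx["a":"c"]
```

The returned objects are plain Python tuples and slices, so they can be built once and reused across several ``.loc`` calls.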
