Commit 429e9f3

Merge pull request #4039 from jtratner/fix-multiindex-naming
ENH/BUG: Fix names, levels and labels handling in MultiIndex
2 parents 7fd6b20 + 5cad4d2 commit 429e9f3

27 files changed (+958 -268 lines)

doc/source/indexing.rst (+86 -62)

@@ -868,66 +868,6 @@ convert to an integer index:
    df_new[(df_new['index'] >= 1.0) & (df_new['index'] < 2)]
 
 
-.. _indexing.class:
-
-Index objects
--------------
-
-The pandas Index class and its subclasses can be viewed as implementing an
-*ordered set* in addition to providing the support infrastructure necessary for
-lookups, data alignment, and reindexing. The easiest way to create one directly
-is to pass a list or other sequence to ``Index``:
-
-.. ipython:: python
-
-   index = Index(['e', 'd', 'a', 'b'])
-   index
-   'd' in index
-
-You can also pass a ``name`` to be stored in the index:
-
-
-.. ipython:: python
-
-   index = Index(['e', 'd', 'a', 'b'], name='something')
-   index.name
-
-Starting with pandas 0.5, the name, if set, will be shown in the console
-display:
-
-.. ipython:: python
-
-   index = Index(list(range(5)), name='rows')
-   columns = Index(['A', 'B', 'C'], name='cols')
-   df = DataFrame(np.random.randn(5, 3), index=index, columns=columns)
-   df
-   df['A']
-
-
-Set operations on Index objects
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. _indexing.set_ops:
-
-The three main operations are ``union (|)``, ``intersection (&)``, and ``diff
-(-)``. These can be directly called as instance methods or used via overloaded
-operators:
-
-.. ipython:: python
-
-   a = Index(['c', 'b', 'a'])
-   b = Index(['c', 'e', 'd'])
-   a.union(b)
-   a | b
-   a & b
-   a - b
-
-``isin`` method of Index objects
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-One additional operation is the ``isin`` method that works analogously to the
-``Series.isin`` method found :ref:`here <indexing.boolean>`.
-
 .. _indexing.hierarchical:
 
 Hierarchical indexing (MultiIndex)
@@ -1189,7 +1129,7 @@ are named.
 
 .. ipython:: python
 
-   s.index.names = ['L1', 'L2']
+   s.index.set_names(['L1', 'L2'], inplace=True)
    s.sortlevel(level='L1')
    s.sortlevel(level='L2')
 
@@ -1229,7 +1169,9 @@ However:
 ::
 
     >>> s.ix[('a', 'b'):('b', 'a')]
-    Exception: MultiIndex lexsort depth 1, key was length 2
+    Traceback (most recent call last)
+    ...
+    KeyError: Key length (3) was greater than MultiIndex lexsort depth (2)
 
 Swapping levels with ``swaplevel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1274,6 +1216,88 @@ not check (or care) whether the levels themselves are sorted. Fortunately, the
 constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
 if you compute the levels and labels yourself, please be careful.
 
+.. _indexing.class:
+
+Index objects
+-------------
+
+The pandas Index class and its subclasses can be viewed as implementing an
+*ordered set* in addition to providing the support infrastructure necessary for
+lookups, data alignment, and reindexing. The easiest way to create one directly
+is to pass a list or other sequence to ``Index``:
+
+.. ipython:: python
+
+   index = Index(['e', 'd', 'a', 'b'])
+   index
+   'd' in index
+
+You can also pass a ``name`` to be stored in the index:
+
+
+.. ipython:: python
+
+   index = Index(['e', 'd', 'a', 'b'], name='something')
+   index.name
+
+Starting with pandas 0.5, the name, if set, will be shown in the console
+display:
+
+.. ipython:: python
+
+   index = Index(list(range(5)), name='rows')
+   columns = Index(['A', 'B', 'C'], name='cols')
+   df = DataFrame(np.random.randn(5, 3), index=index, columns=columns)
+   df
+   df['A']
+
+
+Set operations on Index objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. _indexing.set_ops:
+
+The three main operations are ``union (|)``, ``intersection (&)``, and ``diff
+(-)``. These can be directly called as instance methods or used via overloaded
+operators:
+
+.. ipython:: python
+
+   a = Index(['c', 'b', 'a'])
+   b = Index(['c', 'e', 'd'])
+   a.union(b)
+   a | b
+   a & b
+   a - b
+
+``isin`` method of Index objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One additional operation is the ``isin`` method that works analogously to the
+``Series.isin`` method found :ref:`here <indexing.boolean>`.
+
+Setting index metadata (``name(s)``, ``levels``, ``labels``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. _indexing.set_metadata:
+
+Indexes are "mostly immutable", but it is possible to set and change their
+metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and
+``labels``).
+
+You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels``
+to set these attributes directly. They default to returning a copy; however,
+you can specify ``inplace=True`` to have the data change inplace.
+
+.. ipython:: python
+
+   ind = Index([1, 2, 3])
+   ind.rename("apple")
+   ind
+   ind.set_names(["apple"], inplace=True)
+   ind.name = "bob"
+   ind
+
 Adding an index to an existing DataFrame
 ----------------------------------------
 
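The new "Setting index metadata" section says that ``rename`` and ``set_names`` return a modified copy by default and change the object itself only when ``inplace=True`` is passed. Below is a minimal pure-Python sketch of that convention, not pandas' implementation. ``ToyIndex`` is a hypothetical class. Returning ``None`` from the in-place call is an assumption that follows the usual pandas convention for ``inplace`` methods.

```python
class ToyIndex:
    """Hypothetical flat index: immutable values plus a settable ``name``."""

    def __init__(self, values, name=None):
        self._values = tuple(values)  # the data itself is never mutated
        self.name = name

    def _shallow_copy(self):
        # new wrapper object that shares the same (immutable) values tuple
        return ToyIndex(self._values, name=self.name)

    def set_names(self, names, inplace=False):
        # a flat index carries exactly one name, passed as a one-item list
        if len(names) != 1:
            raise ValueError("expected 1 name, got %d" % len(names))
        target = self if inplace else self._shallow_copy()
        target.name = names[0]
        return None if inplace else target

    def rename(self, name, inplace=False):
        # convenience wrapper: a single name instead of a list of names
        return self.set_names([name], inplace=inplace)


ind = ToyIndex([1, 2, 3])
print(ind.rename("apple").name)  # apple
print(ind.name)                  # None: the original is unchanged
ind.set_names(["apple"], inplace=True)
print(ind.name)                  # apple
```

Copying only the small wrapper while sharing the values is what makes the copy-by-default style cheap enough to be the default.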

doc/source/release.rst (+26)

@@ -47,6 +47,12 @@ pandas 0.13
   - Added a more informative error message when plot arguments contain
     overlapping color and style arguments (:issue:`4402`)
   - Significant table writing performance improvements in ``HDFStore``
+  - ``Index.copy()`` and ``MultiIndex.copy()`` now accept keyword arguments to
+    change attributes (i.e., ``names``, ``levels``, ``labels``)
+    (:issue:`4039`)
+  - Add ``rename`` and ``set_names`` methods to ``Index`` as well as
+    ``set_names``, ``set_levels``, ``set_labels`` to ``MultiIndex``.
+    (:issue:`4039`)
 
 **API Changes**
 
@@ -66,6 +72,7 @@ pandas 0.13
     an alias of iteritems used to get around ``2to3``'s changes).
     (:issue:`4384`, :issue:`4375`, :issue:`4372`)
   - ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
+
   - ``HDFStore``
 
     - added an ``is_open`` property to indicate if the underlying file handle is_open;
@@ -83,6 +90,21 @@ pandas 0.13
       be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
     - allow a passed locations array or mask as a ``where`` condition (:issue:`4467`)
 
+  - ``Index`` and ``MultiIndex`` changes (:issue:`4039`):
+
+    - Setting ``levels`` and ``labels`` directly on ``MultiIndex`` is now
+      deprecated. Instead, you can use the ``set_levels()`` and
+      ``set_labels()`` methods.
+    - ``levels``, ``labels`` and ``names`` properties no longer return lists,
+      but instead return containers that do not allow setting of items
+      ('mostly immutable')
+    - ``levels``, ``labels`` and ``names`` are validated upon setting and are
+      either copied or shallow-copied.
+    - ``__deepcopy__`` now returns a shallow copy (currently: a view) of the
+      data - allowing metadata changes.
+  - ``MultiIndex.astype()`` now only allows ``np.object_``-like dtypes and
+    now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)
+
 **Experimental Features**
 
 **Bug Fixes**
@@ -136,6 +158,10 @@ pandas 0.13
   - frozenset objects now raise in the ``Series`` constructor (:issue:`4482`,
     :issue:`4480`)
   - Fixed issue with sorting a duplicate multi-index that has multiple dtypes (:issue:`4516`)
+  - Fixed bug in ``DataFrame.set_values`` which was causing name attributes to
+    be lost when expanding the index. (:issue:`3742`, :issue:`4039`)
+  - Fixed issue where individual ``names``, ``levels`` and ``labels`` could be
+    set on ``MultiIndex`` without validation (:issue:`3714`, :issue:`4039`)
 
 pandas 0.12
 ===========
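The release notes say that ``copy()`` now accepts ``names``, ``levels`` and ``labels`` keywords, and that these attributes are validated whenever they are set. Below is a minimal plain-Python sketch of both ideas, using a hypothetical ``ToyMultiIndex`` that stores its metadata in tuples. The specific checks (one label array per level, codes within range, one name per level) are illustrative assumptions, not a copy of the checks pandas performs.

```python
class ToyMultiIndex:
    """Hypothetical MultiIndex-like container with validated metadata."""

    def __init__(self, levels, labels, names=None):
        # every attribute goes through a validating setter, even at construction
        self._set_levels(levels)
        self._set_labels(labels)
        self._set_names([None] * len(self._levels) if names is None else names)

    def _set_levels(self, levels):
        self._levels = tuple(tuple(lev) for lev in levels)

    def _set_labels(self, labels):
        if len(labels) != len(self._levels):
            raise ValueError("need exactly one label array per level")
        for lev, lab in zip(self._levels, labels):
            if any(not 0 <= code < len(lev) for code in lab):
                raise ValueError("label code out of range for its level")
        self._labels = tuple(tuple(lab) for lab in labels)

    def _set_names(self, names):
        if len(names) != len(self._levels):
            raise ValueError("length of names must match number of levels")
        self._names = tuple(names)

    # read-only views: tuples, so individual items cannot be assigned
    @property
    def levels(self):
        return self._levels

    @property
    def labels(self):
        return self._labels

    @property
    def names(self):
        return self._names

    def copy(self, levels=None, labels=None, names=None):
        # keep the current metadata unless a keyword overrides it;
        # the constructor re-validates whatever combination results
        return ToyMultiIndex(
            self._levels if levels is None else levels,
            self._labels if labels is None else labels,
            self._names if names is None else names,
        )


mi = ToyMultiIndex([["a", "b"], [1, 2]], [[0, 0, 1], [0, 1, 0]])
renamed = mi.copy(names=["L1", "L2"])
print(renamed.names)  # ('L1', 'L2')
print(mi.names)       # (None, None)
```

Routing both construction and ``copy`` through the same setters ensures there is only one place where an inconsistent combination of levels, labels and names can be rejected.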

doc/source/v0.13.0.txt (+18)

@@ -72,6 +72,24 @@ API changes
        import os
        os.remove(path)
 
+  - Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
+    ``labels``, and ``names``) (:issue:`4039`):
+
+    ..code-block ::
+
+      # previously, you would have set levels or labels directly
+      index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]
+
+      # now, you use the set_levels or set_labels methods
+      index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])
+
+      # similarly, for names, you can rename the object
+      # but setting names is not deprecated.
+      index = index.set_names(["bob", "cranberry"])
+
+      # and all methods take an inplace kwarg
+      index.set_names(["bob", "cranberry"], inplace=True)
+
 Enhancements
 ~~~~~~~~~~~~
 
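The v0.13.0 note marks ``index.levels = ...`` as deprecated in favor of ``set_levels()``. A common way to deprecate a property setter is to have it warn and then delegate to the new method. The sketch below shows that pattern with a hypothetical ``ToyMultiIndex`` and Python's ``warnings`` module. The diff does not show which warning category pandas used, so ``DeprecationWarning`` here is an assumption.

```python
import warnings


class ToyMultiIndex:
    """Hypothetical index whose ``levels`` setter is deprecated."""

    def __init__(self, levels):
        self._levels = tuple(tuple(lev) for lev in levels)

    @property
    def levels(self):
        return self._levels

    @levels.setter
    def levels(self, value):
        # old spelling still works, but warns and delegates to the new method
        warnings.warn("setting levels directly is deprecated; use set_levels()",
                      DeprecationWarning, stacklevel=2)
        self.set_levels(value, inplace=True)

    def set_levels(self, levels, inplace=False):
        new_levels = tuple(tuple(lev) for lev in levels)
        if len(new_levels) != len(self._levels):
            raise ValueError("number of levels must not change")
        if inplace:
            self._levels = new_levels
            return None
        return ToyMultiIndex(new_levels)


index = ToyMultiIndex([[1, 2], [3, 4]])
# preferred spelling: returns a new object and leaves the old one untouched
index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])
print(index.levels)  # ((1, 2, 3, 4), (1, 2, 4, 4))
```

Because the setter delegates to ``set_levels(..., inplace=True)``, old code that assigns ``index.levels`` keeps working and goes through the same validation as the new method. The only difference is the warning it emits.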

pandas/core/base.py (+87 -1)

@@ -1,7 +1,8 @@
 """
-Base class(es) for all pandas objects.
+Base and utility classes for pandas objects.
 """
 from pandas import compat
+import numpy as np
 
 class StringMixin(object):
     """implements string methods so long as object defines a `__unicode__` method.
@@ -56,3 +57,88 @@ def __unicode__(self):
         """
         # Should be overwritten by base classes
         return object.__repr__(self)
+
+class FrozenList(PandasObject, list):
+    """
+    Container that doesn't allow setting item *but*
+    because it's technically non-hashable, will be used
+    for lookups, appropriately, etc.
+    """
+    # Sidenote: This has to be of type list, otherwise it messes up PyTables typechecks
+
+    def __add__(self, other):
+        if isinstance(other, tuple):
+            other = list(other)
+        return self.__class__(super(FrozenList, self).__add__(other))
+
+    __iadd__ = __add__
+
+    # Python 2 compat
+    def __getslice__(self, i, j):
+        return self.__class__(super(FrozenList, self).__getslice__(i, j))
+
+    def __getitem__(self, n):
+        # Python 3 compat
+        if isinstance(n, slice):
+            return self.__class__(super(FrozenList, self).__getitem__(n))
+        return super(FrozenList, self).__getitem__(n)
+
+    def __radd__(self, other):
+        if isinstance(other, tuple):
+            other = list(other)
+        return self.__class__(other + list(self))
+
+    def __eq__(self, other):
+        if isinstance(other, (tuple, FrozenList)):
+            other = list(other)
+        return super(FrozenList, self).__eq__(other)
+
+    __req__ = __eq__
+
+    def __mul__(self, other):
+        return self.__class__(super(FrozenList, self).__mul__(other))
+
+    __imul__ = __mul__
+
+    def __hash__(self):
+        return hash(tuple(self))
+
+    def _disabled(self, *args, **kwargs):
+        """This method will not function because object is immutable."""
+        raise TypeError("'%s' does not support mutable operations." %
+                        self.__class__)
+
+    def __unicode__(self):
+        from pandas.core.common import pprint_thing
+        return "%s(%s)" % (self.__class__.__name__,
+                           pprint_thing(self, quote_strings=True,
+                                        escape_chars=('\t', '\r', '\n')))
+
+    __setitem__ = __setslice__ = __delitem__ = __delslice__ = _disabled
+    pop = append = extend = remove = sort = insert = _disabled
+
+
+class FrozenNDArray(PandasObject, np.ndarray):
+
+    # no __array_finalize__ for now because no metadata
+    def __new__(cls, data, dtype=None, copy=False):
+        if copy is None:
+            copy = not isinstance(data, FrozenNDArray)
+        res = np.array(data, dtype=dtype, copy=copy).view(cls)
+        return res
+
+    def _disabled(self, *args, **kwargs):
+        """This method will not function because object is immutable."""
+        raise TypeError("'%s' does not support mutable operations." %
+                        self.__class__)
+
+    __setitem__ = __setslice__ = __delitem__ = __delslice__ = _disabled
+    put = itemset = fill = _disabled
+
+    def _shallow_copy(self):
+        return self.view()
+
+    def values(self):
+        """returns *copy* of underlying array"""
+        arr = self.view(np.ndarray).copy()
+        return arr
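``FrozenList`` above subclasses ``list`` (the in-code comment says this is needed for PyTables type checks) and routes every mutating method to ``_disabled``. Below is a trimmed sketch of the same idea that runs only on Python 3. It omits the ``PandasObject`` base class, the pretty-printing and the Python 2 slice hooks. Like the commit, it makes ``+=`` and ``*=`` build a new object instead of raising.

```python
class FrozenList(list):
    """List subclass whose contents cannot change once it is built."""

    def _disabled(self, *args, **kwargs):
        raise TypeError("'%s' does not support mutable operations."
                        % type(self).__name__)

    # every in-place mutator inherited from list now raises TypeError
    __setitem__ = __delitem__ = _disabled
    append = extend = insert = pop = remove = sort = reverse = clear = _disabled

    def __add__(self, other):
        # concatenation builds a new FrozenList (tuples are accepted too)
        return type(self)(list(self) + list(other))

    # `fl += x` rebinds the name to a new object instead of mutating `fl`
    __iadd__ = __add__

    def __mul__(self, n):
        return type(self)(list(self) * n)

    __imul__ = __mul__

    def __getitem__(self, key):
        result = list.__getitem__(self, key)
        # slices come back frozen as well; single items are returned as-is
        return type(self)(result) if isinstance(key, slice) else result

    def __hash__(self):
        # safe to hash because the contents can never change
        return hash(tuple(self))


names = FrozenList(["L1", "L2"])
print(names[0])         # L1
print(names + ("L3",))  # ['L1', 'L2', 'L3']
```

Subclassing ``list`` rather than ``tuple`` keeps ``isinstance(x, list)`` checks in other code working. The cost is that each mutator has to be disabled explicitly, which is why both the commit and this sketch list them one by one.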

pandas/core/common.py (+1 -3)

@@ -8,18 +8,16 @@
 
 from numpy.lib.format import read_array, write_array
 import numpy as np
-
 import pandas.algos as algos
 import pandas.lib as lib
 import pandas.tslib as tslib
 
 from pandas import compat
 from pandas.compat import StringIO, BytesIO, range, long, u, zip, map
-
-
 from pandas.core.config import get_option
 from pandas.core import array as pa
 
+
 # XXX: HACK for NumPy 1.5.1 to suppress warnings
 try:
     np.seterr(all='ignore')

pandas/core/frame.py (+1 -1)

@@ -1150,7 +1150,7 @@ def to_records(self, index=True, convert_datetime64=True):
             arrays = ix_vals+ [self[c].values for c in self.columns]
 
             count = 0
-            index_names = self.index.names
+            index_names = list(self.index.names)
             if isinstance(self.index, MultiIndex):
                 for i, n in enumerate(index_names):
                     if n is None:
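``to_records`` now copies ``self.index.names`` into a plain list. After this change ``names`` is a read-only container, but the loop that follows assigns into the list. The hunk shows the loop header but not its body. The sketch below fills ``None`` names with ``'level_0'``, ``'level_1'``, ... for a ``MultiIndex`` and with ``'index'`` otherwise. That naming scheme is an assumption about the elided lines, and ``default_index_names`` is a hypothetical helper, not a pandas function.

```python
def default_index_names(names, is_multi):
    """Hypothetical helper: replace missing (None) index names with defaults.

    ``names`` may be read-only (a tuple or a FrozenList), so it is copied into
    a plain list first, just as the patched to_records does.
    """
    index_names = list(names)  # mutable copy; the input is never modified
    if is_multi:
        count = 0
        for i, n in enumerate(index_names):
            if n is None:
                # assumed naming scheme: level_0, level_1, ... for unnamed levels
                index_names[i] = "level_%d" % count
                count += 1
    elif index_names[0] is None:
        index_names = ["index"]  # assumed default for an unnamed flat index
    return index_names


print(default_index_names((None, "b", None), is_multi=True))  # ['level_0', 'b', 'level_1']
print(default_index_names((None,), is_multi=False))           # ['index']
```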

pandas/core/generic.py (+1 -1)

@@ -404,7 +404,7 @@ def drop(self, labels, axis=0, level=None):
                 new_axis = axis.drop(labels)
             dropped = self.reindex(**{axis_name: new_axis})
             try:
-                dropped.axes[axis_].names = axis.names
+                dropped.axes[axis_].set_names(axis.names, inplace=True)
             except AttributeError:
                 pass
             return dropped
