Commit d8fd5a6

ms7463 authored and jreback committed
ENH: MultiIndex.from_frame (#23141)
1 parent 8dc22d8 commit d8fd5a6

6 files changed

+275
-51
lines changed

doc/source/advanced.rst

+16-2
Original file line number | Diff line number | Diff line change
@@ -62,8 +62,9 @@ The :class:`MultiIndex` object is the hierarchical analogue of the standard
6262
can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A
6363
``MultiIndex`` can be created from a list of arrays (using
6464
:meth:`MultiIndex.from_arrays`), an array of tuples (using
65-
:meth:`MultiIndex.from_tuples`), or a crossed set of iterables (using
66-
:meth:`MultiIndex.from_product`). The ``Index`` constructor will attempt to return
65+
:meth:`MultiIndex.from_tuples`), a crossed set of iterables (using
66+
:meth:`MultiIndex.from_product`), or a :class:`DataFrame` (using
67+
:meth:`MultiIndex.from_frame`). The ``Index`` constructor will attempt to return
6768
a ``MultiIndex`` when it is passed a list of tuples. The following examples
6869
demonstrate different ways to initialize MultiIndexes.
6970

@@ -89,6 +90,19 @@ to use the :meth:`MultiIndex.from_product` method:
8990
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
9091
pd.MultiIndex.from_product(iterables, names=['first', 'second'])
9192
93+
You can also construct a ``MultiIndex`` from a ``DataFrame`` directly, using
94+
the method :meth:`MultiIndex.from_frame`. This is a complementary method to
95+
:meth:`MultiIndex.to_frame`.
96+
97+
.. versionadded:: 0.24.0
98+
99+
.. ipython:: python
100+
101+
df = pd.DataFrame([['bar', 'one'], ['bar', 'two'],
102+
['foo', 'one'], ['foo', 'two']],
103+
columns=['first', 'second'])
104+
pd.MultiIndex.from_frame(df)
105+
92106
As a convenience, you can pass a list of arrays directly into ``Series`` or
93107
``DataFrame`` to construct a ``MultiIndex`` automatically:
94108
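
The documentation change above describes `from_frame` as the complement of `to_frame`. A minimal sketch of that round trip follows, assuming pandas 0.24 or later; the variable names (`mi`, `round_trip`, `renamed`, `rejected`) are illustrative and not part of the commit.

```python
import pandas as pd

df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)

# Column labels become level names; each row becomes one index entry.
mi = pd.MultiIndex.from_frame(df)

# to_frame(index=False) goes the other way and yields a default integer index.
round_trip = mi.to_frame(index=False)

# Explicit names override the column labels.
renamed = pd.MultiIndex.from_frame(df, names=["outer", "inner"])

# Input that is not a DataFrame is rejected with a TypeError.
try:
    pd.MultiIndex.from_frame([["bar", "one"]])
    rejected = False
except TypeError:
    rejected = True
```

Passing `index=False` keeps the round-tripped frame on a plain integer index, which makes it directly comparable with the original `df`.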

doc/source/api.rst

+1
@@ -1703,6 +1703,7 @@ MultiIndex Constructors
17031703
MultiIndex.from_arrays
17041704
MultiIndex.from_tuples
17051705
MultiIndex.from_product
1706+
MultiIndex.from_frame
17061707

17071708
MultiIndex Attributes
17081709
~~~~~~~~~~~~~~~~~~~~~

doc/source/whatsnew/v0.24.0.rst

+2
@@ -382,6 +382,7 @@ Backwards incompatible API changes
382382
- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
383383
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
384384
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
385+
- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`)
385386

386387
.. _whatsnew_0240.api_breaking.deps:
387388

@@ -1404,6 +1405,7 @@ MultiIndex
14041405

14051406
- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
14061407
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a ``Series`` or ``DataFrame`` with a :class:`MultiIndex` index) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
1408+
- :class:`MultiIndex` has gained the :meth:`MultiIndex.from_frame` method, which allows constructing a :class:`MultiIndex` object from a :class:`DataFrame` (:issue:`22420`)
14071409
- Fix ``TypeError`` in Python 3 when creating :class:`MultiIndex` in which some levels have mixed types, e.g. when some labels are tuples (:issue:`15457`)
14081410

14091411
I/O

pandas/core/indexes/multi.py

+116-32
@@ -1,4 +1,5 @@
11
# pylint: disable=E1101,E1103,W0232
2+
from collections import OrderedDict
23
import datetime
34
from sys import getsizeof
45
import warnings
@@ -18,6 +19,7 @@
1819
is_integer, is_iterator, is_list_like, is_object_dtype, is_scalar,
1920
pandas_dtype)
2021
from pandas.core.dtypes.dtypes import ExtensionDtype, PandasExtensionDtype
22+
from pandas.core.dtypes.generic import ABCDataFrame
2123
from pandas.core.dtypes.missing import array_equivalent, isna
2224

2325
import pandas.core.algorithms as algos
@@ -125,25 +127,25 @@ class MultiIndex(Index):
125127
Parameters
126128
----------
127129
levels : sequence of arrays
128-
The unique labels for each level
130+
The unique labels for each level.
129131
codes : sequence of arrays
130-
Integers for each level designating which label at each location
132+
Integers for each level designating which label at each location.
131133
132134
.. versionadded:: 0.24.0
133135
labels : sequence of arrays
134-
Integers for each level designating which label at each location
136+
Integers for each level designating which label at each location.
135137
136138
.. deprecated:: 0.24.0
137139
Use ``codes`` instead
138140
sortorder : optional int
139141
Level of sortedness (must be lexicographically sorted by that
140-
level)
142+
level).
141143
names : optional sequence of objects
142-
Names for each of the index levels. (name is accepted for compat)
143-
copy : boolean, default False
144-
Copy the meta-data
145-
verify_integrity : boolean, default True
146-
Check that the levels/codes are consistent and valid
144+
Names for each of the index levels (name is accepted for compat).
145+
copy : bool, default False
146+
Copy the meta-data.
147+
verify_integrity : bool, default True
148+
Check that the levels/codes are consistent and valid.
147149
148150
Attributes
149151
----------
@@ -158,6 +160,7 @@ class MultiIndex(Index):
158160
from_arrays
159161
from_tuples
160162
from_product
163+
from_frame
161164
set_levels
162165
set_codes
163166
to_frame
@@ -175,13 +178,9 @@ class MultiIndex(Index):
175178
MultiIndex.from_product : Create a MultiIndex from the cartesian product
176179
of iterables.
177180
MultiIndex.from_tuples : Convert list of tuples to a MultiIndex.
181+
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
178182
Index : The base pandas Index type.
179183
180-
Notes
181-
-----
182-
See the `user guide
183-
<http://pandas.pydata.org/pandas-docs/stable/advanced.html>`_ for more.
184-
185184
Examples
186185
---------
187186
A new ``MultiIndex`` is typically constructed using one of the helper
@@ -196,6 +195,11 @@ class MultiIndex(Index):
196195
197196
See further examples for how to construct a MultiIndex in the doc strings
198197
of the mentioned helper methods.
198+
199+
Notes
200+
-----
201+
See the `user guide
202+
<http://pandas.pydata.org/pandas-docs/stable/advanced.html>`_ for more.
199203
"""
200204

201205
# initialize to zero-length tuples to make everything work
@@ -288,7 +292,7 @@ def _verify_integrity(self, codes=None, levels=None):
288292
@classmethod
289293
def from_arrays(cls, arrays, sortorder=None, names=None):
290294
"""
291-
Convert arrays to MultiIndex
295+
Convert arrays to MultiIndex.
292296
293297
Parameters
294298
----------
@@ -297,7 +301,9 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
297301
len(arrays) is the number of levels.
298302
sortorder : int or None
299303
Level of sortedness (must be lexicographically sorted by that
300-
level)
304+
level).
305+
names : list / sequence of str, optional
306+
Names for the levels in the index.
301307
302308
Returns
303309
-------
@@ -308,11 +314,15 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
308314
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
309315
MultiIndex.from_product : Make a MultiIndex from cartesian product
310316
of iterables.
317+
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
311318
312319
Examples
313320
--------
314321
>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
315322
>>> pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
323+
MultiIndex(levels=[[1, 2], ['blue', 'red']],
324+
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
325+
names=['number', 'color'])
316326
"""
317327
if not is_list_like(arrays):
318328
raise TypeError("Input must be a list / sequence of array-likes.")
@@ -337,31 +347,37 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
337347
@classmethod
338348
def from_tuples(cls, tuples, sortorder=None, names=None):
339349
"""
340-
Convert list of tuples to MultiIndex
350+
Convert list of tuples to MultiIndex.
341351
342352
Parameters
343353
----------
344354
tuples : list / sequence of tuple-likes
345355
Each tuple is the index of one row/column.
346356
sortorder : int or None
347357
Level of sortedness (must be lexicographically sorted by that
348-
level)
358+
level).
359+
names : list / sequence of str, optional
360+
Names for the levels in the index.
349361
350362
Returns
351363
-------
352364
index : MultiIndex
353365
354366
See Also
355367
--------
356-
MultiIndex.from_arrays : Convert list of arrays to MultiIndex
368+
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
357369
MultiIndex.from_product : Make a MultiIndex from cartesian product
358-
of iterables
370+
of iterables.
371+
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
359372
360373
Examples
361374
--------
362375
>>> tuples = [(1, u'red'), (1, u'blue'),
363-
(2, u'red'), (2, u'blue')]
376+
... (2, u'red'), (2, u'blue')]
364377
>>> pd.MultiIndex.from_tuples(tuples, names=('number', 'color'))
378+
MultiIndex(levels=[[1, 2], ['blue', 'red']],
379+
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
380+
names=['number', 'color'])
365381
"""
366382
if not is_list_like(tuples):
367383
raise TypeError('Input must be a list / sequence of tuple-likes.')
@@ -388,7 +404,7 @@ def from_tuples(cls, tuples, sortorder=None, names=None):
388404
@classmethod
389405
def from_product(cls, iterables, sortorder=None, names=None):
390406
"""
391-
Make a MultiIndex from the cartesian product of multiple iterables
407+
Make a MultiIndex from the cartesian product of multiple iterables.
392408
393409
Parameters
394410
----------
@@ -397,7 +413,7 @@ def from_product(cls, iterables, sortorder=None, names=None):
397413
sortorder : int or None
398414
Level of sortedness (must be lexicographically sorted by that
399415
level).
400-
names : list / sequence of strings or None
416+
names : list / sequence of str, optional
401417
Names for the levels in the index.
402418
403419
Returns
@@ -408,16 +424,17 @@ def from_product(cls, iterables, sortorder=None, names=None):
408424
--------
409425
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
410426
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
427+
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
411428
412429
Examples
413430
--------
414431
>>> numbers = [0, 1, 2]
415-
>>> colors = [u'green', u'purple']
432+
>>> colors = ['green', 'purple']
416433
>>> pd.MultiIndex.from_product([numbers, colors],
417-
names=['number', 'color'])
418-
MultiIndex(levels=[[0, 1, 2], [u'green', u'purple']],
434+
... names=['number', 'color'])
435+
MultiIndex(levels=[[0, 1, 2], ['green', 'purple']],
419436
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
420-
names=[u'number', u'color'])
437+
names=['number', 'color'])
421438
"""
422439
from pandas.core.arrays.categorical import _factorize_from_iterables
423440
from pandas.core.reshape.util import cartesian_product
@@ -431,6 +448,68 @@ def from_product(cls, iterables, sortorder=None, names=None):
431448
codes = cartesian_product(codes)
432449
return MultiIndex(levels, codes, sortorder=sortorder, names=names)
433450

451+
@classmethod
452+
def from_frame(cls, df, sortorder=None, names=None):
453+
"""
454+
Make a MultiIndex from a DataFrame.
455+
456+
.. versionadded:: 0.24.0
457+
458+
Parameters
459+
----------
460+
df : DataFrame
461+
DataFrame to be converted to MultiIndex.
462+
sortorder : int, optional
463+
Level of sortedness (must be lexicographically sorted by that
464+
level).
465+
names : list-like, optional
466+
If no names are provided, use the column names, or tuple of column
467+
names if the columns are a MultiIndex. If a sequence, overwrite
468+
names with the given sequence.
469+
470+
Returns
471+
-------
472+
MultiIndex
473+
The MultiIndex representation of the given DataFrame.
474+
475+
See Also
476+
--------
477+
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
478+
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
479+
MultiIndex.from_product : Make a MultiIndex from cartesian product
480+
of iterables.
481+
482+
Examples
483+
--------
484+
>>> df = pd.DataFrame([['HI', 'Temp'], ['HI', 'Precip'],
485+
... ['NJ', 'Temp'], ['NJ', 'Precip']],
486+
... columns=['a', 'b'])
487+
>>> df
488+
a b
489+
0 HI Temp
490+
1 HI Precip
491+
2 NJ Temp
492+
3 NJ Precip
493+
494+
>>> pd.MultiIndex.from_frame(df)
495+
MultiIndex(levels=[['HI', 'NJ'], ['Precip', 'Temp']],
496+
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
497+
names=['a', 'b'])
498+
499+
Using explicit names, instead of the column names
500+
501+
>>> pd.MultiIndex.from_frame(df, names=['state', 'observation'])
502+
MultiIndex(levels=[['HI', 'NJ'], ['Precip', 'Temp']],
503+
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
504+
names=['state', 'observation'])
505+
"""
506+
if not isinstance(df, ABCDataFrame):
507+
raise TypeError("Input must be a DataFrame")
508+
509+
column_names, columns = lzip(*df.iteritems())
510+
names = column_names if names is None else names
511+
return cls.from_arrays(columns, sortorder=sortorder, names=names)
512+
434513
# --------------------------------------------------------------------
435514

436515
@property
@@ -1386,11 +1465,16 @@ def to_frame(self, index=True, name=None):
13861465
else:
13871466
idx_names = self.names
13881467

1389-
result = DataFrame({(name or level):
1390-
self._get_level_values(level)
1391-
for name, level in
1392-
zip(idx_names, range(len(self.levels)))},
1393-
copy=False)
1468+
# Guarantee resulting column order
1469+
result = DataFrame(
1470+
OrderedDict([
1471+
((level if name is None else name),
1472+
self._get_level_values(level))
1473+
for name, level in zip(idx_names, range(len(self.levels)))
1474+
]),
1475+
copy=False
1476+
)
1477+
13941478
if index:
13951479
result.index = self
13961480
return result
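
The two behavioural changes in this file can be checked from the public API. The sketch below is an illustration written against pandas 0.24 or later, not code from the commit; it compares `from_frame` with an equivalent `from_arrays` call built from the frame's columns, then checks that `to_frame` keeps columns in level order. All variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [["HI", "Temp"], ["HI", "Precip"], ["NJ", "Temp"], ["NJ", "Precip"]],
    columns=["state", "observation"],
)

# Roughly what from_frame does: one array per column, column labels as names.
via_arrays = pd.MultiIndex.from_arrays(
    [df[col] for col in df.columns], names=list(df.columns)
)
via_frame = pd.MultiIndex.from_frame(df)

# Level names chosen so that alphabetical order differs from level order.
mi = pd.MultiIndex.from_arrays([[1, 2], ["x", "y"]], names=["z", "a"])
named_columns = list(mi.to_frame(index=False).columns)

# Unnamed levels fall back to their integer positions as column labels.
unnamed = pd.MultiIndex.from_arrays([[1, 2], ["x", "y"]])
positional_columns = list(unnamed.to_frame(index=False).columns)
```

The `["z", "a"]` names are deliberately unsorted: a plain `dict` on the older Python versions supported at the time gave no ordering guarantee, which is why the commit switches `to_frame` to an `OrderedDict`.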
