ENH: MultiIndex.from_frame #23141

Merged
merged 52 commits into from
Dec 9, 2018
52 commits
79bdecb
ENH - add from_frame method and accompanying squeeze method to multii…
sds9995 Oct 13, 2018
fa82618
ENH - guarentee that order of labels is preserved in multiindex to_fr…
sds9995 Oct 13, 2018
64b45d6
CLN - adhere to PEP8 line length
sds9995 Oct 13, 2018
64c7bb1
CLN - remove trailing whitespace
sds9995 Oct 13, 2018
3ee676c
ENH - raise TypeError on inappropriate input
sds9995 Oct 16, 2018
fd266f5
TST - add tests for mi.from_frame and mi.squeeze
sds9995 Oct 16, 2018
4bc8f5b
CLN - pep8 adherence in tests
sds9995 Oct 16, 2018
9d92b70
CLN - last missed pep8 fix
sds9995 Oct 16, 2018
45595ad
BUG - remove pd.DataFrame in favor of local import
ms7463 Oct 16, 2018
3530cd3
DOC - add more detailed docstrings for from_frame and squeeze
sds9995 Oct 18, 2018
1c22791
DOC - update MultiIndex.from_frame and squeeze doctests to comply wit…
sds9995 Oct 28, 2018
cf78780
CLN - cleanup docstrings and source
sds9995 Oct 28, 2018
64c2750
TST - reorganize some of the multiindex tests
sds9995 Oct 28, 2018
ede030b
CLN - adhere to pep8 line length
sds9995 Oct 28, 2018
190c341
BUG - ensure dtypes are preserved in from_frame and to_frame
sds9995 Nov 3, 2018
e0df632
TST - add tests for ensuring dtype fidelity and custom names for from…
sds9995 Nov 3, 2018
78ff5c2
CLN - pep8 adherence
sds9995 Nov 3, 2018
0252db9
DOC - add examples and change order of kwargs for from_frame
sds9995 Nov 3, 2018
d98c8a9
TST - parameterize tests
sds9995 Nov 3, 2018
8a1906e
CLN - pep8 adherence
sds9995 Nov 3, 2018
08c120f
CLN - pep8 adherence
sds9995 Nov 3, 2018
8353c3f
DOC/CLN - add versionadded tags, add to whatsnew page, and clean up i…
sds9995 Nov 4, 2018
9df3c11
CLN - squeeze -> _squeeze
sds9995 Nov 10, 2018
6d4915e
DOC - squeeze -> _squeeze in whatsnew
ms7463 Nov 10, 2018
b5df7b2
BUG - allow repeat column names in from_frame, and falsey column name…
sds9995 Nov 11, 2018
ab3259c
DOC - whatsnew formatting
sds9995 Nov 11, 2018
cf95261
TST - reorganize and add tests for more incompatible from_frame types
sds9995 Nov 11, 2018
63051d7
Merge branch 'enhancement/from_frame' of https://github.com/ArtinSarr…
sds9995 Nov 11, 2018
a75a4a5
CLN - remove squeeze tests
sds9995 Nov 12, 2018
8d23df9
CLN - remove squeeze parameter from from_frame
sds9995 Nov 12, 2018
c8d696d
Merge branch 'master' into enhancement/from_frame
sds9995 Nov 12, 2018
7cf82d1
TST - remove callable name option
sds9995 Nov 12, 2018
1a282e5
ENH - from_data initial commit
sds9995 Nov 14, 2018
b3c6a90
DOC - reduce whatsnew entry for to_frame
sds9995 Nov 19, 2018
c760359
CLN/DOC - add examples to from_frame docstring and make code more rea…
sds9995 Nov 19, 2018
bb69314
Merge branch 'master' into enhancement/from_frame
sds9995 Nov 19, 2018
9e11180
TST - use OrderedDict for dataframe construction
sds9995 Nov 20, 2018
96c6af3
Merge branch 'master' into enhancement/from_frame
sds9995 Nov 28, 2018
a5236bf
CLN - clean up code and use pytest.raises
sds9995 Dec 1, 2018
c78f364
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 1, 2018
14bfea8
DOC - move to_frame breaking changes to backwards incompatible sectio…
sds9995 Dec 2, 2018
6960804
Merge branch 'master' into enhancement/from_frame
ms7463 Dec 2, 2018
11c5947
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 4, 2018
904644a
Merge branch 'enhancement/from_frame' of https://github.com/ArtinSarr…
sds9995 Dec 4, 2018
30fe0df
DOC - add advanced.rst section
sds9995 Dec 5, 2018
ec60563
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 5, 2018
8fc6609
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 6, 2018
9b906c6
DOC/CLN - cleanup documentation
sds9995 Dec 6, 2018
e416122
CLN - fix linting error according to pandas-dev.pandas test
sds9995 Dec 6, 2018
4ef9ec4
DOC - fix docstrings
sds9995 Dec 7, 2018
4240a1e
CLN - fix import order with isort
sds9995 Dec 7, 2018
9159b2d
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 7, 2018
18 changes: 16 additions & 2 deletions doc/source/advanced.rst
@@ -62,8 +62,9 @@ The :class:`MultiIndex` object is the hierarchical analogue of the standard
can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A
``MultiIndex`` can be created from a list of arrays (using
:meth:`MultiIndex.from_arrays`), an array of tuples (using
:meth:`MultiIndex.from_tuples`), or a crossed set of iterables (using
:meth:`MultiIndex.from_product`). The ``Index`` constructor will attempt to return
:meth:`MultiIndex.from_tuples`), a crossed set of iterables (using
:meth:`MultiIndex.from_product`), or a :class:`DataFrame` (using
:meth:`MultiIndex.from_frame`). The ``Index`` constructor will attempt to return
a ``MultiIndex`` when it is passed a list of tuples. The following examples
demonstrate different ways to initialize MultiIndexes.

@@ -89,6 +90,19 @@ to use the :meth:`MultiIndex.from_product` method:
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
pd.MultiIndex.from_product(iterables, names=['first', 'second'])

You can also construct a ``MultiIndex`` from a ``DataFrame`` directly, using
the method :meth:`MultiIndex.from_frame`. This is a complementary method to
:meth:`MultiIndex.to_frame`.

.. versionadded:: 0.24.0

.. ipython:: python

df = pd.DataFrame([['bar', 'one'], ['bar', 'two'],
['foo', 'one'], ['foo', 'two']],
columns=['first', 'second'])
pd.MultiIndex.from_frame(df)
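
As a hedged illustration of the "complementary method" remark above, the following sketch round-trips the same frame through ``MultiIndex.from_frame`` and ``MultiIndex.to_frame``. It assumes pandas 0.24.0 or later; the variable names ``mi`` and ``round_tripped`` are illustrative only and are not part of the PR.

```python
import pandas as pd

df = pd.DataFrame([['bar', 'one'], ['bar', 'two'],
                   ['foo', 'one'], ['foo', 'two']],
                  columns=['first', 'second'])

# Each column becomes one level; the column names become the level names.
mi = pd.MultiIndex.from_frame(df)

# index=False returns a plain frame with a default integer index,
# which makes it directly comparable to the original frame.
round_tripped = mi.to_frame(index=False)

print(list(mi.names))
print(round_tripped.values.tolist() == df.values.tolist())
```

The comparison goes through plain Python lists rather than ``DataFrame.equals`` so that it does not depend on how a particular pandas version stores string columns.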

As a convenience, you can pass a list of arrays directly into ``Series`` or
``DataFrame`` to construct a ``MultiIndex`` automatically:

1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -1703,6 +1703,7 @@ MultiIndex Constructors
MultiIndex.from_arrays
MultiIndex.from_tuples
MultiIndex.from_product
MultiIndex.from_frame

MultiIndex Attributes
~~~~~~~~~~~~~~~~~~~~~
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -378,6 +378,7 @@ Backwards incompatible API changes
- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`)
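
The ordering guarantee in the last entry can be checked with a small sketch like the one below. The level values and the names ``'z'``, ``'a'``, ``'m'`` are made up for illustration; they are chosen so that the level order differs from alphabetical order.

```python
import pandas as pd

# Level names deliberately out of alphabetical order.
mi = pd.MultiIndex.from_arrays(
    [[1, 2], ['x', 'y'], [0.5, 1.5]],
    names=['z', 'a', 'm'],
)

frame = mi.to_frame(index=False)

# The columns follow the level order given by mi.names.
print(list(frame.columns))
print(list(frame.columns) == list(mi.names))
```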

.. _whatsnew_0240.api_breaking.deps:

@@ -1433,6 +1434,7 @@ MultiIndex

- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a ``Series`` or ``DataFrame`` with a :class:`MultiIndex` index) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
- :class:`MultiIndex` has gained the :meth:`MultiIndex.from_frame` method, which allows constructing a :class:`MultiIndex` object from a :class:`DataFrame` (:issue:`22420`)
- Fix ``TypeError`` in Python 3 when creating :class:`MultiIndex` in which some levels have mixed types, e.g. when some labels are tuples (:issue:`15457`)

I/O
148 changes: 116 additions & 32 deletions pandas/core/indexes/multi.py
@@ -1,4 +1,5 @@
# pylint: disable=E1101,E1103,W0232
from collections import OrderedDict
import datetime
from sys import getsizeof
import warnings
@@ -18,6 +19,7 @@
is_integer, is_iterator, is_list_like, is_object_dtype, is_scalar,
pandas_dtype)
from pandas.core.dtypes.dtypes import ExtensionDtype, PandasExtensionDtype
from pandas.core.dtypes.generic import ABCDataFrame
from pandas.core.dtypes.missing import array_equivalent, isna

import pandas.core.algorithms as algos
@@ -125,25 +127,25 @@ class MultiIndex(Index):
Parameters
----------
levels : sequence of arrays
The unique labels for each level
The unique labels for each level.
codes : sequence of arrays
Integers for each level designating which label at each location
Integers for each level designating which label at each location.

.. versionadded:: 0.24.0
labels : sequence of arrays
Integers for each level designating which label at each location
Integers for each level designating which label at each location.

.. deprecated:: 0.24.0
Use ``codes`` instead
sortorder : optional int
Level of sortedness (must be lexicographically sorted by that
level)
level).
names : optional sequence of objects
Names for each of the index levels. (name is accepted for compat)
copy : boolean, default False
Copy the meta-data
verify_integrity : boolean, default True
Check that the levels/codes are consistent and valid
Names for each of the index levels. (name is accepted for compat).
copy : bool, default False
Copy the meta-data.
verify_integrity : bool, default True
Check that the levels/codes are consistent and valid.

Attributes
----------
@@ -158,6 +160,7 @@ class MultiIndex(Index):
from_arrays
from_tuples
from_product
from_frame
set_levels
set_codes
to_frame
@@ -175,13 +178,9 @@ class MultiIndex(Index):
MultiIndex.from_product : Create a MultiIndex from the cartesian product
of iterables.
MultiIndex.from_tuples : Convert list of tuples to a MultiIndex.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
Index : The base pandas Index type.

Notes
-----
See the `user guide
<http://pandas.pydata.org/pandas-docs/stable/advanced.html>`_ for more.

Examples
---------
A new ``MultiIndex`` is typically constructed using one of the helper
@@ -196,6 +195,11 @@ class MultiIndex(Index):

See further examples for how to construct a MultiIndex in the doc strings
of the mentioned helper methods.

Notes
-----
See the `user guide
<http://pandas.pydata.org/pandas-docs/stable/advanced.html>`_ for more.
"""

# initialize to zero-length tuples to make everything work
@@ -288,7 +292,7 @@ def _verify_integrity(self, codes=None, levels=None):
@classmethod
def from_arrays(cls, arrays, sortorder=None, names=None):
"""
Convert arrays to MultiIndex
Convert arrays to MultiIndex.

Parameters
----------
@@ -297,7 +301,9 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
len(arrays) is the number of levels.
sortorder : int or None
Level of sortedness (must be lexicographically sorted by that
level)
level).
names : list / sequence of str, optional
Names for the levels in the index.

Returns
-------
@@ -308,11 +314,15 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
MultiIndex.from_product : Make a MultiIndex from cartesian product
of iterables.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.

Examples
--------
>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
>>> pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
MultiIndex(levels=[[1, 2], ['blue', 'red']],
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
names=['number', 'color'])
"""
if not is_list_like(arrays):
raise TypeError("Input must be a list / sequence of array-likes.")
@@ -337,31 +347,37 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
@classmethod
def from_tuples(cls, tuples, sortorder=None, names=None):
"""
Convert list of tuples to MultiIndex
Convert list of tuples to MultiIndex.

Parameters
----------
tuples : list / sequence of tuple-likes
Each tuple is the index of one row/column.
sortorder : int or None
Level of sortedness (must be lexicographically sorted by that
level)
level).
names : list / sequence of str, optional
Names for the levels in the index.

Returns
-------
index : MultiIndex

See Also
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_product : Make a MultiIndex from cartesian product
of iterables
of iterables.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.

Examples
--------
>>> tuples = [(1, u'red'), (1, u'blue'),
(2, u'red'), (2, u'blue')]
... (2, u'red'), (2, u'blue')]
>>> pd.MultiIndex.from_tuples(tuples, names=('number', 'color'))
MultiIndex(levels=[[1, 2], ['blue', 'red']],
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
names=['number', 'color'])
"""
if not is_list_like(tuples):
raise TypeError('Input must be a list / sequence of tuple-likes.')
@@ -388,7 +404,7 @@ def from_tuples(cls, tuples, sortorder=None, names=None):
@classmethod
def from_product(cls, iterables, sortorder=None, names=None):
"""
Make a MultiIndex from the cartesian product of multiple iterables
Make a MultiIndex from the cartesian product of multiple iterables.

Parameters
----------
@@ -397,7 +413,7 @@ def from_product(cls, iterables, sortorder=None, names=None):
sortorder : int or None
Level of sortedness (must be lexicographically sorted by that
level).
names : list / sequence of strings or None
names : list / sequence of str, optional
Names for the levels in the index.

Returns
@@ -408,16 +424,17 @@ def from_product(cls, iterables, sortorder=None, names=None):
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.

Examples
--------
>>> numbers = [0, 1, 2]
>>> colors = [u'green', u'purple']
>>> colors = ['green', 'purple']
>>> pd.MultiIndex.from_product([numbers, colors],
names=['number', 'color'])
MultiIndex(levels=[[0, 1, 2], [u'green', u'purple']],
... names=['number', 'color'])
MultiIndex(levels=[[0, 1, 2], ['green', 'purple']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=[u'number', u'color'])
names=['number', 'color'])
"""
from pandas.core.arrays.categorical import _factorize_from_iterables
from pandas.core.reshape.util import cartesian_product
@@ -431,6 +448,68 @@ def from_product(cls, iterables, sortorder=None, names=None):
codes = cartesian_product(codes)
return MultiIndex(levels, codes, sortorder=sortorder, names=names)

@classmethod
def from_frame(cls, df, sortorder=None, names=None):
"""
Make a MultiIndex from a DataFrame.

.. versionadded:: 0.24.0

Parameters
----------
df : DataFrame
DataFrame to be converted to MultiIndex.
sortorder : int, optional
Level of sortedness (must be lexicographically sorted by that
level).
names : list-like, optional
If no names are provided, use the column names, or tuples of column
names if the columns are a MultiIndex. If a sequence, overwrite
names with the given sequence.

Returns
-------
MultiIndex
The MultiIndex representation of the given DataFrame.

See Also
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
MultiIndex.from_product : Make a MultiIndex from cartesian product
of iterables.

Examples
--------
>>> df = pd.DataFrame([['HI', 'Temp'], ['HI', 'Precip'],
... ['NJ', 'Temp'], ['NJ', 'Precip']],
... columns=['a', 'b'])
>>> df
a b
0 HI Temp
1 HI Precip
2 NJ Temp
3 NJ Precip

>>> pd.MultiIndex.from_frame(df)
MultiIndex(levels=[['HI', 'NJ'], ['Precip', 'Temp']],
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
names=['a', 'b'])

Using explicit names instead of the column names:

>>> pd.MultiIndex.from_frame(df, names=['state', 'observation'])
MultiIndex(levels=[['HI', 'NJ'], ['Precip', 'Temp']],
labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
names=['state', 'observation'])
"""
if not isinstance(df, ABCDataFrame):
raise TypeError("Input must be a DataFrame")

column_names, columns = lzip(*df.iteritems())
names = column_names if names is None else names
return cls.from_arrays(columns, sortorder=sortorder, names=names)
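
The body of ``from_frame`` above reduces to three steps: check the input type, split the frame into column names and column values, then delegate to ``from_arrays``. The standalone sketch below mirrors those steps outside the class. ``from_frame_sketch`` is a hypothetical name, and two substitutions are assumptions made so the snippet runs on current pandas: the builtin ``zip`` stands in for the internal ``lzip`` helper, and ``df.items()`` stands in for ``df.iteritems()``, which was removed in later pandas releases.

```python
import pandas as pd


def from_frame_sketch(df, sortorder=None, names=None):
    """Hypothetical stand-in that mirrors the steps of from_frame."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("Input must be a DataFrame")

    # Split (column name, column Series) pairs into two parallel tuples.
    column_names, columns = zip(*df.items())
    # Explicit names win; otherwise fall back to the column names.
    names = column_names if names is None else names
    return pd.MultiIndex.from_arrays(columns, sortorder=sortorder,
                                     names=names)


df = pd.DataFrame([['HI', 'Temp'], ['HI', 'Precip'],
                   ['NJ', 'Temp'], ['NJ', 'Precip']],
                  columns=['a', 'b'])

default_names = from_frame_sketch(df)
custom_names = from_frame_sketch(df, names=['state', 'observation'])

# Anything that is not a DataFrame is rejected up front.
try:
    from_frame_sketch([['HI', 'Temp']])
    rejected = False
except TypeError:
    rejected = True

print(list(default_names.names), list(custom_names.names), rejected)
```

Note that ``names is None`` is tested explicitly rather than relying on truthiness, so an explicitly passed sequence is always used as given.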

# --------------------------------------------------------------------

@property
@@ -1386,11 +1465,16 @@ def to_frame(self, index=True, name=None):
else:
idx_names = self.names

result = DataFrame({(name or level):
self._get_level_values(level)
for name, level in
zip(idx_names, range(len(self.levels)))},
copy=False)
# Guarantee resulting column order
result = DataFrame(
OrderedDict([
((level if name is None else name),
self._get_level_values(level))
for name, level in zip(idx_names, range(len(self.levels)))
]),
copy=False
)

if index:
result.index = self
return result
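
The ``to_frame`` hunk above changes two things: the key expression goes from ``name or level`` to ``level if name is None else name``, and the mapping becomes an ``OrderedDict``. The pure-Python sketch below shows why the first change matters for falsey names such as ``0``. The list ``idx_names`` and the placeholder strings are hypothetical and stand in for real level names and level values.

```python
from collections import OrderedDict

# Hypothetical level names: a string, a falsey but valid name, and no name.
idx_names = ['a', 0, None]

# Old expression: any falsey name is replaced by the level number.
old_keys = [(name or level) for level, name in enumerate(idx_names)]

# New expression: only a missing (None) name falls back to the level number.
new_keys = [(level if name is None else name)
            for level, name in enumerate(idx_names)]

# OrderedDict keeps insertion order, so keys come out in level order.
columns = OrderedDict((key, 'level %d values' % level)
                      for level, key in enumerate(new_keys))

print(old_keys)
print(new_keys)
print(list(columns))
```

With the old expression the name ``0`` on the second level is silently replaced by the level number ``1``; the new expression keeps it.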