Commit e18288e

chris-b1 authored and Nick Eubank committed
ENH: read_excel MultiIndex pandas-dev#4679
1 parent 3f766c3 commit e18288e

12 files changed, +339 -141 lines changed

doc/source/io.rst (+40)
@@ -1989,6 +1989,46 @@ advanced strategies
 Reading Excel Files
 '''''''''''''''''''

+.. versionadded:: 0.17
+
+``read_excel`` can read a ``MultiIndex`` index, by passing a list of columns to ``index_col``
+and a ``MultiIndex`` column by passing a list of rows to ``header``. If either the ``index``
+or ``columns`` have serialized level names those will be read in as well by specifying
+the rows/columns that make up the levels.
+
+.. ipython:: python
+
+   # MultiIndex index - no names
+   df = pd.DataFrame({'a':[1,2,3,4], 'b':[5,6,7,8]},
+                     index=pd.MultiIndex.from_product([['a','b'],['c','d']]))
+   df.to_excel('path_to_file.xlsx')
+   df = pd.read_excel('path_to_file.xlsx', index_col=[0,1])
+   df
+
+   # MultiIndex index - with names
+   df.index = df.index.set_names(['lvl1', 'lvl2'])
+   df.to_excel('path_to_file.xlsx')
+   df = pd.read_excel('path_to_file.xlsx', index_col=[0,1])
+   df
+
+   # MultiIndex index and column - with names
+   df.columns = pd.MultiIndex.from_product([['a'],['b', 'd']], names=['c1', 'c2'])
+   df.to_excel('path_to_file.xlsx')
+   df = pd.read_excel('path_to_file.xlsx',
+                      index_col=[0,1], header=[0,1])
+   df
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove('path_to_file.xlsx')
+
+.. warning::
+
+   Excel files saved in version 0.16.2 or prior that had index names can still be read in,
+   but the ``has_index_names`` argument must be set to ``True``.
+
 .. versionadded:: 0.16

 ``read_excel`` can read more than one sheet, by setting ``sheetname`` to either
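
The round trip described in this documentation hunk can be checked outside the docs build. The following is a minimal sketch, assuming a recent pandas install with the openpyxl engine available; the in-memory buffer, the explicit `engine` argument, and the variable names are illustrative choices and are not part of the commit.

```python
import io

import pandas as pd

# Frame with a two-level row index and two-level columns; every level is named.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=pd.MultiIndex.from_product([["foo", "bar"], ["a", "b"]],
                                       names=["col1", "col2"]),
    index=pd.MultiIndex.from_product([["j"], ["l", "k"]],
                                     names=["i1", "i2"]),
)

# Write to an in-memory workbook instead of a file on disk (assumption:
# openpyxl is installed; any xlsx-capable engine should behave the same).
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    df.to_excel(writer)

# Tell read_excel which rows form the column levels and which
# columns form the index levels.
buf.seek(0)
result = pd.read_excel(buf, header=[0, 1], index_col=[0, 1], engine="openpyxl")

print(result.index.names)
print(result.columns.names)
```

Passing both `header` and `index_col` as lists is what lets the reader rebuild both axes; omitting either one would leave that axis flat.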

doc/source/whatsnew/v0.17.0.txt (+47 -2)
@@ -205,6 +205,53 @@ The support math functions are `sin`, `cos`, `exp`, `log`, `expm1`, `log1p`,
 These functions map to the intrinsics for the NumExpr engine. For Python
 engine, they are mapped to NumPy calls.

+Changes to Excel with ``MultiIndex``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In version 0.16.2 a ``DataFrame`` with ``MultiIndex`` columns could not be written to Excel via ``to_excel``.
+That functionality has been added (:issue:`10564`), along with updating ``read_excel`` so that the data can
+be read back with no loss of information by specifying which columns/rows make up the ``MultiIndex``
+in the ``header`` and ``index_col`` parameters (:issue:`4679`).
+
+See the :ref:`documentation <io.excel>` for more details.
+
+.. ipython:: python
+
+   df = pd.DataFrame([[1,2,3,4], [5,6,7,8]],
+                     columns = pd.MultiIndex.from_product([['foo','bar'],['a','b']],
+                                                          names = ['col1', 'col2']),
+                     index = pd.MultiIndex.from_product([['j'], ['l', 'k']],
+                                                        names = ['i1', 'i2']))
+
+   df
+   df.to_excel('test.xlsx')
+
+   df = pd.read_excel('test.xlsx', header=[0,1], index_col=[0,1])
+   df
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove('test.xlsx')
+
+Previously, it was necessary to specify the ``has_index_names`` argument in ``read_excel``
+if the serialized data had index names. For version 0.17 the output format of ``to_excel``
+has been changed to make this keyword unnecessary - the change is shown below.
+
+**Old**
+
+.. image:: _static/old-excel-index.png
+
+**New**
+
+.. image:: _static/new-excel-index.png
+
+.. warning::
+
+   Excel files saved in version 0.16.2 or prior that had index names can still be read in,
+   but the ``has_index_names`` argument must be set to ``True``.
+
+
 .. _whatsnew_0170.enhancements.other:

 Other enhancements
@@ -764,7 +811,6 @@ Changes to ``Categorical.unique``
    cat
    cat.unique()

-
 .. _whatsnew_0170.api_breaking.other:

 Other API Changes
@@ -774,7 +820,6 @@ Other API Changes
 - Calling the ``.value_counts`` method on a Series with ``categorical`` dtype now returns a Series with a ``CategoricalIndex`` (:issue:`10704`)
 - Allow passing `kwargs` to the interpolation methods (:issue:`10378`).
 - The metadata properties of subclasses of pandas objects will now be serialized (:issue:`10553`).
-- Allow ``DataFrame`` with ``MultiIndex`` columns to be written to Excel (:issue:`10564`). This was changed in 0.16.2 as the read-back method could not always guarantee perfect fidelity (:issue:`9794`).
 - ``groupby`` using ``Categorical`` follows the same rule as ``Categorical.unique`` described above (:issue:`10508`)
 - Improved error message when concatenating an empty iterable of dataframes (:issue:`9157`)
 - When constructing ``DataFrame`` with an array of ``complex64`` dtype that meant the corresponding column was automatically promoted to the ``complex128`` dtype. Pandas will now preserve the itemsize of the input for complex data (:issue:`10952`)

pandas/core/format.py (+15 -26)
@@ -4,7 +4,6 @@
 # pylint: disable=W0141

 import sys
-import warnings

 from pandas.core.base import PandasObject
 from pandas.core.common import adjoin, notnull
@@ -1641,14 +1640,11 @@ class ExcelFormatter(object):
     inf_rep : string, default `'inf'`
         representation for np.inf values (which aren't representable in Excel)
         A `'-'` sign will be added in front of -inf.
-    verbose: boolean, default True
-        If True, warn user that the resulting output file may not be
-        re-read or parsed directly by pandas.
     """

     def __init__(self, df, na_rep='', float_format=None, cols=None,
                  header=True, index=True, index_label=None, merge_cells=False,
-                 inf_rep='inf', verbose=True):
+                 inf_rep='inf'):
         self.df = df
         self.rowcounter = 0
         self.na_rep = na_rep
@@ -1661,7 +1657,6 @@ def __init__(self, df, na_rep='', float_format=None, cols=None,
         self.header = header
         self.merge_cells = merge_cells
         self.inf_rep = inf_rep
-        self.verbose = verbose

     def _format_value(self, val):
         if lib.checknull(val):
@@ -1682,10 +1677,6 @@ def _format_header_mi(self):
                 raise NotImplementedError("Writing to Excel with MultiIndex"
                                           " columns and no index ('index'=False) "
                                           "is not yet implemented.")
-            elif self.index and self.verbose:
-                warnings.warn("Writing to Excel with MultiIndex columns is a"
-                              " one way serializable operation. You will not"
-                              " be able to re-read or parse the output file.")

         has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index))
         if not(has_aliases or self.header):
@@ -1796,18 +1787,14 @@ def _format_regular_rows(self):
             else:
                 index_label = self.df.index.names[0]

+            if isinstance(self.columns, MultiIndex):
+                self.rowcounter += 1
+
             if index_label and self.header is not False:
-                if self.merge_cells:
-                    yield ExcelCell(self.rowcounter,
-                                    0,
-                                    index_label,
-                                    header_style)
-                    self.rowcounter += 1
-                else:
-                    yield ExcelCell(self.rowcounter - 1,
-                                    0,
-                                    index_label,
-                                    header_style)
+                yield ExcelCell(self.rowcounter - 1,
+                                0,
+                                index_label,
+                                header_style)

             # write index_values
             index_values = self.df.index
@@ -1841,19 +1828,21 @@ def _format_hierarchical_rows(self):
                                                     (list, tuple, np.ndarray, Index)):
                 index_labels = self.index_label

+            # MultiIndex columns require an extra row
+            # with index names (blank if None) for
+            # unambiguous round-trip
+            if isinstance(self.columns, MultiIndex):
+                self.rowcounter += 1
+
             # if index labels are not empty go ahead and dump
             if (any(x is not None for x in index_labels)
                     and self.header is not False):

-                if not self.merge_cells:
-                    self.rowcounter -= 1
-
                 for cidx, name in enumerate(index_labels):
-                    yield ExcelCell(self.rowcounter,
+                    yield ExcelCell(self.rowcounter - 1,
                                     cidx,
                                     name,
                                     header_style)
-                self.rowcounter += 1

             if self.merge_cells:
                 # Format hierarchical rows as merged cells.
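
The row arithmetic in the two `format.py` hunks above can be modelled in a few lines. This is a simplified sketch inferred from the diff, not pandas code: `excel_row_layout` and its arguments are hypothetical, and it assumes the index and header are both written and that `rowcounter` already points at the first row after the column header when the row formatters run.

```python
def excel_row_layout(n_header_rows, multiindex_columns):
    """Return (index_names_row, first_data_row) as 0-based sheet rows.

    Hypothetical helper mirroring the rowcounter logic in the diff:
    MultiIndex columns reserve one extra row for the index names, and
    the names are always written at ``rowcounter - 1``.
    """
    rowcounter = n_header_rows  # assumed: first row after the column header
    if multiindex_columns:
        rowcounter += 1  # dedicated row for index names (blank if None)
    return rowcounter - 1, rowcounter


# Regular columns: index names share the single header row.
print(excel_row_layout(1, multiindex_columns=False))

# Two-level MultiIndex columns: index names get their own row below the header.
print(excel_row_layout(2, multiindex_columns=True))
```

Under this model the index names no longer depend on `merge_cells`, which matches the removal of the `merge_cells` branches in both hunks.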

pandas/core/frame.py (+1 -4)
@@ -1336,9 +1336,6 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
         inf_rep : string, default 'inf'
             Representation for infinity (there is no native representation for
             infinity in Excel)
-        verbose: boolean, default True
-            If True, warn user that the resulting output file may not be
-            re-read or parsed directly by pandas.

         Notes
         -----
@@ -1371,7 +1368,7 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
                                    index=index,
                                    index_label=index_label,
                                    merge_cells=merge_cells,
-                                   inf_rep=inf_rep, verbose=verbose)
+                                   inf_rep=inf_rep)
         formatted_cells = formatter.get_formatted_cells()
         excel_writer.write_cells(formatted_cells, sheet_name,
                                  startrow=startrow, startcol=startcol)
