BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

lauhayden · 2018-02-05T22:59:24Z

mi = pd.MultiIndex.from_arrays([
        np.array(['a', 'a', 'b', 'b']), 
        np.array(['1', '2', '2', '3']), 
        np.array(['alpha', 'beta', 'alpha', 'beta']),
    ], names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.rand(4, 3), index=mi)
df2 = df.loc[('b', '1', slice(None)), :]
print(df2.index)

Output:

MultiIndex(levels=[['a', 'b'], ['1', '2', '3'], ['alpha', 'beta']],
           labels=[[], [], []],
           names=['one', 'two', 'three'])

The next line causes an IndexError:

df2.to_excel('test.xlsx')

Notice that the 'empty' MultiIndex does not give any indication of its presence when the entire dataframe is printed, looking like an ordinary empty dataframe instead.

Empty DataFrame
Columns: [0, 1, 2]
Index: []

Problem description

I'm not quite sure which behaviour is the problematic one here: the fact that the first block of code creates an 'empty' MultiIndex that has values in levels, or that to_excel doesn't write a DataFrame with such an index to file properly. The 'empty' MultiIndex behaviour is a little bit anomalous by itself, because a similar block of code (below) will cause a KeyError. The KeyError is 'suppressed' by the fact that there is a third level in the MultiIndex.

mi = pd.MultiIndex.from_arrays([
        np.array(['a', 'a', 'b', 'b']), 
        np.array(['1', '2', '2', '3']), 
    ], names=['one', 'two'])
df = pd.DataFrame(np.random.rand(4, 3), index=mi)
df2 = df.loc[('b', '1'), :]

I initially ran into this behaviour (the IndexError) when using xs, but since xs seems to be headed for the deprecation guillotine, I figured I'd rewrite this with loc instead.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-43-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2018-02-06T00:04:42Z

I'm not quite sure which behaviour is the problematic one here: the fact that the first block of code creates an 'empty' MultiIndex that has values in levels,

This is normal and expected behaviour. When taking a subset of a MultiIndex, the levels are not re-computed but kept intact (the fact that it is empty is expressed in its empty labels).

or that to_excel doesn't write a DataFrame with such an index to file properly.

This could be considered a bug, given that for an empty dataframe without multiindex, this works. Or at least it should give a better error message.

The fact that df.loc[('b', '1', slice(None)), :] does not raise an error while df.loc[('b', '1'), :] does, I am not fully sure about the explanation. But I suppose it is because of having the slice(None) makes this seen as "indexing an MI with multiple indexers (one for each level)", while in the other case you are selecting only single labels (up to a certain level). But I agree this is a confusing case.
If we want to discuss this further, let's open a separate issue for it.

lauhayden · 2018-02-06T15:05:40Z

Alright, so this issue should be focused on to_excel throwing IndexError. I'll open a separate issue for the MultiIndex slicing behaviour.

jorisvandenbossche · 2018-02-06T15:33:41Z

OK, I edited the title to reflect this. Probably the most sensible thing to do is to be consistent with empty dataframes in general (writing only the column headers to excel)

PR welcome!

mhooreman · 2019-02-25T09:23:42Z

Hello,
Not a lot of added value ... but the code below helps to easy reproduce this:
pd.DataFrame([[],[],[]]).T.set_index([0, 1]).to_excel('test.xlsx')

qwertystop · 2021-06-15T16:11:59Z

I encountered this error on 1.2.4 having produced such an empty-multiindexed DataFrame via DataFrame.compare of two dataframes having no differences, where the index on each DataFrame happened to be a MultiIndex; the end result is an empty DataFrame with an empty and unlabeled MultiIndex for columns, and an empty-but-labeled MultiIndex for rows. Without an existing MultiIndex, this could still arise from comparison-with-no-differences followed by transposition (converting the automatically-generated MultiIndex columns into rows). I would also like to note that no error arises from an empty DataFrame with MultiIndex columns but single-level rows.

Simple sample of such, both as an example of this problem naturally arising from compare and so that the actual error message (index 0 is out of bounds for axis 0 with size 0) beyond that it's an IndexError shows up on this issue for ease of searching:

Python 3.7.4 (default, Apr 28 2021, 11:52:34)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]])
>>> df.compare(df)
Empty DataFrame
Columns: []
Index: []
>>> df.compare(df).T.index
MultiIndex([], )
>>> df.compare(df).T.index.levels
FrozenList([[], ['self', 'other']])
>>> df.compare(df).to_excel("test.xlsx")  # no error
>>> df.compare(df).T.to_excel("test.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/core/generic.py", line 2196, in to_excel
    storage_options=storage_options,
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/formats/excel.py", line 826, in write
    freeze_panes=freeze_panes,
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/excel/_openpyxl.py", line 410, in write_cells
    for cell in cells:
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/formats/excel.py", line 757, in get_formatted_cells
    for cell in itertools.chain(self._format_header(), self._format_body()):
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/formats/excel.py", line 576, in _format_header_regular
    coloffset = len(self.df.index[0])
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/core/indexes/multi.py", line 2028, in __getitem__
    if level_codes[key] == -1:
IndexError: index 0 is out of bounds for axis 0 with size 0

wf-r · 2022-05-24T11:34:13Z

If I see this correctly, it would suffice to reference self.df.index.names instead of self.df.index[0] in

pandas/pandas/io/formats/excel.py

Line 637 in 46bcf37

coloffset = len(self.df.index[0])

I will try to construct a fix.

jorisvandenbossche added IO Excel read_excel, to_excel Error Reporting Incorrect or improved errors from pandas labels Feb 6, 2018

jorisvandenbossche changed the title ~~MultiIndex with empty lists in labels causes DataFrame.to_excel() to throw IndexError~~ BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex Feb 6, 2018

lauhayden mentioned this issue Feb 6, 2018

Inconsistent MultiIndex behaviour when indexing selects no rows #19556

Closed

mroeschke added the Bug label May 7, 2020

wf-r mentioned this issue May 25, 2022

BUG Avoid IndexError on writing empty (row) MI df to excel (GH19543) #47111

Merged

4 tasks

mroeschke closed this as completed in #47111 Jun 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

lauhayden commented Feb 5, 2018

INSTALLED VERSIONS

jorisvandenbossche commented Feb 6, 2018

lauhayden commented Feb 6, 2018

jorisvandenbossche commented Feb 6, 2018

mhooreman commented Feb 25, 2019

qwertystop commented Jun 15, 2021 •

edited

Loading

wf-r commented May 24, 2022

BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

Comments

lauhayden commented Feb 5, 2018

Problem description

Output of pd.show_versions()

INSTALLED VERSIONS

jorisvandenbossche commented Feb 6, 2018

lauhayden commented Feb 6, 2018

jorisvandenbossche commented Feb 6, 2018

mhooreman commented Feb 25, 2019

qwertystop commented Jun 15, 2021 • edited Loading

wf-r commented May 24, 2022

Output of `pd.show_versions()`

qwertystop commented Jun 15, 2021 •

edited

Loading