Skip to content

BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lauhayden opened this issue Feb 5, 2018 · 6 comments · Fixed by #47111
Closed

BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex #19543

lauhayden opened this issue Feb 5, 2018 · 6 comments · Fixed by #47111
Labels
Bug Error Reporting Incorrect or improved errors from pandas IO Excel read_excel, to_excel

Comments

@lauhayden
Copy link

mi = pd.MultiIndex.from_arrays([
        np.array(['a', 'a', 'b', 'b']), 
        np.array(['1', '2', '2', '3']), 
        np.array(['alpha', 'beta', 'alpha', 'beta']),
    ], names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.rand(4, 3), index=mi)
df2 = df.loc[('b', '1', slice(None)), :]
print(df2.index)

Output:

MultiIndex(levels=[['a', 'b'], ['1', '2', '3'], ['alpha', 'beta']],
           labels=[[], [], []],
           names=['one', 'two', 'three'])

The next line causes an IndexError:

df2.to_excel('test.xlsx')

Notice that the 'empty' MultiIndex does not give any indication of its presence when the entire dataframe is printed, looking like an ordinary empty dataframe instead.

Empty DataFrame
Columns: [0, 1, 2]
Index: []

Problem description

I'm not quite sure which behaviour is the problematic one here: the fact that the first block of code creates an 'empty' MultiIndex that has values in levels, or that to_excel doesn't write a DataFrame with such an index to file properly. The 'empty' MultiIndex behaviour is a little bit anomalous by itself, because a similar block of code (below) will cause a KeyError. The KeyError is 'suppressed' by the fact that there is a third level in the MultiIndex.

mi = pd.MultiIndex.from_arrays([
        np.array(['a', 'a', 'b', 'b']), 
        np.array(['1', '2', '2', '3']), 
    ], names=['one', 'two'])
df = pd.DataFrame(np.random.rand(4, 3), index=mi)
df2 = df.loc[('b', '1'), :]

I initially ran into this behaviour (the IndexError) when using xs, but since xs seems to be headed for the deprecation guillotine, I figured I'd rewrite this with loc instead.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-43-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche
Copy link
Member

I'm not quite sure which behaviour is the problematic one here: the fact that the first block of code creates an 'empty' MultiIndex that has values in levels,

This is normal and expected behaviour. When taking a subset of a MultiIndex, the levels are not re-computed but kept intact (the fact that it is empty is expressed in its empty labels).

or that to_excel doesn't write a DataFrame with such an index to file properly.

This could be considered a bug, given that for an empty dataframe without multiindex, this works. Or at least it should give a better error message.

The fact that df.loc[('b', '1', slice(None)), :] does not raise an error while df.loc[('b', '1'), :] does, I am not fully sure about the explanation. But I suppose it is because of having the slice(None) makes this seen as "indexing an MI with multiple indexers (one for each level)", while in the other case you are selecting only single labels (up to a certain level). But I agree this is a confusing case.
If we want to discuss this further, let's open a separate issue for it.

@lauhayden
Copy link
Author

Alright, so this issue should be focused on to_excel throwing IndexError. I'll open a separate issue for the MultiIndex slicing behaviour.

@jorisvandenbossche jorisvandenbossche added IO Excel read_excel, to_excel Error Reporting Incorrect or improved errors from pandas labels Feb 6, 2018
@jorisvandenbossche jorisvandenbossche changed the title MultiIndex with empty lists in labels causes DataFrame.to_excel() to throw IndexError BUG: to_excel to throw IndexError for dataframe with empty (row) MultiIndex Feb 6, 2018
@jorisvandenbossche
Copy link
Member

OK, I edited the title to reflect this. Probably the most sensible thing to do is to be consistent with empty dataframes in general (writing only the column headers to excel)

PR welcome!

@mhooreman
Copy link

Hello,
Not a lot of added value ... but the code below helps to easy reproduce this:
pd.DataFrame([[],[],[]]).T.set_index([0, 1]).to_excel('test.xlsx')

@mroeschke mroeschke added the Bug label May 7, 2020
@qwertystop
Copy link

qwertystop commented Jun 15, 2021

I encountered this error on 1.2.4 having produced such an empty-multiindexed DataFrame via DataFrame.compare of two dataframes having no differences, where the index on each DataFrame happened to be a MultiIndex; the end result is an empty DataFrame with an empty and unlabeled MultiIndex for columns, and an empty-but-labeled MultiIndex for rows. Without an existing MultiIndex, this could still arise from comparison-with-no-differences followed by transposition (converting the automatically-generated MultiIndex columns into rows). I would also like to note that no error arises from an empty DataFrame with MultiIndex columns but single-level rows.

Simple sample of such, both as an example of this problem naturally arising from compare and so that the actual error message (index 0 is out of bounds for axis 0 with size 0) beyond that it's an IndexError shows up on this issue for ease of searching:

Python 3.7.4 (default, Apr 28 2021, 11:52:34)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]])
>>> df.compare(df)
Empty DataFrame
Columns: []
Index: []
>>> df.compare(df).T.index
MultiIndex([], )
>>> df.compare(df).T.index.levels
FrozenList([[], ['self', 'other']])
>>> df.compare(df).to_excel("test.xlsx")  # no error
>>> df.compare(df).T.to_excel("test.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/core/generic.py", line 2196, in to_excel
    storage_options=storage_options,
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/formats/excel.py", line 826, in write
    freeze_panes=freeze_panes,
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/excel/_openpyxl.py", line 410, in write_cells
    for cell in cells:
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/formats/excel.py", line 757, in get_formatted_cells
    for cell in itertools.chain(self._format_header(), self._format_body()):
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/io/formats/excel.py", line 576, in _format_header_regular
    coloffset = len(self.df.index[0])
  File "/home/qwertystop/.pyenv/versions/testenv/lib/python3.7/site-packages/pandas
/core/indexes/multi.py", line 2028, in __getitem__
    if level_codes[key] == -1:
IndexError: index 0 is out of bounds for axis 0 with size 0

@wf-r
Copy link
Contributor

wf-r commented May 24, 2022

If I see this correctly, it would suffice to reference self.df.index.names instead of self.df.index[0] in

coloffset = len(self.df.index[0])

I will try to construct a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Error Reporting Incorrect or improved errors from pandas IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants