Skip to content

BUG: Error when exporting an empty DataFrame with MultiIndex to xlsx file #46873

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
WEGFan opened this issue Apr 26, 2022 · 4 comments
Closed
2 of 3 tasks
Labels
Duplicate Report Duplicate issue or pull request IO Excel read_excel, to_excel

Comments

@WEGFan
Copy link

WEGFan commented Apr 26, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex.from_arrays([[], []], names=['a', 'b']),
    columns=['A', 'B']
)
df.to_excel('test.xlsx')

Issue Description

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [9], in <cell line: 7>()
      1 import pandas as pd
      3 df = pd.DataFrame(
      4     index=pd.MultiIndex.from_arrays([[], []], names=['a', 'b']),
      5     columns=['A', 'B']
      6 )
----> 7 df.to_excel('test.xlsx')

File E:\Program Files\Python39\lib\site-packages\pandas\core\generic.py:2345, in NDFrame.to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes, storage_options)
   2332 from pandas.io.formats.excel import ExcelFormatter
   2334 formatter = ExcelFormatter(
   2335     df,
   2336     na_rep=na_rep,
   (...)
   2343     inf_rep=inf_rep,
   2344 )
-> 2345 formatter.write(
   2346     excel_writer,
   2347     sheet_name=sheet_name,
   2348     startrow=startrow,
   2349     startcol=startcol,
   2350     freeze_panes=freeze_panes,
   2351     engine=engine,
   2352     storage_options=storage_options,
   2353 )

File E:\Program Files\Python39\lib\site-packages\pandas\io\formats\excel.py:894, in ExcelFormatter.write(self, writer, sheet_name, startrow, startcol, freeze_panes, engine, storage_options)
    891     need_save = True
    893 try:
--> 894     writer.write_cells(
    895         formatted_cells,
    896         sheet_name,
    897         startrow=startrow,
    898         startcol=startcol,
    899         freeze_panes=freeze_panes,
    900     )
    901 finally:
    902     # make sure to close opened file handles
    903     if need_save:

File E:\Program Files\Python39\lib\site-packages\pandas\io\excel\_xlsxwriter.py:227, in XlsxWriter.write_cells(self, cells, sheet_name, startrow, startcol, freeze_panes)
    224 if validate_freeze_panes(freeze_panes):
    225     wks.freeze_panes(*(freeze_panes))
--> 227 for cell in cells:
    228     val, fmt = self._value_with_fmt(cell.val)
    230     stylekey = json.dumps(cell.style)

File E:\Program Files\Python39\lib\site-packages\pandas\io\formats\excel.py:831, in ExcelFormatter.get_formatted_cells(self)
    830 def get_formatted_cells(self) -> Iterable[ExcelCell]:
--> 831     for cell in itertools.chain(self._format_header(), self._format_body()):
    832         cell.val = self._format_value(cell.val)
    833         yield cell

File E:\Program Files\Python39\lib\site-packages\pandas\io\formats\excel.py:630, in ExcelFormatter._format_header_regular(self)
    628     coloffset = 1
    629     if isinstance(self.df.index, MultiIndex):
--> 630         coloffset = len(self.df.index[0])
    632 colnames = self.columns
    633 if self._has_aliases:

File E:\Program Files\Python39\lib\site-packages\pandas\core\indexes\multi.py:2075, in MultiIndex.__getitem__(self, key)
   2073 retval = []
   2074 for lev, level_codes in zip(self.levels, self.codes):
-> 2075     if level_codes[key] == -1:
   2076         retval.append(np.nan)
   2077     else:

IndexError: index 0 is out of bounds for axis 0 with size 0

Expected Behavior

Write a sheet with only column header

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 57.4.0
Cython : 0.29.28
pytest : 7.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : None
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None

@WEGFan WEGFan added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 26, 2022
@simonjayhawkins simonjayhawkins added the IO Excel read_excel, to_excel label May 5, 2022
@mroeschke
Copy link
Member

Appears to work on main, could use a test if we don't have one already.

In [1]: import pandas as pd
   ...:
   ...: df = pd.DataFrame(
   ...:     index=pd.MultiIndex.from_arrays([[], []], names=['a', 'b']),
   ...:     columns=['A', 'B']
   ...: )
   ...: df.to_excel('test.xlsx')

In [2]: pd.__version__
Out[2]: '1.5.0.dev0+1270.g926b9ceff1'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 11, 2022
@Nikhil-Mudgal
Copy link

I was able to reproduce the error with pd.version == 1.4.2 and pd.version == 1.4.3 in my linux machine too.

Did some digging and found out that the error was noticed 2 months prior and has been fixed in main:- #47111.

For ref the change is in the line 630 of the following:-

  File E:\Program Files\Python39\lib\site-packages\pandas\io\formats\excel.py:630, in 
  ExcelFormatter._format_header_regular(self)
          628     coloffset = 1
          629     if isinstance(self.df.index, MultiIndex):
 -->     630         coloffset = len(self.df.index[0])  # this line itself 
          632   colnames = self.columns
          633   if self._has_aliases:

self.df.index[0] has been called on an empty dataframe and is trying to access item in index 0 which is not there.

which has been replaced by:-

     if self.index:
              coloffset = 1
              if isinstance(self.df.index, MultiIndex):
                        coloffset = len(self.df.index.names)

@mroeschke can you confirm if I am right on this and if I can work on this issue.

@mroeschke
Copy link
Member

Thanks for finding that #47111 fixed this issue and added an equivalent test as reported in this issue. Therefore closing as fixed and tested, but happy to have you help on other issues!

@mroeschke mroeschke added Duplicate Report Duplicate issue or pull request and removed good first issue Needs Tests Unit test(s) needed to prevent regressions labels Aug 12, 2022
@Nikhil-Mudgal
Copy link

@mroeschke Happy to help. Will be on the lookout for some beginner friendly issues like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

4 participants