BUG: to_excel inserts invalid data if a level in a MultiIndex is None #51252

musshorn · 2023-02-08T23:19:42Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

Cols = pd.MultiIndex.from_tuples(
    [
        ("Item", "Type 1", "Class A"),
        ("Item", "Type 1", "Class B"),
        ("Item", "Type 2", "Class A"),
        ("Item", "Type 2", "Class B"),
        ("Item", "Type 3", None),
    ]
)

df = pd.DataFrame(np.random.randn(10, 5), columns=Cols)
print(df)
df.to_excel("Test.xlsx")

Issue Description

When a level in a MultiIndex is None, the sheet written by to_excel contains an invalid label in place of the None. See image.

Expected Behavior

I would expect the output of to_excel to reflect the levels in the MultiIndex correctly.

Installed Versions

pd.show_versions()

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_Australia.1252

pandas : 1.5.3
numpy : 1.22.4
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 57.4.0
pip : 23.0
Cython : 0.29.32
pytest : 7.2.0
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : 1.3.0
zstandard : 0.17.0
tzdata : None


phofl · 2023-02-09T00:23:56Z

I guess this is related to forward filling for round-tripping purposes (see the read_excel index_col documentation), but investigations are welcome.

musshorn · 2023-02-09T11:14:56Z

I think

pandas/pandas/io/formats/excel.py

Lines 641 to 660 in 13db83a

    
    for lnum, (spans, levels, level_codes) in enumerate(
        zip(level_lengths, columns.levels, columns.codes)
    ):
        values = levels.take(level_codes)
        for i, span_val in spans.items():
            mergestart, mergeend = None, None
            if span_val > 1:
                mergestart, mergeend = lnum, coloffset + i + span_val
            yield CssExcelCell(
                row=lnum,
                col=coloffset + i + 1,
                val=values[i],
                style=self.header_style,
                css_styles=getattr(self.styler, "ctx_columns", None),
                css_row=lnum,
                css_col=i,
                css_converter=self.style_converter,
                mergestart=mergestart,
                mergeend=mergeend,
            )

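is the offending loop. When it reaches the last level, levels.take(level_codes) receives the codes [0, 1, 0, 1, -1]. The -1 is the code for the missing (None) entry, but take treats it as an ordinary negative index pointing at the last element, so values becomes ['Class A', 'Class B', 'Class A', 'Class B', 'Class B'], and that is ultimately the row that gets written to Excel.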
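The behaviour described above can be reproduced without Excel at all. The following is a minimal sketch that rebuilds the column MultiIndex from the original report and calls take on the last level directly; the variable names (cols, level, codes, taken) are illustrative and not taken from the pandas source.

```python
import pandas as pd

# Same column MultiIndex as in the report; the last tuple has None in level 2.
cols = pd.MultiIndex.from_tuples(
    [
        ("Item", "Type 1", "Class A"),
        ("Item", "Type 1", "Class B"),
        ("Item", "Type 2", "Class A"),
        ("Item", "Type 2", "Class B"),
        ("Item", "Type 3", None),
    ]
)

level = cols.levels[2]  # the distinct labels of the last level
codes = cols.codes[2]   # integer positions into `level`; -1 marks a missing entry

# A plain take() reads -1 as "last element", so the missing entry
# is replaced by the last label of the level.
taken = level.take(codes)

print(codes.tolist())
print(list(taken))
```

If the analysis above is right, the last entry of taken should be 'Class B' rather than a missing value.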

ghost · 2023-02-09T17:09:00Z

pandas/pandas/core/indexes/base.py

Line 1052 in b9a4335

allow_fill = self._maybe_disallow_fill(allow_fill, fill_value, indices)

This variable is set to False for your case. When I manually set it to True in my debugger I get the expected Excel output.

Now it's left to figure out why it's set to False automatically...
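One way to see the effect of that flag from the outside is to compare take with and without an explicit fill_value. This is only a sketch of the public Index.take behaviour, assuming that passing a non-None fill_value is what keeps filling enabled; it is not the fix that was proposed for to_excel, and the names level, codes, wrapped and filled are made up for the example.

```python
import numpy as np
import pandas as pd

# Stand-ins for the last level of the MultiIndex in the report and its codes.
level = pd.Index(["Class A", "Class B"])
codes = np.array([0, 1, 0, 1, -1])

# Default call: no fill_value is given, so -1 is treated as a normal
# negative index and picks the last label.
wrapped = level.take(codes)

# With an explicit fill_value, -1 is treated as "missing" and filled with NA.
filled = level.take(codes, fill_value=np.nan)

print(list(wrapped))
print(list(filled))
```

The second result keeps the missing entry as a missing value, which matches what the debugger experiment above produced when allow_fill was forced to True.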

Can I take it?

musshorn · 2023-02-09T20:57:28Z

Yeah all yours, I was just trying to provide some starting research for whoever took it up

ghost · 2023-02-09T21:03:29Z

take

musshorn added the Bug and Needs Triage labels Feb 8, 2023

phofl added the IO Excel label and removed the Needs Triage label Feb 9, 2023

github-actions bot assigned ghost Feb 10, 2023

ghost mentioned this issue Mar 7, 2023

Fix to_excel not rendering None values in MultiIndex #51824

Closed

5 tasks

ldouteau mentioned this issue Aug 20, 2024

BUG: inconsistency when read_csv reads MultiIndex with empty values #59560

Open

3 tasks
