Skip to content

BUG: to_excel inserts invalid data if a level in a MultiIndex is None #51252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 tasks done
musshorn opened this issue Feb 8, 2023 · 5 comments
Open
3 tasks done
Labels
Bug IO Excel read_excel, to_excel

Comments

@musshorn
Copy link

musshorn commented Feb 8, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

Cols = pd.MultiIndex.from_tuples([("Item", "Type 1", "Class A"), ("Item", "Type 1", "Class B"), ("Item", "Type 2", "Class A"), ("Item", "Type 2", "Class B"), ("Item", "Type 3", None)])

df = pd.DataFrame(np.random.randn(10, 5), columns=Cols)
print(df)
df.to_excel("Test.xlsx")

Issue Description

When a level in a MultiIndex is None, the resulting dataframe when exported to excel has an invalid level inserted replacing the None. See Image.
PandasIssue

Expected Behavior

I would expect the result from to_excel should reflect the levels in the MultiIndex correctly.

Installed Versions

pd.show_versions()

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_Australia.1252

pandas : 1.5.3
numpy : 1.22.4
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 57.4.0
pip : 23.0
Cython : 0.29.32
pytest : 7.2.0
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : 1.3.0
zstandard : 0.17.0
tzdata : None

@musshorn musshorn added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 8, 2023
@phofl
Copy link
Member

phofl commented Feb 9, 2023

I guess this is related to forward filling for round tripping purposes (see read_excel index_col documentation). But investigations welcome

@phofl phofl added IO Excel read_excel, to_excel and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 9, 2023
@musshorn
Copy link
Author

musshorn commented Feb 9, 2023

I think

for lnum, (spans, levels, level_codes) in enumerate(
zip(level_lengths, columns.levels, columns.codes)
):
values = levels.take(level_codes)
for i, span_val in spans.items():
mergestart, mergeend = None, None
if span_val > 1:
mergestart, mergeend = lnum, coloffset + i + span_val
yield CssExcelCell(
row=lnum,
col=coloffset + i + 1,
val=values[i],
style=self.header_style,
css_styles=getattr(self.styler, "ctx_columns", None),
css_row=lnum,
css_col=i,
css_converter=self.style_converter,
mergestart=mergestart,
mergeend=mergeend,
)
is the offending loop. When it reaches the last level, levels.take(level_codes) gets passed [0, 1, 0, 1, -1] which sets values to ['Class A', 'Class B', 'Class A', 'Class B', 'Class B'] and that's ultimately the row that gets written to Excel.

@ghost
Copy link

ghost commented Feb 9, 2023

allow_fill = self._maybe_disallow_fill(allow_fill, fill_value, indices)

This variable is set to False for your case. When I manually set it to True in my debugger I get the expected Excel output.

Now it's left to figure out why it's set to False automatically...

Can I take it?

@musshorn
Copy link
Author

musshorn commented Feb 9, 2023

Yeah all yours, I was just trying to provide some starting research for whoever took it up

@ghost
Copy link

ghost commented Feb 9, 2023

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants