multiindexed DataFrame excel export and import doesn't yield original #16331

Heerpa · 2017-05-11T17:22:03Z

I am using a multiindexed dataframe and save and load it to and from .xlsx files. This is part of my unit testing procedure, and after upgrading to pandas 0.20.1, the multiindex is not recovered any more.

Code Sample, a copy-pastable example if possible

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
                ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
print(df)
df.to_excel('test.xlsx')
dfin = pd.read_excel('test.xlsx')
print(dfin)

Problem description

the re-loaded dataframe has "unknown#" labels at level0 and Nan at higher levels in index as well as columns, instead of correctly recreating the multiindex from the original DataFrame.

Expected Output

df:

first bar baz foo
second one two one two one two
first second
bar one -0.841135 -1.173882 0.206777 -0.805181 0.027380 0.326886
two -0.139861 -0.749935 -1.965764 0.236397 0.607762 0.016523
baz one -0.438246 0.315781 0.253052 -0.117903 0.546020 0.834073
two -1.772478 0.509509 0.003263 1.801822 -0.297373 0.342030
foo one 0.661079 0.078506 0.572583 -0.459106 -0.584597 1.988036
two 0.007465 0.811799 -1.606569 0.463014 -1.242725 1.490809

dfin:

    first       bar Unnamed: 2        baz Unnamed: 4        foo Unnamed: 6

NaN second one two one two one two
first second NaN NaN NaN NaN NaN NaN
bar one 0.55971 0.0495165 -2.48716 -0.608794 0.171518 1.61321
NaN two 0.167981 0.947228 -0.129498 -0.475977 0.0289783 1.9852
baz one -0.827946 0.225728 -0.0272926 -1.39887 -0.147073 0.201533
NaN two 1.55474 -0.406972 -1.85484 0.409694 -1.82329 1.03537
foo one 0.721785 -1.61517 -0.69211 0.330506 0.796323 -0.537428
NaN two 1.46575 1.55758 1.73001 -0.47926 0.162746 0.783132

Output of `pd.show_versions()`

# Paste the output here pd.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.5.0.final.0 python-bits: 64 OS: Darwin OS-release: 16.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 21.2.1
Cython: 0.23.4
numpy: 1.11.2
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.0
openpyxl: 2.4.0
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

chris-b1 · 2017-05-11T17:54:10Z

Currently, with a MultiIndex, the header and index columns must be specified when reading back in, as below. This isn't new behavior in 0.20.1 (at least compared to 0.19.2).

xref #11292

In [51]: pd.read_excel('test.xlsx', header=[0,1], index_col=[0,1])

first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one    -1.130701  0.624430 -1.546881  1.987738  0.092758 -1.440665
      two    -0.181748  0.574994  0.121410  0.723244  0.031392 -1.806494
baz   one    -1.455962  0.110831  3.149095 -1.457039 -0.966952  0.689411
      two    -0.454616  1.106233  0.616380  0.290951 -0.355218  0.031521
foo   one     0.356507  1.072097  0.887365 -1.191437 -0.744719  0.434147
      two    -0.850893  1.402040  0.954346 -0.709901 -0.348118  0.329914

chris-b1 closed this as completed May 11, 2017

chris-b1 added IO Excel read_excel, to_excel Usage Question labels May 11, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

multiindexed DataFrame excel export and import doesn't yield original #16331

multiindexed DataFrame excel export and import doesn't yield original #16331

Heerpa commented May 11, 2017

chris-b1 commented May 11, 2017

multiindexed DataFrame excel export and import doesn't yield original #16331

multiindexed DataFrame excel export and import doesn't yield original #16331

Comments

Heerpa commented May 11, 2017

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

chris-b1 commented May 11, 2017

Output of `pd.show_versions()`