Skip to content

multiindexed DataFrame excel export and import doesn't yield original #16331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Heerpa opened this issue May 11, 2017 · 1 comment
Closed

multiindexed DataFrame excel export and import doesn't yield original #16331

Heerpa opened this issue May 11, 2017 · 1 comment
Labels
IO Excel read_excel, to_excel Usage Question

Comments

@Heerpa
Copy link

Heerpa commented May 11, 2017

I am using a multiindexed dataframe and save and load it to and from .xlsx files. This is part of my unit testing procedure, and after upgrading to pandas 0.20.1, the multiindex is not recovered any more.

Code Sample, a copy-pastable example if possible

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
                ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
print(df)
df.to_excel('test.xlsx')
dfin = pd.read_excel('test.xlsx')
print(dfin)

Problem description

the re-loaded dataframe has "unknown#" labels at level0 and Nan at higher levels in index as well as columns, instead of correctly recreating the multiindex from the original DataFrame.

Expected Output

df:

first bar baz foo
second one two one two one two
first second
bar one -0.841135 -1.173882 0.206777 -0.805181 0.027380 0.326886
two -0.139861 -0.749935 -1.965764 0.236397 0.607762 0.016523
baz one -0.438246 0.315781 0.253052 -0.117903 0.546020 0.834073
two -1.772478 0.509509 0.003263 1.801822 -0.297373 0.342030
foo one 0.661079 0.078506 0.572583 -0.459106 -0.584597 1.988036
two 0.007465 0.811799 -1.606569 0.463014 -1.242725 1.490809

dfin:

    first       bar Unnamed: 2        baz Unnamed: 4        foo Unnamed: 6

NaN second one two one two one two
first second NaN NaN NaN NaN NaN NaN
bar one 0.55971 0.0495165 -2.48716 -0.608794 0.171518 1.61321
NaN two 0.167981 0.947228 -0.129498 -0.475977 0.0289783 1.9852
baz one -0.827946 0.225728 -0.0272926 -1.39887 -0.147073 0.201533
NaN two 1.55474 -0.406972 -1.85484 0.409694 -1.82329 1.03537
foo one 0.721785 -1.61517 -0.69211 0.330506 0.796323 -0.537428
NaN two 1.46575 1.55758 1.73001 -0.47926 0.162746 0.783132

Output of pd.show_versions()

# Paste the output here pd.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.5.0.final.0 python-bits: 64 OS: Darwin OS-release: 16.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 21.2.1
Cython: 0.23.4
numpy: 1.11.2
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.0
openpyxl: 2.4.0
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

@chris-b1
Copy link
Contributor

Currently, with a MultiIndex, the header and index columns must be specified when reading back in, as below. This isn't new behavior in 0.20.1 (at least compared to 0.19.2).

xref #11292

In [51]: pd.read_excel('test.xlsx', header=[0,1], index_col=[0,1])

first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one    -1.130701  0.624430 -1.546881  1.987738  0.092758 -1.440665
      two    -0.181748  0.574994  0.121410  0.723244  0.031392 -1.806494
baz   one    -1.455962  0.110831  3.149095 -1.457039 -0.966952  0.689411
      two    -0.454616  1.106233  0.616380  0.290951 -0.355218  0.031521
foo   one     0.356507  1.072097  0.887365 -1.191437 -0.744719  0.434147
      two    -0.850893  1.402040  0.954346 -0.709901 -0.348118  0.329914

@chris-b1 chris-b1 added IO Excel read_excel, to_excel Usage Question labels May 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Usage Question
Projects
None yet
Development

No branches or pull requests

2 participants