BUG: excel export merge margin #12453

ghost · 2016-02-26T12:33:57Z

Code Sample, a copy-pastable example if possible

df = pd.DataFrame([ {"JOB": "Worker", "NAME": "Bob", "YEAR": 2013, "MONTH": 12, "DAYS": 3, "SALARY": 17},
                    {"JOB": "Employ", "NAME": "Mary", "YEAR": 2013, "MONTH": 12, "DAYS": 5, "SALARY": 23},
                    {"JOB": "Worker", "NAME": "Bob", "YEAR": 2014, "MONTH": 1, "DAYS": 10, "SALARY": 100},
                    {"JOB": "Worker", "NAME": "Bob", "YEAR": 2014, "MONTH": 1, "DAYS": 11, "SALARY": 110},
                    {"JOB": "Employ", "NAME": "Mary", "YEAR": 2014, "MONTH": 1, "DAYS": 15, "SALARY": 200},
                    {"JOB": "Worker", "NAME": "Bob", "YEAR": 2014, "MONTH": 2, "DAYS": 8, "SALARY": 80},
                    {"JOB": "Employ", "NAME": "Mary", "YEAR": 2014, "MONTH": 2, "DAYS": 5, "SALARY": 190}])
df = df.set_index(["JOB", "NAME", "YEAR", "MONTH"], drop=False, append=False)

df = df.pivot_table(index=["JOB", "NAME"], columns=["YEAR", "MONTH"], values=["DAYS", "SALARY"], aggfunc={"DAYS": "mean", "SALARY": "sum"}, margins=True)

print(df)

file = r"E:\TESTE.xlsx"
writer = pd.ExcelWriter(file, engine="xlsxwriter", datetime_format="yyyy-mm-dd")
df.to_excel(writer, sheet_name="TESTE", index=True)
writer.save()

Expected Output

	SALARY				DAYS
YEAR		2013	2014		All	2013	2014		All
MONTH		12	1	2		12	1	2
JOB	NAME
Employ	Mary	23	200	190	413	5	15.0	5.0	8.333333
Worker	Bob	17	210	80	307	3	10.5	8.0	8.000000
All		40	410	270	720	4	12.0	6.5	8.142857

But when exporting to Excel, the Month 2, on SALARY and DAYS, merges with the next column and All above.

2014 .......... | All ...........|
1 ..... | 2 ....................... |
15 ... | 5 ..... | 8.333333 |

When exporting to HTML it's ok with the header but, if the value cell has no value then shows cuts the columns.

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 47 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None
None

The text was updated successfully, but these errors were encountered:

jreback · 2016-02-26T12:40:26Z

try merge_cells=False

jreback · 2016-02-26T12:45:52Z

cc @chris-b1

ghost · 2016-02-26T12:49:20Z

merge_cells affects all the cells and not just the "margins".

chris-b1 · 2016-02-26T15:07:14Z

Something going awry in the logic here when there's an empty level, I'll take a look.

ghost · 2016-02-27T12:15:19Z

Here a picture of when exporting the same dataframe to html:

brandys11 · 2016-05-08T14:22:45Z

I have just opened PR for this. The only problem could be, that empty name is changed to Unnamed x_level_y when reading back form excel.

jreback · 2016-05-08T14:56:05Z

I think the expected output above is wrong

In [11]: df
Out[11]: 
            SALARY                      DAYS                     
YEAR          2013   2014           All 2013  2014            All
MONTH           12      1      2          12     1    2          
JOB    NAME                                                      
Employ Mary   23.0  200.0  190.0  413.0  5.0  15.0  5.0  8.333333
Worker Bob    17.0  210.0   80.0  307.0  3.0  10.5  8.0  8.000000
All           40.0  410.0  270.0  720.0  4.0  12.0  6.5  8.142857

This is correct (as pandas prints it in the console)

In [12]: df.index
Out[12]: 
MultiIndex(levels=[[u'All', u'Employ', u'Worker'], [u'', u'Bob', u'Mary']],
           labels=[[1, 2, 0], [2, 1, 0]],
           names=[u'JOB', u'NAME'])

In [13]: df.columns
Out[13]: 
MultiIndex(levels=[[u'DAYS', u'SALARY'], [2013, 2014, u'All'], [1, 2, 12, u'']],
           labels=[[1, 1, 1, 1, 0, 0, 0, 0], [0, 1, 1, 2, 0, 1, 1, 2], [2, 0, 1, 3, 2, 0, 1, 3]],
           names=[None, u'YEAR', u'MONTH'])

brandys11 · 2016-05-10T13:44:50Z

Have you been referring to my PR? If yes, is it ok to change behavior so that empty index names would stay empty instead "Unnamed: %d_level_%d" as it is right now?

jreback · 2016-05-10T14:44:07Z

yes I referred to the PR. My point is that the example in the issue is wrong.

brandys11 · 2016-05-10T23:47:31Z

Sorry, but I am still not sure if I fully understand what you are saying. The html table in first comment looks exactly as yours. And the one in the comment from 27th February is wrong, although I get correct output when calling df.to_html().

Running it after applying my PR would yield:

In [11]: excel = pd.read_excel(file, header=[0,1,2], index_col=[0,1])

In [12]: excel
Out[12]: 
            DAYS                               SALARY            
YEAR        2013  2014                     All   2013 2014        All
MONTH         12     1    2 Unnamed: 5_level_2     12    1    2   Unnamed: 9_level_2
JOB    NAME                                                       
Employ Mary    5  15.0  5.0           8.333333     23  200  190   413
Worker Bob     3  10.5  8.0           8.000000     17  210   80   307
All    Bob     4  12.0  6.5           8.142857     40  410  270   720

In [13]: excel.index
Out[13]: 
MultiIndex(levels=[['All', 'Employ', 'Worker'], ['Bob', 'Mary']],
           labels=[[1, 2, 0], [1, 0, 0]],
           names=['JOB', 'NAME'])

In [14]: excel.columns
Out[14]: 
MultiIndex(levels=[['DAYS', 'SALARY'], [2013, 2014, 'All'], [1, 2, 12, 'Unnamed: 5_level_2', 'Unnamed: 9_level_2']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 1, 2, 0, 1, 1, 2], [2, 0, 1, 3, 2, 0, 1, 4]],
           names=[None, 'YEAR', 'MONTH'])

The problem is that "Unnamed" cells are not merged and we get more Indexes on the 3rd level. I have tried to empty strings insted of "Unnamed" cells. Some tests have failed -> test_unnamed_columns as the column names have to be unique. We could count the number of empty cells and use the "Unnamed cells" only if there is more of them or be Ok with having "Unnamed" cells all the time.

Do you have any other sugestions, or should I try implementing one of mentioned?

jreback · 2016-05-11T01:02:16Z

@brandys11 The tables are not the same, salary is misplaced in the top table. The top row is 1 column off.

As far as your PR, you have something wrong. You should't have any unnamed columns. This conversation should move to the PR itself, not the issue.

ghost · 2016-05-11T07:20:40Z

Sorry, but I'm not fully understanding. Is it a bug or am I doing something wrong?

jreback added the IO Excel read_excel, to_excel label Feb 26, 2016

jreback added Bug Difficulty Novice labels Feb 27, 2016

jreback added this to the Next Major Release milestone Feb 27, 2016

brandys11 mentioned this issue May 8, 2016

[BUG] Reading multiindex, incorrectly names columns without name. #13115

Closed

4 tasks

jreback changed the title ~~pivot_table export merge margin~~ BUG: excel export merge margin May 8, 2016

jreback modified the milestones: 0.18.2, Next Major Release Jun 4, 2016

jreback closed this as completed in 67b72e3 Jun 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: excel export merge margin #12453

BUG: excel export merge margin #12453

ghost commented Feb 26, 2016

jreback commented Feb 26, 2016

jreback commented Feb 26, 2016

ghost commented Feb 26, 2016

chris-b1 commented Feb 26, 2016

ghost commented Feb 27, 2016

brandys11 commented May 8, 2016

jreback commented May 8, 2016 •

edited

Loading

brandys11 commented May 10, 2016

jreback commented May 10, 2016

brandys11 commented May 10, 2016

jreback commented May 11, 2016

ghost commented May 11, 2016

BUG: excel export merge margin #12453

BUG: excel export merge margin #12453

Comments

ghost commented Feb 26, 2016

Code Sample, a copy-pastable example if possible

Expected Output

output of pd.show_versions()

INSTALLED VERSIONS

jreback commented Feb 26, 2016

jreback commented Feb 26, 2016

ghost commented Feb 26, 2016

chris-b1 commented Feb 26, 2016

ghost commented Feb 27, 2016

brandys11 commented May 8, 2016

jreback commented May 8, 2016 • edited Loading

brandys11 commented May 10, 2016

jreback commented May 10, 2016

brandys11 commented May 10, 2016

jreback commented May 11, 2016

ghost commented May 11, 2016

output of `pd.show_versions()`

jreback commented May 8, 2016 •

edited

Loading