Skip to content

BUG: excel export merge margin #12453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Feb 26, 2016 · 12 comments
Closed

BUG: excel export merge margin #12453

ghost opened this issue Feb 26, 2016 · 12 comments
Labels
Bug IO Excel read_excel, to_excel
Milestone

Comments

@ghost
Copy link

ghost commented Feb 26, 2016

Code Sample, a copy-pastable example if possible

df = pd.DataFrame([ {"JOB": "Worker", "NAME": "Bob", "YEAR": 2013, "MONTH": 12, "DAYS": 3, "SALARY": 17},
                    {"JOB": "Employ", "NAME": "Mary", "YEAR": 2013, "MONTH": 12, "DAYS": 5, "SALARY": 23},
                    {"JOB": "Worker", "NAME": "Bob", "YEAR": 2014, "MONTH": 1, "DAYS": 10, "SALARY": 100},
                    {"JOB": "Worker", "NAME": "Bob", "YEAR": 2014, "MONTH": 1, "DAYS": 11, "SALARY": 110},
                    {"JOB": "Employ", "NAME": "Mary", "YEAR": 2014, "MONTH": 1, "DAYS": 15, "SALARY": 200},
                    {"JOB": "Worker", "NAME": "Bob", "YEAR": 2014, "MONTH": 2, "DAYS": 8, "SALARY": 80},
                    {"JOB": "Employ", "NAME": "Mary", "YEAR": 2014, "MONTH": 2, "DAYS": 5, "SALARY": 190}])
df = df.set_index(["JOB", "NAME", "YEAR", "MONTH"], drop=False, append=False)

df = df.pivot_table(index=["JOB", "NAME"], columns=["YEAR", "MONTH"], values=["DAYS", "SALARY"], aggfunc={"DAYS": "mean", "SALARY": "sum"}, margins=True)

print(df)

file = r"E:\TESTE.xlsx"
writer = pd.ExcelWriter(file, engine="xlsxwriter", datetime_format="yyyy-mm-dd")
df.to_excel(writer, sheet_name="TESTE", index=True)
writer.save()

Expected Output

SALARY DAYS
YEAR 2013 2014 All 2013 2014 All
MONTH 12 1 2 12 1 2
JOB NAME
Employ Mary 23 200 190 413 5 15.0 5.0 8.333333
Worker Bob 17 210 80 307 3 10.5 8.0 8.000000
All 40 410 270 720 4 12.0 6.5 8.142857

But when exporting to Excel, the Month 2, on SALARY and DAYS, merges with the next column and All above.

2014 .......... | All ...........|
1 ..... | 2 ....................... |
15 ... | 5 ..... | 8.333333 |

When exporting to HTML it's ok with the header but, if the value cell has no value then shows cuts the columns.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 47 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None
None

image

@jreback
Copy link
Contributor

jreback commented Feb 26, 2016

try merge_cells=False

@jreback jreback added the IO Excel read_excel, to_excel label Feb 26, 2016
@jreback
Copy link
Contributor

jreback commented Feb 26, 2016

cc @chris-b1

@ghost
Copy link
Author

ghost commented Feb 26, 2016

merge_cells affects all the cells and not just the "margins".

image

@chris-b1
Copy link
Contributor

Something going awry in the logic here when there's an empty level, I'll take a look.

@jreback jreback added this to the Next Major Release milestone Feb 27, 2016
@ghost
Copy link
Author

ghost commented Feb 27, 2016

Here a picture of when exporting the same dataframe to html:

image

@brandys11
Copy link
Contributor

I have just opened PR for this. The only problem could be, that empty name is changed to Unnamed x_level_y when reading back form excel.

@jreback jreback changed the title pivot_table export merge margin BUG: excel export merge margin May 8, 2016
@jreback
Copy link
Contributor

jreback commented May 8, 2016

I think the expected output above is wrong

In [11]: df
Out[11]: 
            SALARY                      DAYS                     
YEAR          2013   2014           All 2013  2014            All
MONTH           12      1      2          12     1    2          
JOB    NAME                                                      
Employ Mary   23.0  200.0  190.0  413.0  5.0  15.0  5.0  8.333333
Worker Bob    17.0  210.0   80.0  307.0  3.0  10.5  8.0  8.000000
All           40.0  410.0  270.0  720.0  4.0  12.0  6.5  8.142857

This is correct (as pandas prints it in the console)

In [12]: df.index
Out[12]: 
MultiIndex(levels=[[u'All', u'Employ', u'Worker'], [u'', u'Bob', u'Mary']],
           labels=[[1, 2, 0], [2, 1, 0]],
           names=[u'JOB', u'NAME'])

In [13]: df.columns
Out[13]: 
MultiIndex(levels=[[u'DAYS', u'SALARY'], [2013, 2014, u'All'], [1, 2, 12, u'']],
           labels=[[1, 1, 1, 1, 0, 0, 0, 0], [0, 1, 1, 2, 0, 1, 1, 2], [2, 0, 1, 3, 2, 0, 1, 3]],
           names=[None, u'YEAR', u'MONTH'])

@brandys11
Copy link
Contributor

Have you been referring to my PR? If yes, is it ok to change behavior so that empty index names would stay empty instead "Unnamed: %d_level_%d" as it is right now?

@jreback
Copy link
Contributor

jreback commented May 10, 2016

yes I referred to the PR. My point is that the example in the issue is wrong.

@brandys11
Copy link
Contributor

Sorry, but I am still not sure if I fully understand what you are saying. The html table in first comment looks exactly as yours. And the one in the comment from 27th February is wrong, although I get correct output when calling df.to_html().

Running it after applying my PR would yield:

In [11]: excel = pd.read_excel(file, header=[0,1,2], index_col=[0,1])

In [12]: excel
Out[12]: 
            DAYS                               SALARY            
YEAR        2013  2014                     All   2013 2014        All
MONTH         12     1    2 Unnamed: 5_level_2     12    1    2   Unnamed: 9_level_2
JOB    NAME                                                       
Employ Mary    5  15.0  5.0           8.333333     23  200  190   413
Worker Bob     3  10.5  8.0           8.000000     17  210   80   307
All    Bob     4  12.0  6.5           8.142857     40  410  270   720

In [13]: excel.index
Out[13]: 
MultiIndex(levels=[['All', 'Employ', 'Worker'], ['Bob', 'Mary']],
           labels=[[1, 2, 0], [1, 0, 0]],
           names=['JOB', 'NAME'])

In [14]: excel.columns
Out[14]: 
MultiIndex(levels=[['DAYS', 'SALARY'], [2013, 2014, 'All'], [1, 2, 12, 'Unnamed: 5_level_2', 'Unnamed: 9_level_2']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 1, 2, 0, 1, 1, 2], [2, 0, 1, 3, 2, 0, 1, 4]],
           names=[None, 'YEAR', 'MONTH'])

The problem is that "Unnamed" cells are not merged and we get more Indexes on the 3rd level. I have tried to empty strings insted of "Unnamed" cells. Some tests have failed -> test_unnamed_columns as the column names have to be unique. We could count the number of empty cells and use the "Unnamed cells" only if there is more of them or be Ok with having "Unnamed" cells all the time.

Do you have any other sugestions, or should I try implementing one of mentioned?

@jreback
Copy link
Contributor

jreback commented May 11, 2016

@brandys11 The tables are not the same, salary is misplaced in the top table. The top row is 1 column off.

As far as your PR, you have something wrong. You should't have any unnamed columns. This conversation should move to the PR itself, not the issue.

@ghost
Copy link
Author

ghost commented May 11, 2016

Sorry, but I'm not fully understanding. Is it a bug or am I doing something wrong?

@jreback jreback modified the milestones: 0.18.2, Next Major Release Jun 4, 2016
@jreback jreback closed this as completed in 67b72e3 Jun 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants