Skip to content

Issue with Excel writers when column names are duplicated #5235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jmcnamara opened this issue Oct 15, 2013 · 7 comments
Closed

Issue with Excel writers when column names are duplicated #5235

jmcnamara opened this issue Oct 15, 2013 · 7 comments
Labels
Bug IO Excel read_excel, to_excel
Milestone

Comments

@jmcnamara
Copy link
Contributor

There appears to be an issue with Excel writers when DataFrame column names are duplicated. This issue that was initially reported on StackOverflow.

For example consider the following program:

import pandas as pd
from pandas import DataFrame

df = DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

df.columns = ['A', 'B', 'B']  # !!!

df.to_csv('output.csv')
df.to_excel('output.xlsx')

Note the duplicated column name. The df for this looks like this:

>>> df
   A  B  B
0  1  2  3
1  1  2  3
2  1  2  3

The corresponding output of the CSV is as expected:

$ cat output.csv
,A,B,B
0,1,2,3
1,1,2,3
2,1,2,3

However, the output of the any of the Excel writers is incorrect:

screenshot

The issue appears to be in pandas/core/format.py. The output data is gathered based on column names, as shown below, which causes issues with duplicate names.

    def _format_regular_rows(self):
        ...
        for colidx, colname in enumerate(self.columns):
            series = self.df[colname]
            ... 

I initially thought that this might be the correct behaviour and that column names shouldn't be duplicated but given that the output is different to the csv writer it looks like a bug.

I'll write a test case but I'm not sure of the best way to fix the issue.

@jtratner
Copy link
Contributor

@jreback - I feel like you've dealt with this recently - how do you handle
it?

@jreback
Copy link
Contributor

jreback commented Oct 15, 2013

easy

when I iterate thru columns do this

for i, col in enumerate(obj.columns):
obj.iloc[:, i]

rather than using iteritems

@jtratner
Copy link
Contributor

There you go - I figured you'd have an easy answer :)

@jmcnamara
Copy link
Contributor Author

I'll try the iloc fix with the test case. WIP.

@jmcnamara
Copy link
Contributor Author

Output after fix looks good:

screenshot

@jreback
Copy link
Contributor

jreback commented Oct 15, 2013

gr8

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

closed by #5237

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

3 participants