Issue with Excel writers when column names are duplicated #5235

jmcnamara · 2013-10-15T22:19:45Z

There appears to be an issue with Excel writers when DataFrame column names are duplicated. This issue was initially reported on StackOverflow.

For example, consider the following program:

import pandas as pd
from pandas import DataFrame

df = DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

df.columns = ['A', 'B', 'B']  # !!!

df.to_csv('output.csv')
df.to_excel('output.xlsx')

Note the duplicated column name. The df for this looks like this:

   A  B  B
0  1  2  3
1  1  2  3
2  1  2  3

The corresponding output of the CSV is as expected:

$ cat output.csv
,A,B,B
0,1,2,3
1,1,2,3
2,1,2,3

However, the output of any of the Excel writers is incorrect:

The issue appears to be in pandas/core/format.py. The output data is gathered based on column names, as shown below, which causes issues with duplicate names.

    def _format_regular_rows(self):
        ...
        for colidx, colname in enumerate(self.columns):
            series = self.df[colname]
            ...
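
To illustrate why label-based lookup is a problem here, the following minimal sketch (not taken from the pandas source) shows what `df[colname]` returns when the label is duplicated. The variable name `by_label` is only for illustration.

```python
import pandas as pd

# Same frame as in the report: the label 'B' appears twice.
df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]], columns=['A', 'B', 'B'])

# Selecting by a duplicated label returns every matching column,
# so the result is a two-column DataFrame rather than a single Series.
by_label = df['B']
print(type(by_label).__name__)
print(by_label.shape)
```

Code that expects a single Series per column name therefore cannot distinguish the two 'B' columns.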

I initially thought that this might be the correct behaviour and that column names shouldn't be duplicated, but given that the output differs from that of the CSV writer, it looks like a bug.

I'll write a test case but I'm not sure of the best way to fix the issue.


jtratner · 2013-10-15T22:25:45Z

@jreback - I feel like you've dealt with this recently - how do you handle
it?

jreback · 2013-10-15T22:41:16Z

easy

when I iterate thru columns do this

for i, col in enumerate(obj.columns):
    obj.iloc[:, i]

rather than using iteritems
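
A self-contained sketch of that positional approach, applied to the frame from the original report, might look like the following. The helper name `columns_by_position` is hypothetical and is not the code that was merged.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]], columns=['A', 'B', 'B'])


def columns_by_position(frame):
    """Collect (position, label, values) for each column, selecting by position.

    Hypothetical helper for illustration only.
    """
    result = []
    for i, col in enumerate(frame.columns):
        # iloc selects exactly one column by its integer position,
        # so duplicated labels still yield one Series each.
        series = frame.iloc[:, i]
        result.append((i, col, series.tolist()))
    return result


for position, label, values in columns_by_position(df):
    print(position, label, values)
```

Each duplicated 'B' column keeps its own data because the lookup never goes through the label.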

jtratner · 2013-10-15T22:54:03Z

There you go - I figured you'd have an easy answer :)

jmcnamara · 2013-10-15T23:11:57Z

I'll try the iloc fix with the test case. WIP.

jmcnamara · 2013-10-15T23:24:14Z

Output after fix looks good:

jreback · 2013-10-15T23:26:54Z

gr8

jreback · 2013-10-16T18:50:39Z

closed by #5237

jmcnamara mentioned this issue Oct 15, 2013

BUG/TST: Fix Excel writers with duplicated column names. #5237

Merged

jreback closed this as completed Oct 16, 2013

This was referenced Nov 4, 2013

BUG: Excel writer doesn't handle "cols" option correctly #5427

Closed

BUG: Excel writer doesn't handle "cols" option correctly #5429

Merged
