Allow duplicate columns in df.to_csv #3095

Closed
ghost opened this issue Mar 19, 2013 · 13 comments

ghost commented Mar 19, 2013

Continuing #3059.
See also #3092

  • allow dupe columns when they are in the same block/dtype
  • Perhaps figure out a way to handle that case as well.
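The same-block case in the first bullet can be sketched against a current pandas (a hedged example, not the 0.11-era code under discussion): two columns with the same label and the same dtype land in one float64 block, and `to_csv` is expected to write both.

```python
import io

import pandas as pd

# Two columns named "a", both float64, so they sit in the same block.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["a", "a"])

buf = io.StringIO()
df.to_csv(buf, index=False)  # the same-block dupe case this issue wants allowed
print(buf.getvalue())
```

The header row comes out as `a,a`, i.e. both duplicate labels survive the round trip to CSV.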

jreback commented Mar 19, 2013

Why don't we push this to 0.12? (if it's really an issue, users can pass legacy=True)


ghost commented Mar 19, 2013

It's a regression: 0.10.1 handles this, and the default to_csv in 0.11 doesn't.
legacy=True is only there as an escape hatch; the intent is that no one uses it.


jreback commented Mar 19, 2013

ok... you can take out the fail-early check and just put a try/except on the chunk writer; it will still catch the dup across blocks, but won't fail on a dup within a single block (the cross-block case fails because the colnamemap will have None as an indexer rather than a value)


jreback commented Mar 19, 2013

Is there a single-dtype dup-columns test? I can put the above fix in...


ghost commented Mar 19, 2013

Can you give me a recipe for generating blocks so ordered traversal doesn't match column order?


ghost commented Mar 19, 2013

In [11]: df=mkdf(10,5)
    ...: df['j'] = pd.Series(range(len(df)))
    ...: df['k']= pd.Series(map(float,range(len(df))))
    ...: df['l']= pd.Series(map(str,range(len(df))))
    ...: df._consolidate_inplace()
    ...: #df.columns = ['a'] * len(df.columns)
    ...: bs=df._data.blocks
    ...: bs[0]=df._data.blocks[0]

In [12]: bs
Out[12]: 
[FloatBlock: [j, k], 2 x 10, dtype float64,
 ObjectBlock: [C_l0_g0, C_l0_g1, C_l0_g2, C_l0_g3, C_l0_g4, l], 6 x 10, dtype object]

In [9]: df
Out[9]: 
C0      C_l0_g0 C_l0_g1 C_l0_g2 C_l0_g3 C_l0_g4   j   k    l
R0                                                          
R_l0_g0    R0C0    R0C1    R0C2    R0C3    R0C4 NaN NaN  NaN
R_l0_g1    R1C0    R1C1    R1C2    R1C3    R1C4 NaN NaN  NaN
R_l0_g2    R2C0    R2C1    R2C2    R2C3    R2C4 NaN NaN  NaN
R_l0_g3    R3C0    R3C1    R3C2    R3C3    R3C4 NaN NaN  NaN
R_l0_g4    R4C0    R4C1    R4C2    R4C3    R4C4 NaN NaN  NaN
R_l0_g5    R5C0    R5C1    R5C2    R5C3    R5C4 NaN NaN  NaN
R_l0_g6    R6C0    R6C1    R6C2    R6C3    R6C4 NaN NaN  NaN
R_l0_g7    R7C0    R7C1    R7C2    R7C3    R7C4 NaN NaN  NaN
R_l0_g8    R8C0    R8C1    R8C2    R8C3    R8C4 NaN NaN  NaN
R_l0_g9    R9C0    R9C1    R9C2    R9C3    R9C4 NaN NaN  NaN


ghost commented Mar 19, 2013

I don't think there is. I've got it going: if there are dupe columns it falls back to using icol,
and if the dupes are split across blocks, icol dies with an exception as described.
Almost there.


ghost commented Mar 19, 2013

The check for dupes split across blocks: len(union of set(keys of each block)) == sum(len(set(keys of b)) for b in blocks)
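That cardinality check can be written out as a small standalone sketch (the function name and the modeling of blocks as plain lists of column keys are illustrative, not pandas internals): if every key lives in exactly one block, the union's size equals the sum of the per-block de-duplicated sizes; any key shared between blocks makes the union strictly smaller.

```python
def dupes_split_across_blocks(blocks):
    """Return True if any column key appears in more than one block.

    `blocks` is modeled here as a list of lists of column keys; keys are
    de-duplicated within each block first, so a dup *inside* one block
    does not trip the check -- only cross-block duplication does.
    """
    union = set().union(*(set(keys) for keys in blocks))
    per_block_total = sum(len(set(keys)) for keys in blocks)
    # Equal sizes mean every key lives in exactly one block.
    return len(union) != per_block_total


print(dupes_split_across_blocks([["j", "k"], ["l"]]))  # False
print(dupes_split_across_blocks([["a", "a"], ["b"]]))  # False (dup inside one block)
print(dupes_split_across_blocks([["a", "b"], ["a"]]))  # True  (dup across blocks)
```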


jreback commented Mar 19, 2013

Ok, great.
I was trying to fix a more general duplication issue,
so I'll drop what I was doing.

@ghost ghost closed this as completed in 1f138a4 Mar 19, 2013

ghost commented Mar 19, 2013

Ok, fixed in master.
When dupes are present, CSVWriter falls back to icol() rather than walking the blocks.
When/if icol() learns to deal with dupes split across blocks, df.to_csv() should work
for that case too.
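The harder case described above — duplicate labels whose values live in *different* dtype blocks — can be exercised in a current pandas (a sketch; `icol()` was later replaced by `iloc`, and cross-block dupes do serialize in modern versions):

```python
import io

import pandas as pd

# Duplicate "x" columns with different dtypes: one int64 block, one float64 block.
ints = pd.DataFrame({"x": [1, 2]})
floats = pd.DataFrame({"x": [1.5, 2.5]})
df = pd.concat([ints, floats], axis=1)  # columns: ["x", "x"]

buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

Each row keeps the columns in label order (`1,1.5` then `2,2.5`), so the dtype-block layout no longer leaks into the CSV output.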


jreback commented Mar 19, 2013

Great, that looks good; I'll leave the other issue open about creating a duplicate indexer that can handle this more general case (but don't hold your breath).


jreback commented Apr 30, 2013

Ok... reopening this so I remember to do it (or @y-p, if you want)...

@jreback jreback reopened this Apr 30, 2013

jreback commented Apr 30, 2013

Actually... let me create another one in 0.11.1.
