Allow duplicate columns in df.to_csv #3095

Closed
ghost opened this issue Mar 19, 2013 · 13 comments

ghost commented Mar 19, 2013

Continuing #3059.
See also #3092

  • allow dupe columns when they are in the same block/dtype
  • Perhaps figure out a way to handle that case as well.
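The same-block case in the first bullet can be sketched against a current pandas (a hedged example, not the 0.11-era code under discussion): two columns with the same label and the same dtype land in one float64 block, and `to_csv` is expected to write both.

```python
import io

import pandas as pd

# Two columns named "a", both float64, so they sit in the same block.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["a", "a"])

buf = io.StringIO()
df.to_csv(buf, index=False)  # the same-block dupe case this issue wants allowed
print(buf.getvalue())
```

The header row comes out as `a,a`, i.e. both duplicate labels survive the round trip to CSV.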

jreback commented Mar 19, 2013

Why don't we push this to 0.12? (if it's really an issue, users can pass legacy=True)


ghost commented Mar 19, 2013

It's a regression: 0.10.1 handles this, and the default to_csv in 0.11 doesn't.
legacy=True is only there as an escape hatch; the intent is that no one uses it.


jreback commented Mar 19, 2013

ok... you can take out the fail-early check and just put a try/except on the chunk writer; it will still catch the dup across blocks, but won't fail on a dup within a single block (the cross-block case fails because the colnamemap will have None as an indexer rather than a value)


jreback commented Mar 19, 2013

Is there a single-dtype dup-columns test? I can put the above fix in...


ghost commented Mar 19, 2013

Can you give me a recipe for generating blocks so ordered traversal doesn't match column order?


ghost commented Mar 19, 2013

In [11]: df=mkdf(10,5)
    ...: df['j'] = pd.Series(range(len(df)))
    ...: df['k']= pd.Series(map(float,range(len(df))))
    ...: df['l']= pd.Series(map(str,range(len(df))))
    ...: df._consolidate_inplace()
    ...: #df.columns = ['a'] * len(df.columns)
    ...: bs=df._data.blocks
    ...: bs[0]=df._data.blocks[0]

In [12]: bs
Out[12]: 
[FloatBlock: [j, k], 2 x 10, dtype float64,
 ObjectBlock: [C_l0_g0, C_l0_g1, C_l0_g2, C_l0_g3, C_l0_g4, l], 6 x 10, dtype object]

In [9]: df
Out[9]: 
C0      C_l0_g0 C_l0_g1 C_l0_g2 C_l0_g3 C_l0_g4   j   k    l
R0                                                          
R_l0_g0    R0C0    R0C1    R0C2    R0C3    R0C4 NaN NaN  NaN
R_l0_g1    R1C0    R1C1    R1C2    R1C3    R1C4 NaN NaN  NaN
R_l0_g2    R2C0    R2C1    R2C2    R2C3    R2C4 NaN NaN  NaN
R_l0_g3    R3C0    R3C1    R3C2    R3C3    R3C4 NaN NaN  NaN
R_l0_g4    R4C0    R4C1    R4C2    R4C3    R4C4 NaN NaN  NaN
R_l0_g5    R5C0    R5C1    R5C2    R5C3    R5C4 NaN NaN  NaN
R_l0_g6    R6C0    R6C1    R6C2    R6C3    R6C4 NaN NaN  NaN
R_l0_g7    R7C0    R7C1    R7C2    R7C3    R7C4 NaN NaN  NaN
R_l0_g8    R8C0    R8C1    R8C2    R8C3    R8C4 NaN NaN  NaN
R_l0_g9    R9C0    R9C1    R9C2    R9C3    R9C4 NaN NaN  NaN


ghost commented Mar 19, 2013

I don't think there is. I've got it going: if there are dupe columns it falls back to using icol,
and if the dupes are split across blocks, icol dies with an exception as described.
Almost there.


ghost commented Mar 19, 2013

The check for dupes split across blocks: len(union of set(keys of each block)) == sum(len(set(keys of b)) for b in blocks)
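That cardinality check can be written out as a small standalone sketch (the function name and the modeling of blocks as plain lists of column keys are illustrative, not pandas internals): if every key lives in exactly one block, the union's size equals the sum of the per-block de-duplicated sizes; any key shared between blocks makes the union strictly smaller.

```python
def dupes_split_across_blocks(blocks):
    """Return True if any column key appears in more than one block.

    `blocks` is modeled here as a list of lists of column keys; keys are
    de-duplicated within each block first, so a dup *inside* one block
    does not trip the check -- only cross-block duplication does.
    """
    union = set().union(*(set(keys) for keys in blocks))
    per_block_total = sum(len(set(keys)) for keys in blocks)
    # Equal sizes mean every key lives in exactly one block.
    return len(union) != per_block_total


print(dupes_split_across_blocks([["j", "k"], ["l"]]))  # False
print(dupes_split_across_blocks([["a", "a"], ["b"]]))  # False (dup inside one block)
print(dupes_split_across_blocks([["a", "b"], ["a"]]))  # True  (dup across blocks)
```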


jreback commented Mar 19, 2013

Ok, great.
I was trying to fix a more general duplication issue,
so I'll drop what I was doing.

@ghost ghost closed this as completed in 1f138a4 Mar 19, 2013

ghost commented Mar 19, 2013

Ok, fixed in master.
When dupes are present, CSVWriter falls back to icol() rather than walking the blocks.
When/if icol() learns to deal with dupes split across blocks, df.to_csv() should work
for that case too.
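The harder case described above — duplicate labels whose values live in *different* dtype blocks — can be exercised in a current pandas (a sketch; `icol()` was later replaced by `iloc`, and cross-block dupes do serialize in modern versions):

```python
import io

import pandas as pd

# Duplicate "x" columns with different dtypes: one int64 block, one float64 block.
ints = pd.DataFrame({"x": [1, 2]})
floats = pd.DataFrame({"x": [1.5, 2.5]})
df = pd.concat([ints, floats], axis=1)  # columns: ["x", "x"]

buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

Each row keeps the columns in label order (`1,1.5` then `2,2.5`), so the dtype-block layout no longer leaks into the CSV output.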


jreback commented Mar 19, 2013

Great, that looks good; I'll leave the other issue open about creating a duplicate indexer that can handle this more general case (but don't hold your breath).


jreback commented Apr 30, 2013

Ok... reopening this so I remember to do it (or @y-p, if you want)...

@jreback jreback reopened this Apr 30, 2013

jreback commented Apr 30, 2013

Actually... let me create another one in 0.11.1.
