BUG: allow insertion/deletion of columns in non-unique column DataFrames #3683

jreback · 2013-05-22T14:44:07Z

Here are examples of various operations:

In [3]: df = DataFrame([[1,1,1,5],[1,1,2,5],[2,1,3,5]],
             columns=['foo','bar','foo','hello'])

In [27]: df.columns.is_unique
Out[27]: False

In [4]: # insert

In [5]: df['string'] = 'bah'

In [6]: df
Out[6]: 
   foo  bar  foo  hello string
0    1    1    1      5    bah
1    1    1    2      5    bah
2    2    1    3      5    bah

In [7]: # insert same dtype

In [8]: df['foo2'] = 3

In [9]: df
Out[9]: 
   foo  bar  foo  hello string  foo2
0    1    1    1      5    bah     3
1    1    1    2      5    bah     3
2    2    1    3      5    bah     3

In [10]: # delete (non dup)

In [11]: del df['bar']

In [12]: df
Out[12]: 
   foo  foo  hello string  foo2
0    1    1      5    bah     3
1    1    2      5    bah     3
2    2    3      5    bah     3

In [13]: # try to delete again (it's not consolidated)

In [14]: del df['hello']

In [15]: df
Out[15]: 
   foo  foo string  foo2
0    1    1    bah     3
1    1    2    bah     3
2    2    3    bah     3

In [16]: # insert

In [17]: df.insert(2,'new_col',5.)

In [18]: df
Out[18]: 
   foo  foo  new_col string  foo2
0    1    1        5    bah     3
1    1    2        5    bah     3
2    2    3        5    bah     3
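For anyone replaying the first part of the session on a current pandas release (not the 2013 branch this PR targets), here is a self-contained sketch. It repeats the assignments, deletions and the positional insert, then prints the resulting column labels; it does not reproduce the internal consolidation state.

```python
import pandas as pd

# Same frame as above: 'foo' appears twice, so the column index is non-unique.
df = pd.DataFrame([[1, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
                  columns=['foo', 'bar', 'foo', 'hello'])
unique_before = df.columns.is_unique

# Assigning to a label that does not exist yet appends a new column.
df['string'] = 'bah'
df['foo2'] = 3

# Deleting a label that occurs only once removes just that column.
del df['bar']
del df['hello']

# Insert a new, unique label at position 2.
df.insert(2, 'new_col', 5.0)
print(list(df.columns))
```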

This is now the default behavior:

In [19]: # insert a dup

In [20]: df.insert(2,'new_col',4.)
Exception: cannot insert new_col, already exists

In [21]: # insert a dup

In [22]: df.insert(2,'new_col',4.,allow_duplicates=True)

In [23]: df
Out[23]: 
   foo  foo  new_col  new_col string  foo2
0    1    1        4        5    bah     3
1    1    2        4        5    bah     3
2    2    3        4        5    bah     3

In [24]: # delete (dup)

In [25]: del df['foo']

In [26]: df
Out[26]: 
   new_col  new_col string  foo2
0        4        5    bah     3
1        4        5    bah     3
2        4        5    bah     3
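The duplicate-handling steps can be checked the same way. This is a sketch assuming a recent pandas, where, as far as I know, the refused insert surfaces as a ValueError rather than the bare Exception shown in the transcript, so that is what the example catches. The frame is rebuilt by hand to match the state at In [18].

```python
import pandas as pd

# State at In [18]: 'foo' is duplicated, 'new_col' exists once (value 5.0).
df = pd.DataFrame([[1, 1, 5.0, 'bah', 3],
                   [1, 2, 5.0, 'bah', 3],
                   [2, 3, 5.0, 'bah', 3]],
                  columns=['foo', 'foo', 'new_col', 'string', 'foo2'])

# Re-inserting an existing label is refused by default...
refused = False
try:
    df.insert(2, 'new_col', 4.0)
except ValueError as err:
    refused = True
    print('refused:', err)

# ...unless duplicates are explicitly allowed.
df.insert(2, 'new_col', 4.0, allow_duplicates=True)

# Deleting a duplicated label removes every column carrying it.
del df['foo']
print(list(df.columns))
```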

Don't try this at home

Duplicates across dtypes, and assigning to those duplicates:

In [5]: df = DataFrame([[1,1,1.,5],[1,1,2.,5],[2,1,3.,5]],columns=['foo','bar','foo','hello'])

In [6]: df.dtypes
Out[6]: 
foo        int64
bar        int64
foo      float64
hello      int64
dtype: object

In [7]: df['foo'] = 'string'

In [8]: df
Out[8]: 
      foo  bar     foo  hello
0  string    1  string      5
1  string    1  string      5
2  string    1  string      5
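A standalone version of the cross-dtype assignment, again assuming a current pandas release. It records the per-column dtypes first and then shows that a single scalar assignment to the duplicated label reaches both foo columns, the int64 one and the float64 one.

```python
import pandas as pd

# 'foo' occurs twice: once holding ints, once holding floats.
df = pd.DataFrame([[1, 1, 1., 5], [1, 1, 2., 5], [2, 1, 3., 5]],
                  columns=['foo', 'bar', 'foo', 'hello'])
dtypes_before = [str(t) for t in df.dtypes]

# A scalar assigned to the duplicated label overwrites both 'foo' columns;
# 'bar' and 'hello' are left untouched.
df['foo'] = 'string'
print(df)
```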

hayd · 2013-05-23T11:50:40Z

"A bit non trivial" lol

Is this still WIP? I think the example from above is still present (if my HEAD is on this correctly)...

jreback · 2013-05-23T13:01:46Z

this works on this branch (not on dev)

hayd · 2013-05-23T14:58:49Z

Weird, #3687 is still not working for me on the branch, tip eb3720f (maybe I'm doing it wrong!).

jreback · 2013-05-23T15:08:04Z

Something funny is going on. It works in this case (below) but not in the other; will fix, thanks.

(Pdb) idx = date_range('20130101',periods=4,freq='Q-NOV')
(Pdb) df = DataFrame([[1,1,1,5],[1,1,2,5],[2,1,3,5]],columns=['a','a','a','a'])
(Pdb) df.columns = idx
(Pdb) df
   2013-02-28  2013-05-31  2013-08-31  2013-11-30
0           1           1           1           5
1           1           1           2           5
2           2           1           3           5
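The debugger session can also be reproduced as a plain script. This sketch assumes a current pandas and builds the frame from one int64 NumPy array, i.e. the single-dtype case mentioned in the next comment. The four quarter-end dates are written out explicitly instead of calling date_range, since the spelling of quarterly frequency aliases (Q-NOV vs. QE-NOV) has changed across pandas releases.

```python
import numpy as np
import pandas as pd

# Single-dtype data: one int64 2-D array, all four column labels identical.
values = np.array([[1, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]], dtype='int64')
df = pd.DataFrame(values, columns=['a', 'a', 'a', 'a'])
dup = df['a']  # a duplicated label selects every matching column

# The dates date_range('20130101', periods=4, freq='Q-NOV') produced above.
idx = pd.to_datetime(['2013-02-28', '2013-05-31', '2013-08-31', '2013-11-30'])
df.columns = idx
print(df)
```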

jreback · 2013-05-23T15:47:59Z

It was an oversight in the constructor: _ref_locs were being constructed for mixed-type data (as in the example I used), but not for a single-dtype array. Will be updated soon.
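To make the _ref_locs point concrete, here is a toy model in plain Python. It is not pandas' block manager, and Block, Frame and their methods are hypothetical names. The idea it illustrates, as I read the comment above, is that once labels repeat, each block has to remember the frame positions of its own columns, and insert/delete have to work on those positions rather than on labels.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block: same-dtype columns plus their frame positions."""
    dtype: str
    values: list    # one list of cell values per column in this block
    ref_locs: list  # frame position of each column in `values`


@dataclass
class Frame:
    """Hypothetical frame whose column labels may repeat."""
    labels: list
    blocks: list = field(default_factory=list)

    def positions(self, label):
        # With duplicate labels, a lookup yields every matching position.
        return [i for i, lab in enumerate(self.labels) if lab == label]

    def column_at(self, pos):
        # Positions, not labels, identify a column unambiguously.
        for blk in self.blocks:
            if pos in blk.ref_locs:
                return blk.values[blk.ref_locs.index(pos)]
        raise IndexError(pos)

    def delete(self, label):
        doomed = set(self.positions(label))
        if not doomed:
            raise KeyError(label)
        for blk in self.blocks:
            keep = [k for k, p in enumerate(blk.ref_locs) if p not in doomed]
            blk.values = [blk.values[k] for k in keep]
            # Shift each survivor left by the number of deleted positions before it.
            blk.ref_locs = [blk.ref_locs[k] - sum(d < blk.ref_locs[k] for d in doomed)
                            for k in keep]
        self.blocks = [blk for blk in self.blocks if blk.values]
        self.labels = [lab for i, lab in enumerate(self.labels) if i not in doomed]

    def insert(self, pos, label, dtype, values, allow_duplicates=False):
        if not allow_duplicates and label in self.labels:
            raise ValueError(f"cannot insert {label}, already exists")
        for blk in self.blocks:
            # Columns at or after `pos` move one slot to the right.
            blk.ref_locs = [p + 1 if p >= pos else p for p in blk.ref_locs]
        # The new column gets its own block (left unconsolidated).
        self.blocks.append(Block(dtype, [values], [pos]))
        self.labels.insert(pos, label)


# Mixed-dtype frame from the "across dtypes" example: foo, bar, foo, hello.
frame = Frame(
    labels=['foo', 'bar', 'foo', 'hello'],
    blocks=[
        Block('int64', [[1, 1, 2], [1, 1, 1], [5, 5, 5]], [0, 1, 3]),
        Block('float64', [[1.0, 2.0, 3.0]], [2]),
    ],
)
frame.delete('foo')  # removes positions 0 and 2, across both blocks
frame.insert(1, 'bar', 'int64', [0, 0, 0], allow_duplicates=True)
print(frame.labels)  # ['bar', 'bar', 'hello']
```

Keeping positions per block is what lets a delete of a duplicated label touch columns in several blocks at once without ever asking "which column is foo?".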

jreback · 2013-05-25T23:29:59Z

@wesm this is a bit more complicated than I'd like, but necessary. Please give it a once-over.

CLN: finished duplicate item insert/delete - whoosh! (GH3679)
BUG: insert almost working
BUG: fixed insertion of dup columns!
ENH: added allow_duplicates kw to DataFrame.insert to indicate that inserting of a duplicate column should be allowed (default is False)


BUG: allow insertion/deletion of columns in non-unique column DataFrames

jreback mentioned this pull request May 23, 2013

Dataframe non unique column renaming #3687

Closed

hayd mentioned this pull request May 23, 2013

Groupby aggregations could ignore non-numeric columns when axis=1 #3688

Closed

jreback added 3 commits May 29, 2013 20:48

BUG: non-unique column index was failing in construction if from an n…

8bcf581

…darray ENH: non-unique assignment now works TST: more tests BUG: handle multi-level columns correctly (original method) CLN/BUG: raise exception if block ref_locs are not set when _set_ref_locs

BUG: not setting placement on reindex_with_indexers

1c0b105

ENH: extend index.reindex to handle non_unique indicies (rather than raising) TST: more tests/optimizations for dup_columns

jreback added a commit that referenced this pull request May 30, 2013

Merge pull request #3683 from jreback/dup_insert

586a878

BUG: allow insertion/deletion of columns in non-unique column DataFrames

jreback merged commit 586a878 into pandas-dev:master May 30, 2013
