BUG: allow insertion/deletion of columns in non-unique column DataFrames #3683

Merged
merged 3 commits into from
May 30, 2013

Conversation

jreback
Contributor

@jreback jreback commented May 22, 2013

closes #3679, #3687

Here's an example of various operations

In [3]: df = DataFrame([[1,1,1,5],[1,1,2,5],[2,1,3,5]],
             columns=['foo','bar','foo','hello'])

In [27]: df.columns.is_unique
Out[27]: False

In [4]: # insert

In [5]: df['string'] = 'bah'

In [6]: df
Out[6]: 
   foo  bar  foo  hello string
0    1    1    1      5    bah
1    1    1    2      5    bah
2    2    1    3      5    bah

In [7]: # insert same dtype

In [8]: df['foo2'] = 3

In [9]: df
Out[9]: 
   foo  bar  foo  hello string  foo2
0    1    1    1      5    bah     3
1    1    1    2      5    bah     3
2    2    1    3      5    bah     3

In [10]: # delete (non dup)

In [11]: del df['bar']

In [12]: df
Out[12]: 
   foo  foo  hello string  foo2
0    1    1      5    bah     3
1    1    2      5    bah     3
2    2    3      5    bah     3

In [13]: # try to delete again (it's not consolidated)

In [14]: del df['hello']

In [15]: df
Out[15]: 
   foo  foo string  foo2
0    1    1    bah     3
1    1    2    bah     3
2    2    3    bah     3

In [16]: # insert

In [17]: df.insert(2,'new_col',5.)

In [18]: df
Out[18]: 
   foo  foo  new_col string  foo2
0    1    1        5    bah     3
1    1    2        5    bah     3
2    2    3        5    bah     3
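The insert and delete steps above can be replayed as a plain script. This is a minimal sketch, assuming a current pandas release imported as `pd` (the session above relies on a star import); it mirrors the walkthrough and is not taken from the PR's test suite.

```python
import pandas as pd

# Same frame as the walkthrough: the label 'foo' appears twice.
df = pd.DataFrame(
    [[1, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)
assert not df.columns.is_unique

df["string"] = "bah"          # new label: appended at the end
del df["bar"]                 # delete a non-duplicated column
del df["hello"]               # a second delete in a row
df.insert(2, "new_col", 5.0)  # positional insert of a new label

print(list(df.columns))
```

Only labels that are not themselves duplicated are inserted or deleted here; the duplicated `foo` pair is left untouched throughout.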

This is now the default behavior (inserting an existing column name raises):

In [19]: # insert a dup

In [20]: df.insert(2,'new_col',4.)
Exception: cannot insert new_col, already exists

In [21]: # insert a dup

In [22]: df.insert(2,'new_col',4.,allow_duplicates=True)

In [23]: df
Out[23]: 
   foo  foo  new_col  new_col string  foo2
0    1    1        4        5    bah     3
1    1    2        4        5    bah     3
2    2    3        4        5    bah     3

In [24]: # delete (dup)

In [25]: del df['foo']

In [26]: df
Out[26]: 
   new_col  new_col string  foo2
0        4        5    bah     3
1        4        5    bah     3
2        4        5    bah     3
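The `allow_duplicates` keyword and the duplicate-label delete shown above can be sketched as follows. This assumes a current pandas release; the exact exception class raised by the rejected insert is not asserted (the transcript shows a bare `Exception`), so the sketch catches `Exception` broadly.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, "bah"], [1, 2, "bah"], [2, 3, "bah"]],
    columns=["foo", "foo", "string"],
)
df.insert(2, "new_col", 5.0)

# Default: inserting a label that already exists is rejected.
try:
    df.insert(2, "new_col", 4.0)
    raised = False
except Exception:  # exact class depends on the pandas version (assumption)
    raised = True

# Opting in explicitly allows the duplicate label.
df.insert(2, "new_col", 4.0, allow_duplicates=True)

# Deleting a duplicated label removes every column carrying it.
del df["foo"]

print(raised, list(df.columns))
```

The new column lands at position 2, ahead of the earlier `new_col`, which matches the 4-then-5 ordering in the output above.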

Don't try this at home

  1. duplicates across dtypes
  2. assigning those duplicates

In [5]: df = DataFrame([[1,1,1.,5],[1,1,2.,5],[2,1,3.,5]],columns=['foo','bar','foo','hello'])

In [6]: df.dtypes
Out[6]: 
foo        int64
bar        int64
foo      float64
hello      int64
dtype: object

In [7]: df['foo'] = 'string'

In [8]: df
Out[8]: 
      foo  bar     foo  hello
0  string    1  string      5
1  string    1  string      5
2  string    1  string      5
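One way to avoid the situation above is to give the duplicated labels distinct names before assigning. The sketch below is an illustration only, assuming a current pandas release; the names `foo_int` and `foo_float` are hypothetical and not part of the PR.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1.0, 5], [1, 1, 2.0, 5], [2, 1, 3.0, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Selecting a duplicated label returns a DataFrame, one column per match.
both = df["foo"]
print(type(both).__name__, both.shape)

# Hypothetical unique names, so an assignment targets exactly one column.
df.columns = ["foo_int", "bar", "foo_float", "hello"]
df["foo_float"] = "string"

print(df["foo_int"].tolist(), df["foo_float"].tolist())
```

With unique labels the assignment overwrites only the former float column and leaves the integer `foo` data intact.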

@hayd
Contributor

hayd commented May 23, 2013

"A bit non trivial" lol

Is this still WIP? I think the example from above is still present (if my HEAD is on this correctly)...

@jreback
Contributor Author

jreback commented May 23, 2013

this works on this branch (not on dev)

@hayd
Contributor

hayd commented May 23, 2013

Weird, #3687 is still not working for me on the branch, tip eb3720f (maybe I'm doing it wrong!).

@jreback
Contributor Author

jreback commented May 23, 2013

something funny going on....it works in the case below, but not in the other.....will fix..thanks

(Pdb) idx = date_range('20130101',periods=4,freq='Q-NOV')
(Pdb) df = DataFrame([[1,1,1,5],[1,1,2,5],[2,1,3,5]],columns=['a','a','a','a'])
(Pdb) df.columns = idx
(Pdb) df
   2013-02-28  2013-05-31  2013-08-31  2013-11-30
0           1           1           1           5
1           1           1           2           5
2           2           1           3           5
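The working case above, replacing all-duplicate column labels with unique datetime labels, can be reproduced outside the debugger. This is a sketch assuming a current pandas release; it spells out the four quarter-end dates with `pd.to_datetime` rather than the `date_range(..., freq='Q-NOV')` call, since frequency aliases have changed between pandas versions.

```python
import pandas as pd

# Four columns that all share the label 'a'.
df = pd.DataFrame(
    [[1, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["a", "a", "a", "a"],
)
assert not df.columns.is_unique

# The same labels that appear in the pdb output above, listed explicitly.
idx = pd.to_datetime(["2013-02-28", "2013-05-31", "2013-08-31", "2013-11-30"])
df.columns = idx

print(df.columns.is_unique)
print(df[pd.Timestamp("2013-08-31")].tolist())
```

Once the labels are unique, a single-label lookup returns a Series rather than a multi-column frame.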

@jreback
Contributor Author

jreback commented May 23, 2013

was an oversight in the constructor: it was constructing the _ref_locs for mixed dtypes, as in the example I used, but not for a single-dtype array....will be updated soon

@jreback
Contributor Author

jreback commented May 25, 2013

@wesm a bit more complicated than I'd like
but necessary - pls give a once over

jreback added 3 commits May 29, 2013 20:48
CLN: finished duplicate item insert/delete - whoosh! (GH3679)

BUG: insert almost working

BUG: fixed insertion of dup columns!

ENH: added allow_duplicates kw to DataFrame.insert to indicate that inserting of
     a duplicate column should be allowed (default is False)
…darray

ENH: non-unique assignment now works

TST: more tests

BUG: handle multi-level columns correctly (original method)

CLN/BUG: raise exception if block ref_locs are not set when _set_ref_locs
ENH: extend index.reindex to handle non_unique indices (rather than raising)

TST: more tests/optimizations for dup_columns
jreback added a commit that referenced this pull request May 30, 2013
BUG: allow insertion/deletion of columns in non-unique column DataFrames
@jreback jreback merged commit 586a878 into pandas-dev:master May 30, 2013
Successfully merging this pull request may close these issues.

BUG: cannot insert/delete to non-unique columns
2 participants