BUG: Various inconsistencies in DataFrame __getitem__ and __setitem__ behavior #2765

Closed
stephenwlin opened this issue Jan 28, 2013 · 0 comments

setitem fails with row slices that getitem supports
In [27]: df=p.DataFrame(
   ....:              {"alpha": ["a", "b", "c", "d", "e"],
   ....:              "beta":[1,2,3,4,5]},
   ....:              columns=["alpha","beta"],
   ....:              index=[1.0, 2.5, 3, 4.5, 5.0])

In [28]: df[0:1] # does work, slices on rows
Out[28]: 
  alpha  beta
1     a     1

In [29]: df[0:1] = 0 # fails
(exception)
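
For comparison, a minimal sketch of the same positional row slice written with the explicit `.iloc` indexer, which accepts the slice on both the read and the write side. This assumes a current pandas release rather than the version the report was filed against, and the all-numeric frame with columns `x` and `y` is a hypothetical stand-in for the frame above.

```python
import pandas as pd

# Hypothetical all-numeric frame with a float index, similar in shape to the report's.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=[1.0, 2.5, 3.0])

# Read side: positional slice returns the first row.
first_row = df.iloc[0:1]

# Write side: the same positional slice assigns to every column of that row.
df.iloc[0:1] = 0
```

Using `.iloc` makes the positional intent explicit, so the slice does not depend on how plain `[]` interprets integers against a float index.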
setitem fails on boolean indexing unless the key is an ndarray, while getitem casts the key appropriately
In [49]: df=p.DataFrame(
   ....:              {"alpha": ["a", "b", "c", "d", "e"],
   ....:              "beta":[1,2,3,4,5]},
   ....:              columns=["alpha","beta"],
   ....:              index=[1.0, 2.5, 3, 4.5, 5.0])

In [50]: df[np.array([True, False, False, False, True])]
Out[50]: 
  alpha  beta
1     a     1
5     e     5

In [51]: df[[True, False, False, False, True]]
Out[51]: 
  alpha  beta
1     a     1
5     e     5

In [52]: df[np.array([True, False, False, False, True])] = 4 # does work

In [53]: df[[True, False, False, False, True]] = 3 # does not work
(exception)
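
A possible workaround, sketched under the assumption of a current pandas and NumPy install: convert the plain list to a boolean ndarray before assigning, and go through `.loc` so that row selection is explicit. The frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with the same index as in the report.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]},
    index=[1.0, 2.5, 3.0, 4.5, 5.0],
)

mask = [True, False, False, False, True]

# Cast the list explicitly so the key is a boolean ndarray, not a list of labels.
mask_array = np.asarray(mask, dtype=bool)

# Assign to the selected rows across all columns.
df.loc[mask_array, :] = 0
```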
getitem does not reindex a boolean Series key, but setitem does
In [54]: df=p.DataFrame(
   ....:              {"alpha": ["a", "b", "c", "d", "e"],
   ....:              "beta":[1,2,3,4,5]},
   ....:              columns=["alpha","beta"],
   ....:              index=[1.0, 2.5, 3, 4.5, 5.0])

In [55]: s=p.Series([True, True, False, False, False],
   ....:              index=[5.0, 4.5, 3, 2.5, 1.0]) # row-reversed index

In [56]: df[s] # gets first two rows (no reindex)
Out[56]: 
    alpha  beta
1.0     a     1
2.5     b     2

In [57]: df[s] = 1 # assigns last two rows (reindexed)

In [58]: df
Out[58]: 
    alpha  beta
1.0     a     1
2.5     b     2
3.0     c     3
4.5     1     1
5.0     1     1
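
The two interpretations of the key can be made explicit, which may help when deciding which behavior is intended. The sketch below assumes a current pandas release; the names `positional`, `aligned`, and `by_label` are illustrative only. Passing the raw values of the Series applies the mask by position, while reindexing the Series to the frame's index first applies it by label.

```python
import pandas as pd

df = pd.DataFrame({"beta": [1, 2, 3, 4, 5]}, index=[1.0, 2.5, 3.0, 4.5, 5.0])

# Boolean Series whose index is the frame's index in reverse order.
s = pd.Series([True, True, False, False, False], index=[5.0, 4.5, 3.0, 2.5, 1.0])

# Positional interpretation: ignore the labels of s, use its values in order.
positional = df.loc[s.to_numpy()]

# Label interpretation: align s to the frame's index, then apply the mask.
aligned = s.reindex(df.index)
by_label = df.loc[aligned.to_numpy()]
```

Here `positional` holds the rows labelled 1.0 and 2.5, matching the getitem result above, and `by_label` holds the rows labelled 4.5 and 5.0, matching the rows that setitem modified.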
getitem does not allow DataFrame key when columns are MultiIndex, but setitem does
In [60]: c2=p.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
   ....:                        ['one', 'two', 'three']],
   ....:                labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
   ....:                        [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
   ....:                names=['first', 'second'])

In [61]: df2=p.DataFrame(np.random.randn(3, 10),
   ....:              index=['A','B','C'], columns=c2)

In [62]: df2[df2 > 0] # fails
(exception)

In [63]: df2[df2 > 0] = 4 # ok
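
As a point of comparison, `DataFrame.where` offers a read-side equivalent of boolean-frame selection that works with MultiIndex columns. This is a sketch assuming a current pandas release; the small frame below uses fixed values and `MultiIndex.from_tuples` instead of the random data and constructor call in the report.

```python
import pandas as pd

# Hypothetical two-level column index with fixed data for reproducibility.
columns = pd.MultiIndex.from_tuples(
    [("foo", "one"), ("foo", "two"), ("bar", "one")],
    names=["first", "second"],
)
df2 = pd.DataFrame(
    [[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]],
    index=["A", "B"],
    columns=columns,
)

# Keep positive entries; entries where the condition is False become NaN.
selected = df2.where(df2 > 0)
```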
setitem and getitem reindex DataFrame keys when shapes differ, but do not reindex when shapes match and the indexes/columns differ
In [68]: df3=p.DataFrame(np.random.randn(5, 2),
   ....:              index=[1.0, 2.5, 3, 4.5, 5.0])

In [69]: key = (df3 > 0).reindex(df3.index[:-1]) # first four rows

In [70]: df3[key] = 5 # key is reindexed because shape does not match (ok)

In [71]: df3
Out[71]: 
            0         1
1.0 -1.599142  5.000000
2.5  5.000000 -0.180330
3.0 -0.568206  5.000000
4.5 -0.340465 -0.070105
5.0 -0.474955  1.928842

In [72]: key = key.reindex(key.index[::-1]) # reverse rows in key

In [73]: df3[key] = 4 # key is reindexed because shape does not match (ok)

In [74]: df3
Out[74]: 
            0         1
1.0 -1.599142  4.000000
2.5  4.000000 -0.180330
3.0 -0.568206  4.000000
4.5 -0.340465 -0.070105
5.0 -0.474955  1.928842

In [75]: df3=p.DataFrame(np.random.randn(5, 2),
   ....:              index=[1.0, 2.5, 3, 4.5, 5.0])

In [76]: key = (df3 > 0) # all rows

In [77]: df3[key] = 5 # no reindex required

In [78]: df3
Out[78]: 
            0         1
1.0  5.000000 -0.233779
2.5  5.000000 -0.141962
3.0 -0.232551  5.000000
4.5 -1.663034  5.000000
5.0 -0.653200 -0.365681

In [79]: key = key.reindex(key.index[::-1]) # reverse rows in key

In [80]: df3[key] = 4 # key is not reindexed, even though rows are reversed

In [81]: df3
Out[81]: 
            0         1
1.0  5.000000 -0.233779
2.5  5.000000  4.000000
3.0 -0.232551  4.000000
4.5  4.000000  5.000000
5.0  4.000000 -0.365681
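
One way to avoid depending on whether the key is reindexed is to align it to the target frame explicitly before using it. The sketch below assumes a current pandas release and uses fixed values in place of `np.random.randn`; `reversed_key`, `aligned`, and `result` are illustrative names.

```python
import pandas as pd

df3 = pd.DataFrame(
    [[1.0, -1.0], [2.0, -2.0], [-3.0, 3.0]],
    index=[1.0, 2.5, 3.0],
)

key = df3 > 0

# Same shape and same labels as df3, but with the rows in reverse order.
reversed_key = key.reindex(key.index[::-1])

# Align the key to the frame's labels so position and label agree again.
aligned = reversed_key.reindex(index=df3.index, columns=df3.columns)

# Replace the entries selected by the aligned key, returning a new frame.
result = df3.mask(aligned, 4.0)
```

After alignment, `aligned` carries the same values as `key`, so the replacement lands on the positive entries regardless of the row order the key arrived in.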