
inconsistent behaviour with .ix and setting values #2997

Closed
jankatins opened this issue Mar 9, 2013 · 8 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@jankatins
Contributor

Using .ix twice and assigning a value, as in df.ix[...].ix[...] = b, works in the first case but not in the second:

>>> # from http://stackoverflow.com/questions/15200598/partial-update-to-dataframe-with-multi-index-index-with-integer-labels/15213525#15213525
>>> print df
  bucket  start  stop                 date  x1  x2  x3
0     B1      1     1  2000-10-03 00:00:00   2   2   3
1     B1      1     1  2000-01-04 00:00:00   4   3   3
2     B1      1     2  2000-01-03 00:00:00   4   2   3
3     B1      1     2  2000-01-04 00:00:00   6   2   2

>>> df2 = df.set_index(['bucket','start','stop'])

>>> print df2
                                  date  x1  x2  x3
bucket start stop                                 
B1     1     1     2000-10-03 00:00:00   2   2   3
             1     2000-01-04 00:00:00   4   3   3
             2     2000-01-03 00:00:00   4   2   3
             2     2000-01-04 00:00:00   6   2   2

>>> df2.ix[('B1',1,2)].ix[:,'x1'] = 5 #works

>>> print df2
                                  date  x1  x2  x3
bucket start stop                                 
B1     1     1     2000-10-03 00:00:00   2   2   3
             1     2000-01-04 00:00:00   4   3   3
             2     2000-01-03 00:00:00   5   2   3
             2     2000-01-04 00:00:00   5   2   2

>>> df3 = pd.DataFrame(data={"z":[1,2,3,4],"a":["a","b","c","d"],"b":["X","Y","X","Y"], "c":[1,2,3,4],"d":[5,6,7,8]})

>>> df4 = df3.set_index(["z","a"])

>>> print df4
     b  c  d
z a         
1 a  X  1  5
2 b  Y  2  6
3 c  X  3  7
4 d  Y  4  8

>>> df4.ix[1:2,"b":].ix[:,"d"] = 7 # does not work

>>> print df4
     b  c  d
z a         
1 a  X  1  5
2 b  Y  2  6
3 c  X  3  7
4 d  Y  4  8

@jreback
Contributor

jreback commented Mar 9, 2013

See "returning a view versus a copy" in the docs:
http://pandas.pydata.org/pandas-docs/dev/indexing.html#advanced-indexing-with-ix
(the same applies to iloc/loc).

The first selection in your chain returns a copy, so the value is set on that copy.

In general, if you select a non-contiguous range you might get a copy; it depends on exactly how numpy stores the data.
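The view/copy split comes from NumPy itself. A minimal NumPy-only sketch (it does not touch pandas or .ix) of the behaviour referred to above: basic slicing returns a view that shares the parent's buffer, while fancy indexing with a list of positions returns a new array.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

# Basic slicing (start:stop) returns a view that shares arr's buffer.
view = arr[1:3, :]
# Fancy indexing with an integer list returns a new array (a copy).
copy = arr[[0, 2], :]

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

view[:, 0] = -1   # writes through to arr
copy[:, 0] = -99  # changes only the copy

print(arr[:, 0])  # [ 0 -1 -1  9]
```

Which of the two a higher-level selection ends up using depends on the selection and on how the data is laid out, which is the uncertainty discussed in this thread.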

@jankatins
Contributor Author

Yikes, this seems like too much black magic for my taste:

df4.ix[1:2].ix[:,"d"] = 7  # works
df4.ix[1:2,"b":] = 8  # works -> is .ix setting directly on df4?
df4.ix[1:2,"b":].ix[:,"d"] = 9  # does not work...

Wouldn't it be more consistent to always return a copy in every .ix/.iloc/.loc case?

df5 = df4.ix[1:2]
df5["d"] = 10
df4  # boom: the change shows up in df4 as well
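If you want a subset you can modify without touching the original, an explicit .copy() removes the guesswork. A minimal sketch in current pandas, assuming .loc as the label-based replacement for .ix (which has since been removed); it rebuilds df4 from the issue:

```python
import pandas as pd

df3 = pd.DataFrame({"z": [1, 2, 3, 4], "a": list("abcd"), "b": list("XYXY"),
                    "c": [1, 2, 3, 4], "d": [5, 6, 7, 8]})
df4 = df3.set_index(["z", "a"])

# .copy() guarantees that df5 owns its data, so writing to it cannot reach
# df4, whether or not the selection itself happened to return a view.
df5 = df4.loc[1:2].copy()
df5["d"] = 10

print(df4["d"].tolist())  # [5, 6, 7, 8]
print(df5["d"].tolist())  # [10, 10]
```

The cost is one full copy of the selected rows, which is exactly the trade-off raised later in the thread.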

@jreback
Contributor

jreback commented Mar 9, 2013

Chaining these calls is only necessary in the case of #2995.
.ix tries to do the right thing, but whether numpy returns a copy or not is pretty non-trivial.

And you do need .ix to be able to set values in the multi-axis case (df5['d'], for example, is not that flexible).

Is there a case you really need that cannot be done some other way?

@lodagro
Contributor

lodagro commented Mar 9, 2013

View or copy, it may seem like a box of chocolates ...

I had in mind that when using labels and/or slices, a view is returned. Both df4.ix[1:2,"b":] and df4.ix[:,"d"] return a view (I beg to differ with @jreback here); however, chaining the two calls seems to return a copy... huh?

In [49]: df4
Out[49]: 
     b  c  d
z a         
1 a  X  1  5
2 b  Y  2  6
3 c  X  3  7
4 d  Y  4  8

In [50]: df4.ix[1:2,"b":] = 0

In [51]: df4
Out[51]: 
     b  c  d
z a         
1 a  0  0  0
2 b  0  0  0
3 c  X  3  7
4 d  Y  4  8

In [52]: df4.ix[:,"d"] = 1

In [53]: df4
Out[53]: 
     b  c  d
z a         
1 a  0  0  1
2 b  0  0  1
3 c  X  3  1
4 d  Y  4  1

In [54]: df4.ix[1:2,"b":].ix[:,"d"] = 100   # chaining both does not change df4 ???

In [55]: df4
Out[55]: 
     b  c  d
z a         
1 a  0  0  1
2 b  0  0  1
3 c  X  3  1
4 d  Y  4  1

As already mentioned, chaining is not necessary here.

In [67]: df4
Out[67]: 
     b  c  d
z a         
1 a  X  1  5
2 b  Y  2  6
3 c  X  3  7
4 d  Y  4  8

In [68]: df4.ix[1:2, "d"] = 9

In [69]: df4
Out[69]: 
     b  c  d
z a         
1 a  X  1  9
2 b  Y  2  9
3 c  X  3  7
4 d  Y  4  8
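The single-call form carries over to current pandas, where .ix has been removed in favour of .loc (label-based) and .iloc (positional). A minimal sketch that rebuilds df4 from the issue and assumes, as the output above shows, that 1:2 is a label slice on the z level:

```python
import pandas as pd

df3 = pd.DataFrame({"z": [1, 2, 3, 4], "a": list("abcd"), "b": list("XYXY"),
                    "c": [1, 2, 3, 4], "d": [5, 6, 7, 8]})
df4 = df3.set_index(["z", "a"])

# Row slice and column label in one indexing call: pandas performs the
# assignment on df4 itself, so there is no intermediate object that
# might be a copy.
df4.loc[1:2, "d"] = 9

print(df4["d"].tolist())  # [9, 9, 7, 8]
```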

BTW @JanSchulz, have you thought about using xs on DataFrames with a MultiIndex? (See also my reply to your SO question.)
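Following the xs suggestion, a minimal sketch in current pandas that rebuilds df2 from the first example: xs for reading a cross-section, and a single .loc call with the full MultiIndex key for writing, in place of the chained df2.ix[('B1',1,2)].ix[:,'x1'] = 5. The date column is included only to mirror the original frame.

```python
import pandas as pd

df = pd.DataFrame({
    "bucket": ["B1"] * 4,
    "start": [1, 1, 1, 1],
    "stop": [1, 1, 2, 2],
    "date": pd.to_datetime(["2000-10-03", "2000-01-04", "2000-01-03", "2000-01-04"]),
    "x1": [2, 4, 4, 6],
    "x2": [2, 3, 2, 2],
    "x3": [3, 3, 3, 2],
})
df2 = df.set_index(["bucket", "start", "stop"])

# Reading: xs takes the cross-section stop == 2 and drops that level.
# Treat the result as read-only; it is not guaranteed to be a view.
subset = df2.xs(2, level="stop")
print(subset["x1"].tolist())  # [4, 6]

# Writing: one .loc call with the full MultiIndex key and the column label.
df2.loc[("B1", 1, 2), "x1"] = 5
print(df2["x1"].tolist())  # [2, 4, 5, 5]
```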

@jankatins
Contributor Author

@lodagro: I think df4.ix[1:2,"b":] is using .ix directly to set the value on the df4 DataFrame, so there is no "returning a view or copy" here. The problem starts when this statement is used as a selection: a copy is returned, and then ix is used again to set a value on the copy.

@jreback Actually, "you don't know whether a view or a copy is returned" is a problem beyond the double .ix case, as demonstrated by assigning a selection to a new variable: in one case, changing a value in the new DataFrame changes the value in the original DataFrame, and in the other it does not.

Something along the lines of:

  1. Develop your code using ix with the real selection criteria on one axis and a sample of the other axis -> ix returns a copy, and the tests pass.
  2. Remove the sampling -> .ix starts to return a view, and your original data is compromised.

I would argue that .ix and every other selection mechanism should always return a copy.

@lodagro
Contributor

lodagro commented Mar 9, 2013

In reply to @JanSchulz ("I think df4.ix[1:2,"b":] is using .ix directly to set the value on the df4 DataFrame, so no 'returning a view or copy' here."):

Aha, of course.


@stephenwlin
Contributor

Quoting @JanSchulz: "I would argue that .ix and every other selection mechanism should always return a copy."

That would kill performance in cases when you're selecting individual indices and contiguous slices in a read-only manner, though, which are the most common use cases for .ix (right now these are O(1) operations).

The only reasonable way to guarantee consistent semantics everywhere without completely sacrificing performance would be to implement some kind of view-but-copy-on-first-write wrapper over ndarray. That would be somewhat complicated, especially since ndarray objects can be manipulated through the low-level buffer protocol: you would have to find a way to reliably intercept all mutating operations made through that protocol without too many false positives, even though buffer access is used for both read-only and write operations.
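The copy-on-first-write idea can be sketched in a few lines of Python. The CowView class below is hypothetical (not part of pandas or NumPy) and deliberately minimal: it shares memory with the parent until the first write through the wrapper, then copies. It cannot intercept writes made through the buffer protocol, and later writes to the parent still show through until the wrapper's first write, so it illustrates the idea rather than solving the problems raised above.

```python
import numpy as np


class CowView:
    """Hypothetical copy-on-first-write wrapper over a NumPy selection.

    Toy illustration only: it intercepts writes made through this object,
    not writes via the buffer protocol or later writes to the parent.
    """

    def __init__(self, base, key):
        data = base[key]  # a view for basic slices, a copy for fancy indexing
        self._owns_data = not np.shares_memory(data, base)
        if not self._owns_data:
            # Mark this view read-only: reads stay cheap, but nothing can
            # write through it into the parent's buffer by accident.
            data.flags.writeable = False
        self._data = data

    @property
    def owns_data(self):
        return self._owns_data

    def __getitem__(self, key):
        # Slicing a read-only array yields read-only views.
        return self._data[key]

    def __setitem__(self, key, value):
        if not self._owns_data:
            # First write: detach from the parent by copying the selection.
            self._data = self._data.copy()
            self._owns_data = True
        self._data[key] = value


arr = np.arange(6)
sel = CowView(arr, slice(1, 4))  # shares memory with arr, no copy yet
print(sel.owns_data)             # False
sel[0] = 100                     # first write triggers the copy
print(sel[0], arr[1])            # 100 1
```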

@jreback
Contributor

jreback commented Sep 26, 2013

Resolved by docs (and the use of weak refs to update the caches).

@jreback jreback closed this as completed Sep 26, 2013

4 participants