
DOC: Expand View vs copy in the docs #4183

Closed
hayd opened this issue Jul 9, 2013 · 13 comments

@hayd
Contributor

hayd commented Jul 9, 2013

At the moment, what returns a view vs a copy is a common source of confusion. The docs are fairly light on this atm.
We need to expand this section with lots of examples, and update it with what now returns a view.

I've said before to @jreback I'd do this but tbh, am not fully confident on the ins and outs so perhaps we can post a few examples here to get started.

@hayd
Contributor Author

hayd commented Jul 9, 2013

for lists of rows and columns this produces a view, and the same goes for iloc and ix:

df.loc[rows, columns] 

as do these

df[columns]
df.T.loc[columns, rows]

order matters example

use update for boolean indexing

@jreback
Contributor

jreback commented Jul 9, 2013

df.loc[rows,columns] = value is always safe, it does produce a view, but the setitem handles this so it doesn't actually matter

you only really get in trouble with a chained setter e.g.

df.loc[:,['A','B']].iloc[0] = value does NOT work, nor should it

the key point is that the assignment should happen in one call
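
A minimal sketch of the difference between a chained setter and a single-call setter. The frame and the value -1.0 are made up for illustration, and the warning filter is there only because recent pandas versions may warn about the chained form:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=["A", "B"])

# Chained setter: .loc[...] builds an intermediate object first, and
# .iloc[0] = ... then assigns into that intermediate, not into df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment
    df.loc[:, ["A", "B"]].iloc[0] = -1.0
chained_changed_df = bool((df.iloc[0] == -1.0).all())

# Single call: one setitem on df itself.
df.loc[0, ["A", "B"]] = -1.0
single_call_changed_df = bool((df.iloc[0] == -1.0).all())
```

Only the second form hands the whole assignment to pandas in one operation, which is why it does not depend on whether an intermediate result is a view or a copy.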

@hayd
Contributor Author

hayd commented Jul 9, 2013

Do you mean copy?

I wonder if this also warrants a cookbook section (I guess mainly for the update thing).

@jreback
Contributor

jreback commented Jul 9, 2013

no...what I mean is that when you do a setter it sets the correct object, it doesn't matter if it's a view/copy because you never see a returned object anyways.

df.loc[rows,columns] will in general return a copy (except for the special case of C-order and rows = :, I think). It doesn't matter though. The only methods pandas supports for setting are direct methods, anything else is not really supported anyhow (e.g. the chained method of setting will sometimes work but is definitely not supported nor expected to work)
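
One way to sidestep the question entirely, sketched here as an assumption about typical usage rather than something stated in this thread: take an explicit .copy() when an independent subset is wanted, and use a direct setter on the original frame when the original should change. The frame and values below are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=["A", "B"])

# Explicit copy: the subset is independent of df regardless of what
# .loc happens to return internally.
subset = df.loc[[0, 1], ["A"]].copy()
subset.loc[0, "A"] = 100.0

# Direct setter on df: this is the supported way to change df itself.
df.loc[1, "A"] = -2.0
```

After this runs, the edit to subset is not visible in df, and the edit to df is not visible in subset.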

@hayd
Contributor Author

hayd commented Jul 9, 2013

wowza, so boolean indexing does work now provided you use that:

df[mask] = value

@jreback
Contributor

jreback commented Jul 9, 2013

In [3]: df = DataFrame(np.arange(10).reshape(5,2),index=range(5),columns=['A','B'],dtype='float64')

In [4]: df
Out[4]: 
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

In [10]: df.loc[df.A>4,['A']] = -df

In [11]: df
Out[11]: 
   A  B
0  0  1
1  2  3
2  4  5
3 -6  7
4 -8  9

In [12]: df = DataFrame(np.arange(10).reshape(5,2),index=range(5),columns=['A','B'],dtype='float64')

In [13]: df[df.A>4] = -df

In [14]: df
Out[14]: 
   A  B
0  0  1
1  2  3
2  4  5
3 -6 -7
4 -8 -9

@jreback
Contributor

jreback commented Jul 9, 2013

boolean indexing with df[mask] = value works, but it is usually not 'precise' enough; you almost always want to limit the columns, and

df[['columns']][mask] = value is a chained expression so it DOESN'T work

but you CAN (and should) use .loc for this purpose
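
A small sketch of the .loc form with a row mask and a column list, using the same 5x2 frame as the session above. The replacement value 0.0 is arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=["A", "B"])
mask = df["A"] > 4  # boolean Series aligned to df's index

# Rows where the mask is True, column "A" only, set in a single call.
df.loc[mask, ["A"]] = 0.0
```

Column B is left untouched here, which is the precision that a bare df[mask] = value does not give.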

@hayd
Contributor Author

hayd commented Jul 9, 2013

In [10]: df[df == 2] = 99

In [11]: df
Out[11]:
    A  B
0   0  1
1  99  3
2   4  5
3   6  7
4   8  9

@hayd
Contributor Author

hayd commented Jul 9, 2013

I thought you couldn't use loc with a mask? Oh wait, I see, you use a boolean index/column.

Am I missing something:

In [12]: df.loc[[0], df == 2]
Out[12]:
   A  B
0  0  1

@jreback
Contributor

jreback commented Jul 9, 2013

the columns are aligned, so that df == 2 is just ['A','B'] there....equiv to
df.loc[[0]]

iloc with a mask is a problem; in this case the integer labels are just that, labels

@hayd
Contributor Author

hayd commented Jul 9, 2013

So to use with loc they should be column_mask and row_mask, if that makes sense.

I remember now (about iloc and boolean indexing).

Also creating link to #4192

@jreback
Contributor

jreback commented Jul 9, 2013

yep (mask or labels are ok)
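
To make the row_mask / column_mask idea concrete, here is a hedged sketch. The names row_mask and col_mask follow the wording above, and the way they are built is just one possible choice. For iloc, the masks are converted to plain boolean arrays first, since positional indexing has no labels to align a boolean Series against:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=["A", "B"])

# One boolean per row, labelled by df's index.
row_mask = df["A"] > 4
# One boolean per column, labelled by df's columns.
col_mask = (df == 2).any()

# .loc accepts a boolean mask (or labels) on each axis.
picked = df.loc[row_mask, col_mask]

# .iloc is positional, so pass plain boolean arrays rather than Series.
picked_iloc = df.iloc[row_mask.to_numpy(), col_mask.to_numpy()]
```

Both selections pick rows 3 and 4 of column A in this frame.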

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 20, 2014
@jreback
Contributor

jreback commented Mar 3, 2015

this is pretty good now

@jreback jreback closed this as completed Mar 3, 2015