
Strange behavior assigning values to elements #3970

Closed
phobson opened this issue Jun 20, 2013 · 13 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@phobson

phobson commented Jun 20, 2013

Basically, if you create a DataFrame like this:

import pandas as pd

df = pd.DataFrame({ "aa":range(5), "bb":[2.2]*5})

df["cc"] = 0.0 # remove this line and it works

ck = [True]*len(df)

df["bb"].iloc[0] = .13
df_tmp = df.iloc[ck] # or remove this line and it'll work
df["bb"].iloc[0] = .15 # doesn't happen

print(df)

which gives:

   aa    bb  cc
0   0  0.13   0
1   1  2.20   0
2   2  2.20   0
3   3  2.20   0
4   4  2.20   0

Very strange.

Specs:

pd.version.version
Out[5]: '0.11.0'

np.version.version
Out[7]: '1.7.1'
@jreback
Contributor

jreback commented Jun 20, 2013

I responded to your e-mail; repro here.

This is not a bug; you are modifying a copy (which is caused by the dtype change).
Use loc like this:

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy

In [11]: df.loc[0,'bb'] = .13

In [12]: df_tmp = df[ck]

In [13]: df.loc[0,'bb'] = .15

In [14]: df
Out[14]: 
   aa    bb
0   0  0.15
1   1  2.20
2   2  2.20
3   3  2.20
4   4  2.20
5   5  2.20
6   6  2.20
7   7  2.20
8   8  2.20
9   9  2.20

@phobson
Author

phobson commented Jun 20, 2013

I'm confused. Where does the dtype change?

I do agree that this behaves as expected

import pandas as pd
df = pd.DataFrame({ "aa":range(5), "bb":[2.2]*5})
df["cc"] = 0.0 
ck = [True]*len(df)
df.loc[0, 'bb'] = .13
df_tmp = df.iloc[ck] 
df.loc[0, 'bb'] = .15 
print(df)

But how is this not inconsistent behavior?

import pandas as pd
df = pd.DataFrame({ "aa":range(5), "bb":[2.2]*5})
df["cc"] = 0.0 
ck = [True]*len(df)
df["bb"].loc[0] = .13 # works
df_tmp = df.iloc[ck] 
df["bb"].loc[0] = .15 # doesn't work
print(df)

In other words, why can I change an element once, but not a second time?

@jreback
Contributor

jreback commented Jun 20, 2013

Sorry, I was thinking about another issue; dtype is not the problem here.

This:
df["bb"].loc[0] = .13

creates a copy (so you assign to the copy rather than to the frame).
Note that this does not always create a copy (it depends on whether the result is a view or not,
and unfortunately that is determined by numpy).

This is why an assignment that spans multiple axes should specify all of them in a single loc/iloc call.
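A minimal sketch of that advice, written against a recent pandas rather than the 0.11 release discussed in this thread (an assumption): each write names its row and its column in one indexer call, first by label with .loc, then by position with .iloc after looking up the column's position with Index.get_loc.

```python
import pandas as pd

df = pd.DataFrame({"aa": range(5), "bb": [2.2] * 5})
df["cc"] = 0.0

# Label-based: row label and column label in a single .loc call,
# instead of the chained form df["bb"].loc[0] = ...
df.loc[0, "bb"] = 0.13

# Position-based: look up the column's integer position once,
# then pass row and column positions to a single .iloc call.
bb_pos = df.columns.get_loc("bb")
df.iloc[1, bb_pos] = 0.15

print(df)
```

Because the frame itself receives the assignment, there is no intermediate Series whose view-or-copy status matters.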

@jreback
Contributor

jreback commented Jun 20, 2013

df_tmp = df.iloc[ck] triggers a copy of the underlying data (and it is no longer a view); I'm not exactly sure why.

@cpcloud
Member

cpcloud commented Jun 20, 2013

Doesn't indexing with a sequence (or any other non-slice, non-tuple object that is a valid index) trigger a copy? It does with numpy.
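The numpy behavior referred to here can be checked with np.shares_memory (available in numpy 1.11 and later, so newer than the versions in this report). In this sketch a basic slice shares memory with the original array, while boolean-mask indexing, even with an all-True mask like ck, returns a new array:

```python
import numpy as np

a = np.arange(5, dtype=np.float64)

view = a[1:4]                       # basic slicing: a view on a's buffer
mask = np.ones(len(a), dtype=bool)  # all-True mask, like ck in the report
picked = a[mask]                    # boolean (advanced) indexing: a new array

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, picked))  # False

view[0] = 100.0    # writes through to a[1]
picked[0] = -1.0   # changes only the copy; a[0] is untouched
print(a.tolist())  # [0.0, 100.0, 2.0, 3.0, 4.0]
```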

@jreback
Contributor

jreback commented Jun 20, 2013

Could be the old advanced/fancy vs. basic indexing distinction.

@cpcloud
Member

cpcloud commented Jun 20, 2013

Argh, there's no explanation for why fancy indexing requires a copy. Maybe it's obvious, but I don't see it.

@jreback
Contributor

jreback commented Jun 20, 2013

I think it's really related to how memory is laid out, what you are doing, etc., i.e. whether it can be done in an efficient manner, and so forth. Basically, it's implementation-dependent.

@cpcloud
Member

cpcloud commented Jun 20, 2013

Probably it's because, in general, fancy indexing must follow the rule that any operation leading to an irregularly strided array must return a copy. In some cases (ones that are equivalent to slicing) you could have views, but there's the overhead of checking whether the passed numpy array is equivalent to a slice.
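The stride argument can be illustrated with a short numpy sketch (assuming 8-byte int64 items). A stepped slice is described by a single stride, so numpy returns a view; an integer index array produces a copy, even when it selects exactly the elements a slice would, because numpy does not check for an equivalent slice:

```python
import numpy as np

a = np.arange(10, dtype=np.int64)  # 8-byte items

s = a[::3]                     # regular spacing: one stride describes it
print(s.strides)               # (24,): every 3rd item, 3 * 8 bytes apart
print(np.shares_memory(a, s))  # True: a view

f = a[[0, 1, 5]]               # gaps of 1 and 4 items: no single stride fits
print(np.shares_memory(a, f))  # False: a copy

g = a[[0, 3, 6, 9]]            # same items as a[::3], given as an index array
print(np.array_equal(g, s))    # True
print(np.shares_memory(a, g))  # False: still a copy
```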

@wesm
Member

wesm commented Jun 20, 2013

Looks buggy to me; marked it as such and labeled it for 0.11.1.

@jreback
Contributor

jreback commented Jun 20, 2013

I'll take a look.

@johannh-zz

Here's another good one. If I look at df["bb"] after this, I see the change (on 0.11):

In [21]: print df
   aa    bb  cc
0   0  0.13   0
1   1  2.20   0
2   2  2.20   0
...

In [22]: df["bb"]
Out[22]:
0    0.15
1    2.20
2    2.20
3    2.20
...
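One way to read this symptom, together with the commit message below ("silently consolidating"), is that df["bb"] returned a cached column object that still pointed at the old storage after the frame copied its columns into a new, consolidated block. The sketch below is a hypothetical toy model of that pattern only; ToyFrame, consolidate, value, and run are illustrative names, not pandas internals:

```python
import numpy as np


class ToyFrame:
    """Hypothetical toy model, NOT how pandas is implemented.

    Columns start out in separate arrays; consolidate() copies them into
    one 2-D block, mimicking a frame that merges same-dtype blocks.
    """

    def __init__(self, data):
        # Unconsolidated storage: one separate float array per column.
        self._arrays = {n: np.array(v, dtype=float) for n, v in data.items()}
        self._item_cache = {}

    def __getitem__(self, name):
        # Cache the column object on first access and reuse it afterwards.
        if name not in self._item_cache:
            self._item_cache[name] = self._arrays[name]
        return self._item_cache[name]

    def consolidate(self, clear_cache):
        # Copy all columns into a new 2-D block and point each column at a
        # row of it; the old per-column arrays are no longer the frame's data.
        names = list(self._arrays)
        block = np.vstack([self._arrays[n] for n in names])
        self._arrays = {n: block[i] for i, n in enumerate(names)}
        if clear_cache:
            self._item_cache.clear()

    def value(self, name, row):
        # What displaying the frame would show: read the current storage.
        return float(self._arrays[name][row])


def run(clear_cache):
    f = ToyFrame({"bb": [2.2] * 5, "cc": [0.0] * 5})
    f["bb"][0] = 0.13           # cached array is still the live storage
    f.consolidate(clear_cache)  # stands in for the silent consolidation
    f["bb"][0] = 0.15           # with a stale cache, this hits the old array
    return f.value("bb", 0), float(f["bb"][0])


print(run(clear_cache=False))  # (0.13, 0.15): stale cache
print(run(clear_cache=True))   # (0.15, 0.15): cache invalidated
```

With the stale cache, reading the frame's storage gives 0.13 while the cached column gives 0.15, which mirrors print df versus df["bb"] above; clearing the cache on consolidation makes the two agree.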

wesm added a commit to wesm/pandas that referenced this issue Jun 28, 2013
…mixed_type silently consolidating (hurf). also fix stable sorting bug presenting on my machine
@jreback
Contributor

jreback commented Jun 28, 2013

closed by #4077

5 participants