Confusing behavior with (multi-)assignment and _LocIndexer/_IXIndexer #12947

DavidEscott · 2016-04-21T16:49:02Z

import pandas

df = pandas.DataFrame([[1,2,3,4,5]], columns=["A", "B", "C", "D", "E"])
# Suppose you want to set df.B to df.C when df.A ==1
# then the following both work:
df.loc[df.A== 1, "B"] = df.loc[df.A == 1, "C"]
df.ix[df.A == 1, "B"] = df.ix[df.A == 1, "C"]
# you can even mix and match them with ix on one side and loc on the other

# but maybe you have two or more columns you want to set... It's natural to think that these would work:
df.loc[df.A== 1, ["B", "C"]] = df.loc[df.A == 1, ["D", "E"]]
df.ix[df.A == 1, ["B", "C"]] = df.ix[df.A == 1, ["D", "E"]]
# but they actually just NaN out df.B and df.C (it isn't an issue of a silent copy losing updates)
# in fact the application of NaN even happens if you have singletons
df.loc[df.A== 1, ["D"]] = df.loc[df.A == 1, ["E"]]

# presumably because
type(df.ix[df.A == 1, "B"])
# is pandas.core.series.Series but 
type(df.ix[df.A == 1, ["B"]])
# is pandas.core.frame.DataFrame
# but when printed they look really similar... 
#0    3
# Name: B, dtype: int64
# versus
#    B
#0  3
# so it is easy to get confused

# If this can't be made to work in the natural fashion it would be a lot nicer if it could just throw an error
# the same way the following does:
df.ix[df.A == 1, ["B", "C"]] = df[df.A == 1]["D", "E"]
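The Series/DataFrame distinction described above can be checked directly rather than by reading the printed output. The following is a minimal sketch, not taken from the thread: it uses `.loc` only, float data, and the illustrative names `mask`, `as_series`, and `as_frame`.

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0]], columns=["A", "B", "C", "D", "E"])
mask = df.A == 1

as_series = df.loc[mask, "E"]    # scalar column label -> Series
as_frame = df.loc[mask, ["E"]]   # list of labels -> DataFrame, even with one label

print(type(as_series).__name__, as_series.ndim)  # Series 1
print(type(as_frame).__name__, as_frame.ndim)    # DataFrame 2

# A Series carries only a row index, so only the rows are aligned on assignment.
df.loc[mask, "D"] = as_series
print(df.loc[0, "D"])  # 5.0
```

Checking `.ndim` (or `type(...)`) is a quick way to tell which of the two objects an indexing expression produced when the printed forms look alike.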

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.4
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.3.4
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None

jreback · 2016-04-21T17:02:50Z

You are missing the point here: when you assign to multiple columns, pandas aligns the right-hand side for you. So you need to give it a raw array/list if you are doing this.

In [29]: df.loc[df.A== 1, ["B", "C"]] = df.loc[df.A == 1, ["D", "E"]].values

In [30]: df
Out[30]: 
   A  B  C  D  E
0  1  4  5  4  5
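To make the alignment step concrete, here is a small sketch of why the labelled right-hand side turns into NaN, and two ways around it. It assumes a recent pandas where `.to_numpy()` is available as the newer spelling of `.values`. The `reindex` call is only an illustration of what label alignment amounts to, not a claim about the exact internal code path, and the data is float so that NaN fits the column dtype.

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0]], columns=["A", "B", "C", "D", "E"])
mask = df.A == 1
rhs = df.loc[mask, ["D", "E"]]

# What label alignment effectively asks for: columns "B" and "C" of rhs.
# rhs has no such columns, so every cell comes back missing.
aligned = rhs.reindex(columns=["B", "C"])
print(aligned)

# Option 1: drop the labels, so values are assigned by position.
by_position = df.copy()
by_position.loc[mask, ["B", "C"]] = rhs.to_numpy()

# Option 2: keep alignment, but relabel the source columns to match the target.
by_label = df.copy()
by_label.loc[mask, ["B", "C"]] = rhs.rename(columns={"D": "B", "E": "C"})

print(by_position)
print(by_label)
```

Option 2 keeps the row alignment that `.loc` provides, which may be preferable when the two sides are not guaranteed to be in the same row order.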

jreback · 2016-04-21T17:03:19Z

I suppose you could add a warning section to the docs. Interested in that?

DavidEscott · 2016-04-21T17:34:31Z

I don't follow at all. Here is a little more strangeness:

In [42]: df = pandas.DataFrame([[1,2,3,4,5]], columns=["A", "B", "C", "D", "E"])

In [43]: type(df[["B","C"]])
Out[43]: pandas.core.frame.DataFrame

In [44]: type(df.loc[df.A==1, ["B","C"]])
Out[44]: pandas.core.frame.DataFrame

In [45]: df[["B", "C"]] = df[["D", "E"]]

In [46]: df
Out[46]:
   A  B  C  D  E
0  1  4  5  4  5

So I can assign a DataFrame to another DataFrame (of compatible dimension) just fine,
UNLESS one is a .loc or .ix of the other (and then stuff gets nulled out).

I don't understand the NaNs at all. LHS=RHS shouldn't result in LHS being None when RHS is not None. That doesn't sound like correct behavior at all.

Another weird thing that happens:

In [93]: df = pandas.DataFrame([[1,2,3,4,5]], columns=["A", "B", "C", "D", "E"])

In [94]: df2 = df.loc[:,["B","C"]]

In [95]: df3 = df.loc[:,["D","E"]]

In [96]: df2.loc[:,:] is df2
Out[96]: True

In [97]: df2.loc[:,:] = df3

In [98]: df2
Out[98]:
    B   C
0 NaN NaN

In [99]: df
Out[99]:
   A  B  C  D  E
0  1  2  3  4  5

but since df2.loc[:,:] is df2 this should be equivalent to: df.loc[:,["B","C"]] = df3 which of course we have seen is not the case.

Therefore with Pandas X.foo().bar() is not the same thing as _ = X.foo(); _.bar(). That is something I find super scary.
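One detail that may explain part of this: in Python, `x[key] = value` is a single call to `__setitem__` and never goes through `__getitem__`, so `df.loc[...] = df3` and `df2 = df.loc[...]; df2.loc[:, :] = df3` invoke different hooks on different objects. The toy class below is purely illustrative (`Recorder` is a made-up name and has nothing to do with pandas internals); it only records which hook Python calls.

```python
class Recorder:
    """Toy container that records which indexing hook Python calls."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return Recorder()  # a new object, like a selection handed back to the caller

    def __setitem__(self, key, value):
        self.calls.append(("set", key))


x = Recorder()

x["k"] = 1      # one call: x.__setitem__("k", 1), no __getitem__ involved
tmp = x["k"]    # x.__getitem__("k") returns a separate object
tmp["j"] = 1    # __setitem__ runs on tmp, not on x

print(x.calls)    # [('set', 'k'), ('get', 'k')]
print(tmp.calls)  # [('set', 'j')]
```

Seen this way, writing through an intermediate variable modifies whatever object the getter returned, which for a column selection is a separate frame rather than the original.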

jreback · 2016-04-21T17:51:17Z

You are doing 2 different things. In [45] you are saying take these columns and assign them to these; this ignores alignment because it's a column assignment.

While above in my [29] you are assigning part of a frame. This is a conceptual difference, and the behavior is as expected.
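The two cases can be put side by side. This is a sketch under the assumption of a reasonably recent pandas; it uses float data so the NaN result does not depend on dtype upcasting rules, and `cols` and `part` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0]], columns=["A", "B", "C", "D", "E"])

# Column assignment: the target columns are replaced one by one, in order,
# and the source column labels are not matched against the target labels.
cols = df.copy()
cols[["B", "C"]] = df[["D", "E"]]

# .loc assignment: the source is aligned on row and column labels first,
# and it has no "B" or "C" column to supply.
part = df.copy()
part.loc[:, ["B", "C"]] = df[["D", "E"]]

print(cols)
print(part)
```

Here `cols` ends up with the values of D and E in B and C, while `part` ends up with NaN in both columns.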

jreback added the Usage Question label Apr 21, 2016

jreback added the Indexing, Docs, and Difficulty Novice labels Apr 21, 2016

jreback added this to the 0.18.2 milestone Apr 21, 2016

This was referenced May 3, 2016

DOC: add warning section in indexing docs. #13060

Closed

DOC: add warning section in indexing docs #13070

Closed

jreback closed this as completed in 40ba6eb May 7, 2016
