Replacing multiple columns (or just one) with iloc does not work #22046

mitar · 2018-07-24T22:58:31Z

Code Sample, a copy-pastable example if possible

import pandas

columns = pandas.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})

# Replace column 1 of inputs ('b1') with column 0 of columns ('a2'), by position
inputs.iloc[:, [1]] = columns.iloc[:, [0]]

print(inputs)

Problem description

I have code that replaces a set of columns with another set of columns, based on column indices. To handle this without a special case, I assumed I could just use iloc both to select and to set columns in a DataFrame. But it seems that this does not work and fails in strange ways.

Expected Output

   a1  b1  c1
0   1  11   7
1   2  12   8
2   3  13   9

But in reality, you get:

    a1  b1   c1
0  1.0 NaN  7.0
1  2.0 NaN  8.0
2  3.0 NaN  9.0

Note how the values were converted to float and how the b1 column is all NaN.

But if I do the following, I get the expected result:

inputs.iloc[:, [1]] = [[11], [12], [13]]

This also works:

inputs.iloc[:, [1]] = columns.iloc[:, [0]].values

So if it works with lists and ndarrays, one would assume it would also work with DataFrames themselves. But it does not.
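A minimal sketch of the `.values` workaround above, generalized to several columns at once. `replace_columns_positional` is a hypothetical helper (not part of pandas); it assumes both frames have the same number of rows in the same order, and it uses `.to_numpy()` (the newer spelling of `.values`, pandas 0.24+) so that neither column nor row labels take part in the assignment.

```python
import pandas as pd


def replace_columns_positional(target, source, target_positions, source_positions):
    """Hypothetical helper: return a copy of `target` in which the columns at
    `target_positions` are replaced by the columns of `source` at
    `source_positions`. Rows and columns are matched purely by position."""
    if len(target_positions) != len(source_positions):
        raise ValueError("position lists must have the same length")
    if len(target) != len(source):
        raise ValueError("frames must have the same number of rows")
    result = target.copy()
    # .to_numpy() strips all labels, so nothing can be aligned by name.
    result.iloc[:, target_positions] = source.iloc[:, source_positions].to_numpy()
    return result


columns = pd.DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
inputs = pd.DataFrame({"a1": [1, 2, 3], "b1": [4, 5, 6], "c1": [7, 8, 9]})

# Replace 'a1' with 'b2' and 'c1' with 'a2'; 'b1' is left untouched.
result = replace_columns_positional(inputs, columns, [0, 2], [1, 0])
print(result)
```

Because the whole selection becomes one array, mixed source dtypes would be coerced to a common type; with all-integer frames, as here, that makes no difference.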

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

dahlbaek · 2018-07-25T09:35:44Z

I believe the problem is that the column names do not coincide. The following two versions both work on my machine:

import pandas

# Version 1: the source column is named 'b1', matching the target column
columns = pandas.DataFrame({'b1': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})
inputs.iloc[:, [1]] = columns.iloc[:, [0]]
print(inputs)

import pandas

# Version 2: .values passes a plain ndarray, so no labels are involved
columns = pandas.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})
inputs.iloc[:, [1]] = columns.iloc[:, [0]].values
print(inputs)
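The first version works because the labels line up. The same idea can be applied generically by copying the target's labels onto the source selection before assigning, so that any label alignment has nothing to change. This is a sketch: `relabel_like` is a hypothetical helper, and it assumes the source selection has as many rows as the target, in the same order.

```python
import pandas as pd


def relabel_like(selection, target, target_positions):
    """Hypothetical helper: copy `selection` and give it the column labels of
    `target` at `target_positions`, plus the row labels of `target`."""
    relabeled = selection.copy()
    relabeled.columns = target.columns[target_positions]
    relabeled.index = target.index
    return relabeled


columns = pd.DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
inputs = pd.DataFrame({"a1": [1, 2, 3], "b1": [4, 5, 6], "c1": [7, 8, 9]})

# 'a2' is renamed to 'b1', so aligning on labels gives the same result as
# matching by position.
inputs.iloc[:, [1]] = relabel_like(columns.iloc[:, [0]], inputs, [1])
print(inputs)
```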

dahlbaek · 2018-07-25T09:41:05Z

Check out the big red warning in the docs. Basically, pandas is trying to set the 'b1' column of inputs to the values of the 'b1' column of columns, and finds no data there because that column does not exist. Passing in .values forces pandas to take whatever values are passed, in the given order.
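To see where the NaNs come from, the alignment step can be imitated with `reindex`: looking up the target's label `'b1'` in the right-hand side finds nothing, so the result is an all-NaN column, and NaN forces a float dtype. This is only an analogy built from public API calls, not the code pandas runs internally.

```python
import pandas as pd

columns = pd.DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
rhs = columns.iloc[:, [0]]  # a one-column frame labelled 'a2'

# Look up the target label 'b1' in the right-hand side: it is missing.
mismatched = rhs.reindex(columns=["b1"])
print(mismatched)  # 'b1' is all NaN, dtype float64

# With matching labels (as in the first version above), nothing is lost.
matching = rhs.rename(columns={"a2": "b1"}).reindex(columns=["b1"])
print(matching)  # 'b1' holds 11, 12, 13
```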

mitar · 2018-07-25T09:54:21Z

I can understand that this happens with .loc, but how can it happen with .iloc? .iloc should be position-based so it should not care about column names, no?

dahlbaek · 2018-07-25T10:47:39Z

Here's my understanding: .iloc will locate by position, but it will not discard the column names in its output. Compare

print(columns.iloc[:, [0]])

to

print(columns.iloc[:, [0]].values)
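A short check of the point above: the `.iloc` getter returns a DataFrame that still carries the source's column label, while `.to_numpy()` (the newer spelling of `.values`) returns a bare 2-D array with no labels at all.

```python
import pandas as pd

columns = pd.DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})

selected = columns.iloc[:, [0]]
print(list(selected.columns))  # ['a2']: the label survives positional selection

raw = selected.to_numpy()
print(type(raw).__name__, raw.shape)  # ndarray (3, 1): labels are gone
```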

mitar · 2018-07-25T10:54:43Z

Sure, but there is a difference between columns.iloc[:, [0]] and columns.iloc[:, [0]] = .... The first is a getter, the second one is a setter, so a different method is called. What I would like to argue in this issue is that it is really surprising to a user that column names are taken into account at all when operating with iloc. My hope is that with iloc I can use a DataFrame the way I am used to from numpy, based on indices.

I worry that going through .values adds unnecessary dtype conversions.
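One way around the concern above is to move one column at a time: `Series.to_numpy()` on a single column keeps that column's dtype, whereas converting a multi-column selection produces one array with a common dtype. The sketch assumes pandas 1.5 or later for `DataFrame.isetitem`, which replaces a column by position; `replace_columns_keep_dtypes` is a hypothetical helper, and rows are matched by position.

```python
import pandas as pd


def replace_columns_keep_dtypes(target, source, target_positions, source_positions):
    """Hypothetical helper: replace columns of `target` by position with
    columns of `source`, one column at a time, so each keeps its own dtype."""
    if len(target_positions) != len(source_positions):
        raise ValueError("position lists must have the same length")
    result = target.copy()
    for t, s in zip(target_positions, source_positions):
        # A single column's .to_numpy() keeps that column's dtype;
        # isetitem swaps in the new array at position t (no label lookup).
        result.isetitem(t, source.iloc[:, s].to_numpy())
    return result


source = pd.DataFrame({"n": [11, 12, 13], "f": [0.5, 1.5, 2.5]})
target = pd.DataFrame({"a1": [1, 2, 3], "b1": [4, 5, 6], "c1": [7, 8, 9]})

# Converting both columns at once upcasts the integers to float.
combined = source.iloc[:, [0, 1]].to_numpy()
print(combined.dtype)  # float64

result = replace_columns_keep_dtypes(target, source, [1, 2], [0, 1])
print(result.dtypes)  # a1 int64, b1 int64, c1 float64
```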

dahlbaek · 2018-07-25T11:18:44Z

Hmm, so are there really two independent issues here?

- Conversions happen in columns that were not selected (a1 and c1 become float). Looks like a bug?
- Whether setting with .iloc should ignore column names.

jorisvandenbossche · 2018-07-25T11:59:15Z

> I can understand that this happens with .loc, but how can it happen with .iloc? .iloc should be position-based so it should not care about column names, no?

Yes, in principle iloc should not care about column names, so this certainly seems a bug to me.
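The expected behaviour can be written as a small regression check. According to this issue's timeline, it was closed by #37728 under the 1.2 milestone, so this sketch assumes a pandas release that includes that change; on pandas 0.23.3 the comparison would fail because of the NaN/float column shown in the report.

```python
import pandas as pd

columns = pd.DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
inputs = pd.DataFrame({"a1": [1, 2, 3], "b1": [4, 5, 6], "c1": [7, 8, 9]})

# The labels 'a2' and 'b1' differ; a positional setter should not care.
inputs.iloc[:, [1]] = columns.iloc[:, [0]]

expected = pd.DataFrame({"a1": [1, 2, 3], "b1": [11, 12, 13], "c1": [7, 8, 9]})
# Raises AssertionError if values or dtypes differ (e.g. NaN / float64 in b1).
pd.testing.assert_frame_equal(inputs, expected)
print("iloc assignment was positional")
```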

jorisvandenbossche · 2018-07-25T12:02:54Z

Related issues: #12991

dahlbaek · 2018-07-25T19:14:45Z

Seems that .iloc works as expected if you pass a single index instead of a length-1 list:

Input:

columns = pandas.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})
inputs.iloc[:, 1] = columns.iloc[:, 0]
print(inputs)

Output:

   a1  b1  c1
0   1  11   7
1   2  12   8
2   3  13   9

mitar · 2018-07-25T19:50:25Z

> Seems that .iloc works as expected if you pass a single index instead of a length-1 list:

Not really. ;-) But that is another bug: #22036

mitar · 2020-11-19T08:46:11Z

Thanks.

mroeschke added the Bug and Indexing labels Jan 13, 2019

This was referenced May 2, 2020

BUG: DataFrameGroupby std/sem modify grouped column when as_index=False #33630

Merged

WIP: BUG: Setting DataFrame values via iloc aligns when arguments are lists #33949

Closed

phofl mentioned this issue Nov 10, 2020

Bug in iloc aligned objects #37728

Merged

jreback added this to the 1.2 milestone Nov 18, 2020

jreback closed this as completed in #37728 Nov 19, 2020

rhshadrach mentioned this issue Nov 24, 2020

CLN: Remove .values from groupby.sem #38044

Merged
