Replacing multiple columns (or just one) with iloc does not work #22046

Closed
mitar opened this issue Jul 24, 2018 · 11 comments · Fixed by #37728
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Comments

@mitar
Contributor

mitar commented Jul 24, 2018

Code Sample, a copy-pastable example if possible

import pandas

columns = pandas.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})

inputs.iloc[:, [1]] = columns.iloc[:, [0]]

print(inputs)

Problem description

I have code that replaces one set of columns with another, based on column indices. To avoid special-casing, I assumed I could use iloc both to select and to set columns in a DataFrame. But this does not work, and it fails in strange ways.

Expected Output

   a1  b1  c1
0   1  11   7
1   2  12   8
2   3  13   9

But in reality, you get:

    a1  b1   c1
0  1.0 NaN  7.0
1  2.0 NaN  8.0
2  3.0 NaN  9.0

See how the values were converted to float and how the replaced column is all NaN?

But, if I do the following I get expected results:

inputs.iloc[:, [1]] = [[11], [12], [13]]

This also works:

inputs.iloc[:, [1]] = columns.iloc[:, [0]].values

So if it works with lists and ndarrays, one would assume it would also work with DataFrames themselves. But it does not.
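
For readers who need a position-only assignment in the meantime, the ndarray workaround above can be wrapped in a small helper. This is only a sketch: `set_columns_by_position` is a hypothetical name, not a pandas API, and it assumes both frames have the same number of rows.

```python
import pandas as pd


def set_columns_by_position(target, positions, source):
    """Overwrite target's columns at `positions` with source's values.

    Hypothetical helper: taking .values drops the row and column labels,
    so pandas has nothing to align on and assigns purely by position.
    """
    values = source.values
    if values.shape != (len(target), len(positions)):
        raise ValueError("shape of source does not match the selected columns")
    target.iloc[:, positions] = values
    return target


columns = pd.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pd.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})

set_columns_by_position(inputs, [1], columns.iloc[:, [0]])
print(inputs)
```

The shape check is there because, once labels are gone, a mismatched source would otherwise fail later or broadcast in ways that are harder to diagnose.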

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@dahlbaek
Contributor

dahlbaek commented Jul 25, 2018

I believe the problem is that the column names do not coincide. The following two versions both work on my machine:

import pandas

columns = pandas.DataFrame({'b1': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})
inputs.iloc[:, [1]] = columns.iloc[:, [0]]
print(inputs)

import pandas

columns = pandas.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})
inputs.iloc[:, [1]] = columns.iloc[:, [0]].values
print(inputs)

@dahlbaek
Contributor

dahlbaek commented Jul 25, 2018

Check out the big red warning in the docs. Basically, pandas is trying to set the 'b1' column of inputs to the value of the 'b1' column of columns, not finding any data there. Passing in .values forces pandas to take whatever values are passed in the given order.
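
One way to picture the alignment step described here is with `reindex`. This is an illustration of label alignment in general, used as a stand-in; it is not necessarily what the setter does internally.

```python
import pandas as pd

columns = pd.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
selected = columns.iloc[:, [0]]  # one-column frame, still labelled 'a2'

# Aligning to the target label 'b1' finds no matching column,
# so every value in the aligned column is missing.
aligned = selected.reindex(columns=['b1'])
print(aligned)

# NaN is a float value, so the aligned column is no longer an integer column.
print(aligned.dtypes)

# With the label renamed to match the target, alignment keeps the data.
renamed = selected.rename(columns={'a2': 'b1'}).reindex(columns=['b1'])
print(renamed)
```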

@mitar
Contributor Author

mitar commented Jul 25, 2018

I can understand that this happens with .loc, but how can it happen with .iloc? .iloc should be position-based so it should not care about column names, no?

@dahlbaek
Contributor

Here's my understanding: .iloc will locate by position, but it will not discard the column names in its output. Compare

print(columns.iloc[:, [0]])

to

print(columns.iloc[:, [0]].values)
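
To make the comparison concrete, the following sketch inspects both objects rather than just printing them. The variable names `as_frame` and `as_array` are only for illustration.

```python
import pandas as pd

columns = pd.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})

as_frame = columns.iloc[:, [0]]         # still a DataFrame, label 'a2' kept
as_array = columns.iloc[:, [0]].values  # plain ndarray, no labels at all

print(type(as_frame).__name__, list(as_frame.columns))
print(type(as_array).__name__, as_array.shape)
```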

@mitar
Contributor Author

mitar commented Jul 25, 2018

Sure, but there is a difference between columns.iloc[:, [0]] and columns.iloc[:, [0]] = .... The first is a getter, the second is a setter; a different method on the DataFrame object is called. What I would like to argue in this issue is that it is really surprising to a user that column names are taken into account at all when operating with iloc. My hope is that with iloc I can use a DataFrame the way I am used to with numpy, based on indices.

I worry that going through .values is adding unnecessary dtype conversions.
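
The dtype concern can be checked on a small frame. The sketch below uses a made-up frame (`mixed` is not from the example above) to show when `.values` has to pick a common dtype for the whole array.

```python
import pandas as pd

mixed = pd.DataFrame({'i': [1, 2, 3], 'f': [0.5, 1.5, 2.5], 's': ['x', 'y', 'z']})

# A single integer column keeps an integer dtype.
print(mixed.iloc[:, [0]].values.dtype)

# Integer and float columns together are upcast to a float dtype.
print(mixed.iloc[:, [0, 1]].values.dtype)

# Integer and string columns together fall back to object.
print(mixed.iloc[:, [0, 2]].values.dtype)
```

So the conversion only matters when the selected columns have different dtypes; a same-dtype selection passes through unchanged.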

@dahlbaek
Contributor

Hmm, so there are really two independent issues here?

  1. Conversions happening in columns that were not selected. Looks like a bug?
  2. Whether setting with .iloc should ignore column names.
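
The first point can be checked mechanically by comparing dtypes before and after the assignment. The sketch below does this for the ndarray form of the assignment, which the thread reports as working; the `untouched` and `changed` names are only for illustration.

```python
import pandas as pd

columns = pd.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pd.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})

dtypes_before = inputs.dtypes.copy()

# Assign by position only; .values carries no labels to align on.
inputs.iloc[:, [1]] = columns.iloc[:, [0]].values

# Columns that were not selected should keep their original dtype.
untouched = ['a1', 'c1']
changed = [name for name in untouched if inputs[name].dtype != dtypes_before[name]]
print(changed)
```

Swapping the right-hand side for `columns.iloc[:, [0]]` turns the same check into a quick test of whether a given pandas version still shows the conversion reported above.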

@jorisvandenbossche
Member

> I can understand that this happens with .loc, but how can it happen with .iloc? .iloc should be position-based so it should not care about column names, no?

Yes, in principle iloc should not care about column names, so this certainly seems a bug to me.

@jorisvandenbossche
Member

Related issues: #12991

@dahlbaek
Contributor

Seems that .iloc works as expected if you pass a single index instead of a length-1 list:

Input:

columns = pandas.DataFrame({'a2': [11, 12, 13], 'b2': [14, 15, 16]})
inputs = pandas.DataFrame({'a1': [1, 2, 3], 'b1': [4, 5, 6], 'c1': [7, 8, 9]})
inputs.iloc[:, 1] = columns.iloc[:, 0]
print(inputs)

Output:

   a1  b1  c1
0   1  11   7
1   2  12   8
2   3  13   9

@mitar
Contributor Author

mitar commented Jul 25, 2018

> Seems that .iloc works as expected if you pass a single index instead of a length-1 list:

Not really. ;-) But that is another bug: #22036

@mitar
Contributor Author

mitar commented Nov 19, 2020

Thanks.

5 participants