BUG: iloc fills multiple columns, if columns have duplicate names #12991

henhuy · 2016-04-26T12:32:23Z

When a DataFrame has two columns with the same name, changing data via an indexer changes both columns:

import pandas as pd

a = pd.DataFrame(index=['a', 'b', 'c'], columns=['d', 'e', 'd']).fillna(0)
a.iloc[:, 0]['a'] = 3

Gives:
   d  e  d
a  3  0  3
b  0  0  0
c  0  0  0

Instead, it should only edit the first column:
   d  e  d
a  3  0  0
b  0  0  0
c  0  0  0

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-68-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.17.1
nose: 1.3.6
pip: 1.5.4
setuptools: 3.3
Cython: None
numpy: 1.10.1
scipy: 0.13.3
statsmodels: None
IPython: 3.1.0
sphinx: 1.3.1
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.9.3
xlwt: 0.8.0
xlsxwriter: 0.7.7
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None

jreback · 2016-04-26T13:02:31Z

This is a tricky bug, though you are using chained indexing here, so there are no guarantees.

Instead do this.

In [7]: a.iloc[a.index.get_loc('a'), 0] = 3

In [8]: a
Out[8]: 
   d  e  d
a  3  0  0
b  0  0  0
c  0  0  0
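
For readers who want to run the workaround outside an IPython session, here is a minimal self-contained sketch. It assumes a frame built directly from integer zeros (rather than the `fillna(0)` construction in the report), and the names `first_col` and `row_pos` are only illustrative. The chained form is described in comments rather than executed, since its result is exactly the part that is not guaranteed.

```python
import numpy as np
import pandas as pd

# Frame with a duplicated column label 'd', as in the report.
a = pd.DataFrame(
    np.zeros((3, 3), dtype=int),
    index=["a", "b", "c"],
    columns=["d", "e", "d"],
)

# The chained form `a.iloc[:, 0]['a'] = 3` is two separate operations:
# first a __getitem__ that returns an intermediate Series, then a
# __setitem__ on that intermediate object. Whether the write reaches
# `a` depends on pandas internals, hence "no guarantees".
first_col = a.iloc[:, 0]  # illustrative name; the Series at column position 0

# Single-step assignment: the row position is looked up from its label,
# the column is addressed by position, so the duplicate label never matters.
row_pos = a.index.get_loc("a")
a.iloc[row_pos, 0] = 3

print(a)
```

Addressing the column by position is the design choice that matters here: a label such as `'d'` is ambiguous in this frame, while position `0` is not.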

henhuy · 2016-04-26T16:26:46Z

That's what I did right now - thanks!

toobaz · 2017-03-14T16:15:54Z

A "more licit" example of the bug in action:

In [4]: df1 = DataFrame([{'A': None, 'B': 1}, {'A': 2, 'B': 2}])
   ...: df2 = DataFrame([{'A': 3, 'B': 3}, {'A': 4, 'B': 4}])
   ...: df = concat([df1, df2], axis=1)
   ...: df.iloc[0, 0] = 15
   ...: df
   ...: 
Out[4]: 
      A  B     A  B
0  15.0  1  15.0  3
1   2.0  2   4.0  4

toobaz · 2017-08-09T21:06:28Z

> A "more licit" example of the bug in action:

For reference: it is actually not the same bug, since it is fixed by #17163, while the original example by the submitter is not.
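
One way to see whether a given pandas installation still shows this second behaviour is to rerun the `concat` example and compare the untouched duplicate column before and after the assignment. This is only a sketch: the names `before` and `leaked` are illustrative, and the result depends on the installed version.

```python
import pandas as pd

df1 = pd.DataFrame([{"A": None, "B": 1}, {"A": 2, "B": 2}])
df2 = pd.DataFrame([{"A": 3, "B": 3}, {"A": 4, "B": 4}])
df = pd.concat([df1, df2], axis=1)  # columns: A, B, A, B

# Remember the second 'A' column (position 2) before assigning.
before = df.iloc[:, 2].copy()

# Positional scalar assignment into the first 'A' column only.
df.iloc[0, 0] = 15

# True if the write also changed the other column labelled 'A'.
leaked = not df.iloc[:, 2].equals(before)
print("assignment leaked into duplicate column:", leaked)
```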

I still think, however, that there is some way to expose the (original) bug without chained assignment.

MohakGangwani · 2020-05-31T20:17:26Z

Wouldn't it be better if pandas didn't allow duplicate column names? Having columns with the same name just creates confusion for us.

toobaz · 2020-05-31T21:08:28Z

> Having columns with the same name just creates confusion for us.

Indeed, not having duplicate column names is a good idea.

Enforcing this in pandas (and doing it consistently - that is, also on rows) would break a lot of code.

mitar · 2020-05-31T22:09:42Z

> would break a lot of code.

But breaking it loudly and consistently is maybe better than breaking some of it silently.

toobaz · 2020-05-31T22:43:55Z

> But breaking it loudly and consistently is maybe better than breaking some of it silently.

Indeed, the safest way to break buggy code loudly and consistently is, always, to just break all code.

mitar · 2020-05-31T22:50:38Z

Not having duplicate column names would also help with sklearn, which is planning to introduce features that operate on pandas column names but require those names to be unique. With such a limit in pandas DataFrames, you would not be surprised when you pass one to sklearn and it complains.

toobaz · 2020-05-31T23:19:46Z

> sklearn, which is planning to introduce features operating on pandas and column names, but require column names to be unique to work

For sklearn devs, checking if an index is unique is as simple as checking .is_unique.
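
As a minimal sketch of that check (the frame here is made up for illustration), `Index.is_unique` reports whether any label repeats and `Index.duplicated()` marks the repeats:

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 0]], columns=["d", "e", "d"])

print(df.columns.is_unique)  # False: 'd' appears twice

# Boolean mask of labels that repeat an earlier label.
dup_mask = df.columns.duplicated()
print(list(df.columns[dup_mask]))  # ['d']

# The same property exists on the row index.
print(df.index.is_unique)  # True for the default RangeIndex
```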

For sklearn users, without context I don't know if what you describe is a legitimate need or just an implementation limit, but in any case, it will just have to be documented in sklearn and shouldn't affect the many non-sklearn users.

In any case, I think this is vastly off topic here, as this is a very specific bug caused by a code path which should be rewritten anyway.

jreback added Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), Difficulty Advanced labels Apr 26, 2016

jreback added this to the Next Major Release milestone Apr 26, 2016

toobaz mentioned this issue Mar 14, 2017

BUG: .iloc indexing with duplicates #15686

Closed

toobaz mentioned this issue Mar 22, 2017

Fix scalar iloc #15778

Closed

4 tasks

jorisvandenbossche mentioned this issue Jul 25, 2018

Replacing multiple columns (or just one) with iloc does not work #22046

Closed

jbrockmendel removed Effort Low labels Oct 21, 2019

MohakGangwani mentioned this issue Jun 1, 2020

Update : In DataFrame Class, Columns should not contain duplicate val… #34512

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
