BUG: iloc can create columns #6766

bergtholdt · 2014-04-02T14:41:17Z

After a concat of two DataFrames with the same columns, I want to consolidate some data and replace NaNs in some columns with values from other columns. I ended up with a DataFrame that magically had additional columns.

This is the minimum example that I can give to reproduce the faulty behaviour using current master (70de129):

import numpy as np
import pandas as pd

df1 = pd.DataFrame([{'A':None, 'B':1},{'A':2, 'B':2}])
df2 = pd.DataFrame([{'A':3, 'B':3},{'A':4, 'B':4}])
df = pd.concat([df1, df2], axis=1)

>>> df1
    A  B
0 NaN  1
1   2  2

[2 rows x 2 columns]

>>> df2
   A  B
0  3  3
1  4  4

[2 rows x 2 columns]

>>> df
    A  B  A  B
0 NaN  1  3  3
1   2  2  4  4

[2 rows x 4 columns]

Now, replacing NaNs in column 0 with the corresponding values in column 2 (the second 'A'), I expected to simply write a 3 over the NaN (which it did), but it also added a column '0' at the end of the DataFrame, even though iloc is not supposed to enlarge the dataset. Clearly a bug.

inds = np.isnan(df.iloc[:, 0])
df.iloc[:, 0][inds] = df.iloc[:, 2][inds]

>>> df
   A  B  A  B  0
0  3  1  3  3  3
1  2  2  4  4  2

[2 rows x 5 columns]


jreback · 2014-04-02T14:50:37Z

It's a bug, but not for the reason you suggest.

doing ANYTHING like

df.iloc[:,0][inds] IS a chained assignment and should ALWAYS be avoided, so I wouldn't expect this to work in any event.

see here: http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-view-versus-copy

Further, using duplicate columns is very tricky and should generally be avoided.

This is a bug because this should work:

In [39]: mask = inds[inds].index

In [40]: df.iloc[mask,0] = df.iloc[mask,2]
AssertionError: Cannot create BlockManager._ref_locs because block [FloatBlock: [A], 1 x 2, dtype: float64] with duplicate items [Index([u'A', u'A'], dtype='object')] does not have _ref_locs set
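For reference, the single-step form described above can be sketched as follows. This is only a minimal sketch, assuming a pandas release that already contains the fix for this issue; `positions` is a name introduced here for illustration, and `np.flatnonzero` together with `to_numpy()` stands in for the `inds[inds].index` lookup used in the comment.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{"A": np.nan, "B": 1}, {"A": 2, "B": 2}])
df2 = pd.DataFrame([{"A": 3, "B": 3}, {"A": 4, "B": 4}])
df = pd.concat([df1, df2], axis=1)  # duplicate column names: A, B, A, B

# Integer row positions where the first column is NaN
# (`positions` is an illustrative name, not from the thread).
positions = np.flatnonzero(df.iloc[:, 0].isna().to_numpy())

# A single indexing call on the left-hand side, so no intermediate
# object is created and there is no chained assignment.
# to_numpy() strips the labels so the values are written by position.
df.iloc[positions, 0] = df.iloc[positions, 2].to_numpy()

print(df.shape)
print(df.iloc[:, 0].tolist())
```

The point of the single call is that pandas sees the row and column indexers together, instead of receiving an assignment on whatever `df.iloc[:, 0]` happened to return.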

bergtholdt · 2014-04-02T15:17:56Z

Hi,

thanks for the quick reply. I actually tested your suggested solution first since it would be the intuitive way to do it. Actually I first tried

df.iloc[inds, 0] = ...

which raised a NotImplementedError:

NotImplementedError: iLocation based boolean indexing on an integer type is not available

Then I tried this, similar to yours:

indexes = inds.nonzero()[0]
df.iloc[indexes, 0] = ...

and got the same error as you did (note also with scalars on the right hand side).

With trial and error I got the version at the top running in an older version of pandas, but current master then started creating these extra columns (though it also wrote the values at the proper location).
Also note that I run with

pd.set_option('mode.chained_assignment', 'raise')

And did not get an error for the version at the top.

jreback · 2014-04-02T15:23:44Z

We have to 'guess' if something is chained, as Python syntax does not allow it to be detected. So it's not an error that SettingWithCopy is not raised; it's just hard to figure out.

iloc specific does NOT take a boolean indexer, but only an integer one. (on purpose).

ix is the solution here, but it breaks for the same reason

df.ix[inds,2] = df.ix[inds,0] should work as well.

NEVER do chained assignment; it is just not a good idea (if this had been a single dtype it WOULD have worked; in a multi-dtype case it will also SOMETIMES work).

bergtholdt · 2014-04-02T15:49:02Z

Just another comment that might be related. After creating the DataFrame with the duplicate columns shown at the top, I also get a ValueError when doing simple indexing like:

>>> df.iloc[0,0]

>>> df.iloc[0,:]

Both raise

ValueError: Wrong number of items passed 8, index implies 4
in C:\x64\Python27\lib\site-packages\pandas-0.13.1_550_g9039338-py2.7-win-amd64.egg\pandas\core\internals.pyc:64

Whereas this works

>>> df.iloc[:,0]

jreback · 2014-04-04T14:04:07Z

ok... these getitem issues with iloc (namely df.iloc[0,0] and df.iloc[0,:]) when the frame is created via a concat are fixed

the setting is a bit more complicated

immerrr · 2014-04-15T08:50:42Z

iloc specific does NOT take a boolean indexer, but only an integer one. (on purpose).

Weird, I've always thought of iloc as a "numpy-like" rather than a "strictly-integer" indexer, and I'd expect it to work like np.ndarray get-/setitem methods. Performance- or implementation-complexity-wise, is there a reason to force users to route boolean indexers via loc?

jreback · 2014-04-15T11:18:03Z

the reason this was deliberately not done was because a boolean indexer normally requires alignment
which is a label based operation

alignment is not really possible in a logical sense

for example say you want to align a timeseries index vs an integer index

doesn't make sense
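The alignment point can be illustrated with a small sketch. It is an illustration only, not code from the thread: the Series `s`, the reordered `mask`, and the result names are all hypothetical. It shows that a boolean Series passed to loc is matched to the data by label, whereas the same booleans applied by position pick a different row.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Hypothetical mask whose labels come in a different order than those of s.
mask = pd.Series([True, False, False], index=["c", "b", "a"])

# Label-based: the mask is aligned to s by label first,
# so the True attached to "c" selects the row labelled "c".
by_label = s.loc[mask]

# Position-based: the raw booleans are applied in order,
# so the leading True selects the first row regardless of labels.
by_position = s.to_numpy()[mask.to_numpy()]

print(by_label.index.tolist(), by_label.tolist())
print(by_position.tolist())
```

Because the two readings disagree as soon as the mask's index differs from the data's index, a purely positional indexer has no obvious way to honour the labels a boolean Series carries.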

immerrr · 2014-04-15T11:25:00Z

a boolean indexer normally requires alignment which is a label based operation

That is if the indexer is a Series; what if it is an ndarray?

jreback · 2014-04-15T11:29:28Z

that should work but iirc that was taken out to make it less confusing

@hayd do u remember this?

@immerrr you might look back at the original iloc issue
it was in 0.11 (and was pretty long)

immerrr · 2014-04-15T11:55:35Z

I've checked on current master: iloc[ndarray] works, loc[Series] works, so does loc[ndarray], only iloc[Series] doesn't, that indeed makes sense. And speaking of the issue at hand, this worked for me:

In [21]: inds = np.isnan(df.iloc[:, 0])

In [22]: inds
Out[22]: 
0     True
1    False
Name: a, dtype: bool

In [23]: inds.values
Out[23]: array([ True, False], dtype=bool)

In [24]: df.iloc[inds.values, 0]
Out[24]: 
0   NaN
Name: a, dtype: float64
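The same boolean-ndarray idea can be carried over from getitem to setitem. The following is a rough sketch, assuming a pandas release in which setitem with iloc on a frame with duplicate columns works (it did not at the time of this comment); `mask` is an illustrative name and `isna().to_numpy()` is used here in place of `np.isnan(...).values`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{"A": np.nan, "B": 1}, {"A": 2, "B": 2}])
df2 = pd.DataFrame([{"A": 3, "B": 3}, {"A": 4, "B": 4}])
df = pd.concat([df1, df2], axis=1)  # duplicate column names: A, B, A, B

# A plain boolean ndarray carries no index, so there is nothing to align.
mask = df.iloc[:, 0].isna().to_numpy()

# Fill the NaNs in column 0 from column 2 in one indexing call.
df.iloc[mask, 0] = df.iloc[mask, 2].to_numpy()

print(df.shape)
print(df.iloc[:, 0].tolist())
```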

jreback · 2014-04-15T12:24:07Z

#6799 fixed the last

Setitem in a duplicate frame with iloc is still not working

hayd · 2014-04-15T20:49:39Z

IIRC I was pro iloc working with masks; I think if the dtype is bool this is not ambiguous (currently this is the only reason I use ix!). I'm not sure I understand the argument re: alignment.

jreback · 2014-04-30T00:44:34Z

@immerrr your refactor seemed to have fixed this, thanks!

jreback added Bug labels Apr 2, 2014

jreback added this to the 0.14.0 milestone Apr 2, 2014

jreback mentioned this issue Apr 4, 2014

BUG: duplicate (getitem) indexing with iloc (GH6766) #6799

Merged

jreback mentioned this issue Apr 30, 2014

BUG: duplicate indexing with setitem with iloc (GH6766) #7006

Merged

jreback closed this as completed in #7006 Apr 30, 2014
