BUG: iloc can create columns #6766
It's a bug, but not for the reason you suggest. Doing ANYTHING like
see here: http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-view-versus-copy Furthermore, using duplicate columns is very tricky and should generally be avoided. This is a bug because this should work:
|
Hi, thanks for the quick reply. I actually tested your suggested solution first since it would be the intuitive way to do it. Actually I first tried
which raised a NotImplementedError:
Then this similar to yours
and got the same error as you did (note: also with scalars on the right-hand side). With trial and error I got the version at the top running in an older version of pandas, but current master then started creating these extra columns (though it also wrote the values at the proper location).
And did not get an error for the version at the top. |
We have to 'guess' whether something is chained, since Python syntax does not allow it to be detected. So it's not an error that SettingWithCopy is not raised here; it's just hard to figure out.
NEVER do chained assignment; it is just not a good idea. (If this had been a single dtype it WOULD have worked; in a multi-dtype case it will also SOMETIMES work.) |
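As a minimal sketch of the point about chained assignment (the frame and column names here are hypothetical, not taken from the report): a chained expression makes two separate indexing calls, so the assignment may land on a temporary copy, whereas a single `.loc` call performs one set operation on the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame (float and object columns).
df = pd.DataFrame({"a": [np.nan, 2.0], "b": ["x", "y"]})

# Chained form: two indexing calls. The first may return a copy,
# so the write is not guaranteed to reach df. Avoid this.
# df[df["a"].isna()]["a"] = 0.0

# Single indexing call: rows and column are selected in one step,
# so the assignment is applied to df itself.
mask = df["a"].isna()
df.loc[mask, "a"] = 0.0
```

Selecting rows and column together in one indexer keeps the behaviour independent of whether the frame holds one dtype or several.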
Just another comment that might be related. After creating the DataFrame with the duplicate columns shown at the top, I also get a ValueError when doing a simple indexing like:
Both raise
Whereas this works
|
ok...these getitem issues with iloc (namely), the setting is a bit more complicated |
Weird, I've always thought of iloc as a "numpy-like" rather than a "strictly-integer" indexer, and I'd expect it to work like np.ndarray get-/setitem methods. Performance- or implementation-complexity-wise, is there a reason to force users to route boolean indexers via loc? |
The reason this was deliberately not done is that a boolean indexer normally requires alignment, and alignment is not really possible in a logical sense here. For example, aligning a timeseries index against an integer index doesn't make sense. |
That applies if the indexer is a Series; what if it is an ndarray? |
I've checked on current master:
In [21]: inds = np.isnan(df.iloc[:, 0])
In [22]: inds
Out[22]:
0 True
1 False
Name: a, dtype: bool
In [23]: inds.values
Out[23]: array([ True, False], dtype=bool)
In [24]: df.iloc[inds.values, 0]
Out[24]:
0 NaN
Name: a, dtype: float64 |
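The session above can be reproduced with a sketch like the following. The frame is an assumption (the original `df` is not shown in this comment); the point is only that a plain boolean ndarray carries no index, so there is nothing to align and `iloc` can apply it positionally.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; column "a" has a NaN in the first row.
df = pd.DataFrame({"a": [np.nan, 1.0], "b": [3.0, 4.0]})

# np.isnan on a Series returns a boolean Series that keeps the index.
inds = np.isnan(df.iloc[:, 0])

# .values drops the index, leaving a positional boolean mask.
mask = inds.values

# Positional selection of the masked rows in the first column.
selected = df.iloc[mask, 0]
```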
#6799 fixed the last issue. Setitem with iloc on a frame with duplicate columns is still not working. |
IIRC I was pro iloc working with masks; I think if the dtype is bool this is not ambiguous (currently this is the only reason I use ix!). I'm not sure I understand the argument regarding alignment. |
@immerrr your refactor seemed to have fixed this, thanks! |
After a concat of two DataFrames with the same columns, I wanted to consolidate some data and replace NaNs in some columns with values from other columns. I ended up with a DataFrame that magically had additional columns.
This is the minimum example that I can give to reproduce the faulty behaviour using current master (70de129):
Now replacing NaNs in the 0 column with (corresponding) values in the 2 column ('A'), I expected to simply write a 3 into NaN (which it did), but it actually added a column '0' at the end of the DataFrame even though iloc is not supposed to enlarge the dataset. Clearly a bug.
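For illustration, the intended operation can be sketched as below. This is not the original reproduction (that code is not preserved here); the frames, values, and column names are hypothetical, and the sketch assumes a recent pandas release where positional setting with duplicate column names works. Plain ndarrays are used for both the mask and the right-hand side so that no label alignment is involved.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames with the same column names.
left = pd.DataFrame({"A": [np.nan, 1.0], "B": [10.0, 20.0]})
right = pd.DataFrame({"A": [3.0, 4.0], "B": [30.0, 40.0]})

# Concatenating side by side yields duplicate column names: A, B, A, B.
df = pd.concat([left, right], axis=1)
shape_before = df.shape

# Rows where positional column 0 is NaN, as a plain boolean ndarray.
mask = df.iloc[:, 0].isna().to_numpy()

# Fill those rows of column 0 from positional column 2.
# iloc is purely positional, so the frame should not be enlarged.
df.iloc[mask, 0] = df.iloc[mask, 2].to_numpy()
```

Checking `df.shape` against `shape_before` afterwards is a quick way to confirm that no extra column was created.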