Three or more unnamed fields block loc assignment #13017
Comments
you need to specify the index column when reading (e.g. index_col=0),
or don't write the index in the first place, e.g. to_csv(index=False)
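The two workarounds suggested here can be sketched as a minimal round trip (using a made-up frame, not the poster's data):

```python
import io
import pandas as pd

# Hypothetical stand-in for the poster's frame.
df = pd.DataFrame({"A": [1, 2], "B": [400, 600]})

# Option 1: tell read_csv that the first column is the index.
opt1 = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(opt1.columns.tolist())  # ['A', 'B']

# Option 2: don't write the index in the first place.
opt2 = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(opt2.columns.tolist())  # ['A', 'B']
```

Either way, no "Unnamed: 0" column is created on the read back.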
I'm aware you can get around it, but it's still a bug.
@JakeCowton how so? pls read the doc-string. It is very clear. You are not reading with the correct option.
I'm not demanding a fix for it. Like I said in my post, you can work around it, and you've provided 2 ways to avoid getting into the situation in the first place; but it is a bug.
@JakeCowton I still don't understand what you are saying, pls provide a short self-reproducing, copy-pastable example. What you did above is pure usage.
I suppose you are saying this is a bug.
This looks legitimate and, to be honest, is user error if this is the case.
The actual issue you raised is a bit buried in the long explanation, but I think you wanted to highlight the following:
So the last assignment (the chained one) fails silently. In any case, not using chained assignment works.
Although using index_col=0 when reading would avoid the problem in the first place.
The reason you get duplicate column names here is also due to reading the written index column back as data instead of as the index.
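A minimal sketch of the distinction being drawn here, with a hypothetical frame (float dtype so that None coerces cleanly to NaN):

```python
import pandas as pd

# Hypothetical frame standing in for the poster's data.
df = pd.DataFrame({"B": [400.0, 600.0, 700.0]})

# Chained indexing like `df[df.B > 500].B = None` assigns into a
# temporary copy, so the original frame may be left untouched;
# pandas flags exactly this pattern with SettingWithCopyWarning.

# The supported spelling: one .loc call on the original frame.
df.loc[df.B > 500, "B"] = None
print(df.B.tolist())  # [400.0, nan, nan]
```

The single .loc call both selects and assigns on the original object, so the write cannot be lost.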
Hi, the problem is that when you set index_col=0 in the last line, the column name is still changed to Unnamed: 0.1:

In [35]: pd.read_csv(StringIO(pd.read_csv(StringIO(df.to_csv())).to_csv()), index_col=0)

I expect it should be:
As mentioned in Joris' comment, it appears the "bug" is in a chained indexing operation, which is highly discouraged and which we're not actively looking to support. Agreed that this is a won't-fix. Closing, but happy to reopen if I misunderstood.
Writing a dataframe to csv using

df.to_csv("/path/to/file.csv")

causes the creation of an "unnamed" field containing the indexes of the rows. Continually writing/reading to/from this file will result in many "unnamed" fields. Once there are three unnamed fields, you can no longer use loc to replace values; what's more, it fails silently.

So far so good: I was able to replace all values in df.B with NaN. I then write this out and read it back in. As you can see, this has created an unnamed field, but let's continue.
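A minimal sketch of this first step with a hypothetical frame: the index is written as a nameless first column, which read_csv then labels "Unnamed: 0":

```python
import io
import pandas as pd

# Hypothetical frame standing in for the report's data.
df = pd.DataFrame({"A": [1, 2], "B": [400.0, 600.0]})

# The NaN replacement described above.
df.loc[df.B > 500, "B"] = None

# Write out (index included by default) and read back in.
back = pd.read_csv(io.StringIO(df.to_csv()))
print(back.columns.tolist())  # ['Unnamed: 0', 'A', 'B']
```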
This is just to get 501 in place of the NaN I created earlier, which I forgot to do before writing out. Everything working fine.
Writing and reading again creates a 2nd unnamed field. Which is no problem, everything still works so far... however:

We now have 3 unnamed fields. The method of replacing all values over 500 with NaN no longer works, but it also throws no errors or warnings.

You CAN get around this using

df.loc[df.B > 500, 'B'] = None

but obviously you shouldn't have to.