BUG: Assigning dictionary to series using .loc produces random results. #47216
Comments
Thanks for the report @rhug123. Is there any documentation suggesting that assignment using dictionaries like this is supported? In general, I believe setting assumes a scalar or list-like. I took a look but couldn't find any, and it seems there isn't much documentation on setting with loc at all. I'm going to mark this as an enhancement for now; if there is some indication it should be supported, it can be changed to a bug.
Hello @rhshadrach, thank you for taking the time to look at this. The documentation states under "Indexing and selecting data - Attribute access" (https://pandas.pydata.org/docs/user_guide/indexing.html#attribute-access): "You can also assign a dict to a row of a DataFrame:"
While I understand that the potential bug I have submitted involves assigning a dictionary to a column, whereas the documentation refers to a row, I was more concerned with the different behaviors observed when running the same code.
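For reference, the documented row case can be sketched as follows. The frame `x` and its values are illustrative, modeled on the user guide example rather than taken from this report.

```python
import pandas as pd

# Illustrative frame; not the data from the original report.
x = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

# Assign a dict to a single row; each key names a column.
x.iloc[1] = {"x": 9, "y": 99}

print(x)
```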
Thanks for the link to the documentation. I don't think that being able to update a row implies you can update a column in a similar fashion. While it is tempting to think "columns are just rows after a transpose", there is an inherent asymmetry in how pandas stores and works with rows vs. columns (e.g. dtypes).
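A small sketch of that asymmetry, using a made-up two-column frame: a column carries a single dtype, while a row that spans columns of different dtypes has to fall back to `object`.

```python
import pandas as pd

# Illustrative frame with one integer column and one string column.
df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

col = df["A"]     # a column keeps its own dtype
row = df.iloc[0]  # a row mixes an int and a string

print(col.dtype)
print(row.dtype)
```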
This makes sense, and I'm guessing it's due to the block not being consolidated, but if you're using an API in ways that are unsupported then all bets are off. I think a discussion needs to happen on whether the op here should be supported.
The point with assigning dicts to rows is that the keys are matched with the column names. I am not sure if this makes much sense for columns, because you could simply use a Series.
Had a chance to take a look. This is a bug: aligning the dict was simply forgotten for the column case. Additionally, the multi-block case already works:
returns
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The values assigned to column A are the keys of the dictionary instead of the values.
If, however, we assign the dictionary to a new column instead of an existing one, we get the same result the first time we run it, but running the same code again gives the expected result. After that, we are able to select any column and it will return the expected output.
First time running
df.loc[:,'E'] = {0:10,1:20,2:30,3:40}
Output:
Second time running
df.loc[:,'E'] = {0:10,1:20,2:30,3:40}
Then if we run the same code as we did at first, we get a different result:
df.loc[:,'A'] = {0:10,1:20,2:30,3:40}
Output:
Expected Behavior
d = {0:10,1:20,2:30,3:40}
df.loc[:,'A'] = d
Output:
Installed Versions
1.4.2