BUG: DataFrame.__setitem__ raising ValueError when setting multiple values to dup columns #39280

Merged
merged 2 commits into from
Jan 21, 2021

Conversation

phofl
Member

@phofl phofl commented Jan 19, 2021

Setting with a scalar key that is duplicated in the columns should behave the same as setting with a list-like key.

@phofl phofl added the Indexing Related to indexing on series/frames, not to indexes themselves label Jan 19, 2021
@phofl
Member Author

phofl commented Jan 19, 2021

Hacky, will try to improve

@phofl
Member Author

phofl commented Jan 19, 2021

Hm this is a hacky edge case:

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

df["a"] = df.index

This is tested in the unit test test_column_dups_operations.

If this should work, we can't really implement

df["a"] = (1, 2)

because it is not clear what to do here: interpret (1, 2) as a column, which should result in the following (this is tested by the case above)

   a  b  a
0  1  2  1
1  2  5  2

or interpret it as a row (as described in the OP), which would result in

   a  b  a
0  1  2  2
1  1  5  2

The loc case interprets this as

df.loc[:,"a"] = df.index
   a  b  a
0  0  2  1
1  0  5  1

which raises an error if len(index) != the number of "a" columns
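The two readings of df["a"] = (1, 2) can be reproduced without pandas. The following is only a minimal sketch on plain lists of rows; the helper names set_as_column and set_as_row are hypothetical and are not pandas functions.

```python
# Plain-Python model of the frame above: a list of rows plus column labels.
rows = [[1, 2, 3], [4, 5, 6]]
columns = ["a", "b", "a"]
value = (1, 2)

# Positions of every column labelled "a" (here: 0 and 2).
positions = [i for i, name in enumerate(columns) if name == "a"]


def set_as_column(rows, positions, value):
    """Hypothetical helper: value[r] is written to every matching column of row r."""
    out = [list(row) for row in rows]
    for r, row in enumerate(out):
        for pos in positions:
            row[pos] = value[r]
    return out


def set_as_row(rows, positions, value):
    """Hypothetical helper: value[k] is written to the k-th matching column of every row."""
    out = [list(row) for row in rows]
    for row in out:
        for pos, v in zip(positions, value):
            row[pos] = v
    return out


as_column = set_as_column(rows, positions, value)  # [[1, 2, 1], [2, 5, 2]]
as_row = set_as_row(rows, positions, value)        # [[1, 2, 2], [1, 5, 2]]
```

Both results are valid readings of the same assignment, which is why the two cases cannot be told apart when the number of rows equals the number of duplicated columns.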

@@ -3195,6 +3195,11 @@ def __setitem__(self, key, value):
            self._setitem_array(key, value)
        elif isinstance(value, DataFrame):
            self._set_item_frame_value(key, value)
        elif is_list_like(value) and 1 < len(
            self.columns.get_indexer_for([key])
        ) == len(value):
Member

Not necessarily for this PR, but it might be worth implementing an Index.count method (mirroring list.count)?

Member Author

Yeah, this would probably be helpful; something like that would have helped in #39278.
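For illustration only, a counting helper in the spirit of list.count might look like the sketch below. The name index_count is hypothetical; no Index.count method is assumed to exist, and the sketch only relies on element-wise comparison of an Index with a scalar.

```python
import pandas as pd


def index_count(index: pd.Index, key) -> int:
    """Hypothetical helper: number of labels in `index` equal to `key`."""
    # Comparing an Index with a scalar yields a boolean mask; summing counts the hits.
    return int((index == key).sum())


cols = pd.Index(["a", "b", "a"])
n_a = index_count(cols, "a")        # two columns are labelled "a"
n_missing = index_count(cols, "c")  # zero for a label that is absent
```

Unlike len(columns.get_indexer_for([key])), which has length 1 for a missing label because the indexer contains -1, this sketch returns 0 when the label is absent.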

@jreback jreback added this to the 1.3 milestone Jan 21, 2021
@jreback jreback added the Bug label Jan 21, 2021
@jreback jreback merged commit c3012b1 into pandas-dev:master Jan 21, 2021
@jreback
Contributor

jreback commented Jan 21, 2021

thanks @phofl

@phofl phofl deleted the 15695 branch January 21, 2021 16:04
@toobaz
Member

toobaz commented Jan 21, 2021

Doesn't this slow down every setting operation with a single key? I think it's fine as a temporary measure, but really we shouldn't be doing multiple lookups for a single operation.

@toobaz
Member

toobaz commented Jan 21, 2021

Doesn't this slow down every setting operation with a single key?

Well no, I guess only with a list of keys. Still, we could check if columns is unique (which is O(1), right?) before the test involving get_indexer_for?

@phofl
Member Author

phofl commented Jan 21, 2021

That's probably a good idea, yes
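A rough sketch of that ordering, assuming the condition from the diff is kept as is and only a uniqueness check is placed in front of it. The function name targets_duplicated_column is hypothetical and is not part of pandas; it only uses Index.is_unique, Index.get_indexer_for and pandas.api.types.is_list_like.

```python
import pandas as pd
from pandas.api.types import is_list_like


def targets_duplicated_column(columns: pd.Index, key, value) -> bool:
    """Hypothetical helper: True if `key` matches several columns and
    `value` supplies exactly one element per matching column."""
    # Cheap checks first: with unique columns or a scalar value there is
    # nothing to disambiguate, so the lookup below is skipped entirely.
    if columns.is_unique or not is_list_like(value):
        return False
    n_matches = len(columns.get_indexer_for([key]))
    return 1 < n_matches == len(value)


dup_cols = pd.Index(["a", "b", "a"])
unique_cols = pd.Index(["a", "b", "c"])

hit = targets_duplicated_column(dup_cols, "a", (1, 2))            # True
wrong_len = targets_duplicated_column(dup_cols, "a", (1, 2, 3))   # False
scalar = targets_duplicated_column(dup_cols, "a", 5)              # False
no_dups = targets_duplicated_column(unique_cols, "a", (1, 2))     # False
```

With this ordering, frames with unique columns never reach get_indexer_for on the single-key path.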

Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Successfully merging this pull request may close these issues.

DataFrame.__setitem__ fails with multiple values and duplicate column names
4 participants