BUG: DataFrame.__setitem__ raising ValueError when setting multiple values to dup columns #39280

phofl · 2021-01-19T21:21:25Z

closes DataFrame.__setitem__ fails with multiple values and duplicate column names #15695
tests added / passed
Ensure all linting tests pass
whatsnew entry

Setting with a scalar key that is duplicated in the columns should behave the same as setting with a list-like key.

phofl · 2021-01-19T22:46:54Z

Hacky, will try to improve

phofl · 2021-01-19T22:58:21Z

Hm, this is a hacky edge case:

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

df["a"] = df.index

This is covered by the unit test test_column_dups_operations.

If this should keep working, we can't really implement

df["a"] = (1, 2)

because it is not clear what to do here. We could interpret (1, 2) as a column, which would result in the following (this is what the case above tests)

   a  b  a
0  1  2  1
1  2  5  2

or interpret it as a row (as described in the OP), which would result in

   a  b  a
0  1  2  2
1  1  5  2

The loc case interprets the value as a row:

df.loc[:, "a"] = df.index

   a  b  a
0  0  2  1
1  0  5  1

which raises an error if len(index) is not equal to the number of "a" labels in columns
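The two readings of df["a"] = (1, 2) described above can be modelled without pandas. This is a minimal sketch in plain Python, not the pandas implementation: positions, set_as_column and set_as_row are hypothetical helpers that work on a list of rows plus a list of column labels, and they only illustrate why the assignment is ambiguous when the number of rows equals the number of duplicated columns.

```python
def positions(columns, key):
    """Return the positions of every column labelled `key` (hypothetical helper)."""
    return [i for i, label in enumerate(columns) if label == key]


def set_as_column(rows, columns, key, values):
    """Read `values` as one column: one value per row, copied into each matching column."""
    if len(values) != len(rows):
        raise ValueError("need exactly one value per row")
    cols = positions(columns, key)
    return [
        [values[r] if c in cols else cell for c, cell in enumerate(row)]
        for r, row in enumerate(rows)
    ]


def set_as_row(rows, columns, key, values):
    """Read `values` as one row: one value per matching column, repeated down the rows."""
    cols = positions(columns, key)
    if len(values) != len(cols):
        raise ValueError("need exactly one value per matching column")
    by_position = dict(zip(cols, values))
    return [
        [by_position.get(c, cell) for c, cell in enumerate(row)]
        for row in rows
    ]


columns = ["a", "b", "a"]
rows = [[1, 2, 3], [4, 5, 6]]

as_column = set_as_column(rows, columns, "a", (1, 2))
as_row = set_as_row(rows, columns, "a", (1, 2))
print(as_column)  # [[1, 2, 1], [2, 5, 2]]
print(as_row)     # [[1, 2, 2], [1, 5, 2]]
```

Both calls accept the same input here only because the frame has two rows and two "a" columns. With any other shape, at most one of the two readings is valid.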

jbrockmendel · 2021-01-19T23:47:46Z

pandas/core/frame.py

@@ -3195,6 +3195,11 @@ def __setitem__(self, key, value):
            self._setitem_array(key, value)
        elif isinstance(value, DataFrame):
            self._set_item_frame_value(key, value)
+        elif is_list_like(value) and 1 < len(
+            self.columns.get_indexer_for([key])
+        ) == len(value):


Not necessarily for this PR, but might it be worth implementing an Index.count method (mirroring list.count)?

Yeah, this would probably be helpful. Something like that would have helped in #39278.
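For illustration, a count that mirrors list.count could look roughly like this. It is a sketch over a plain sequence of labels, not a pandas API: count_label is a hypothetical name, and nothing here assumes that an Index.count method exists.

```python
def count_label(labels, key):
    """Count how often `key` occurs in `labels`, mirroring list.count (hypothetical helper)."""
    return sum(1 for label in labels if label == key)


columns = ["a", "b", "a"]
n_a = count_label(columns, "a")
n_b = count_label(columns, "b")
n_missing = count_label(columns, "z")

# The condition in the diff asks, in effect, whether the key is duplicated
# and whether the value has exactly one entry per duplicate.
value = (1, 2)
is_dup_match = 1 < n_a == len(value)
print(n_a, n_b, n_missing, is_dup_match)
```

With such a helper, the check reads as a count comparison rather than as the length of an indexer array.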

jreback · 2021-01-21T15:54:46Z

thanks @phofl

toobaz · 2021-01-21T16:05:00Z

Doesn't this slow down every setting operation with a single key? I think it's fine as a temporary measure, but really we shouldn't be doing multiple lookups for a single operation.

toobaz · 2021-01-21T16:09:48Z

Doesn't this slow down every setting operation with a single key?

Well no, I guess only with a list of keys. Still, we could check if columns is unique (which is O(1), right?) before the test involving get_indexer_for?

phofl · 2021-01-21T16:12:55Z

That's probably a good idea, yes
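The suggestion above, checking uniqueness before doing the position lookup, can be sketched as follows. This is a toy model and not the pandas code: Labels is a hypothetical stand-in for a column index that caches its uniqueness flag after the first computation, and is_dup_row_assignment is a hypothetical name for the condition in the diff with the extra guard added.

```python
class Labels:
    """Toy stand-in for a column index that caches its uniqueness flag."""

    def __init__(self, labels):
        self._labels = list(labels)
        self._is_unique = None  # computed on first access, then reused
        self.lookups = 0        # counts how often the position lookup runs

    @property
    def is_unique(self):
        if self._is_unique is None:
            self._is_unique = len(set(self._labels)) == len(self._labels)
        return self._is_unique

    def positions(self, key):
        self.lookups += 1
        return [i for i, label in enumerate(self._labels) if label == key]


def is_dup_row_assignment(labels, key, value):
    """True when `key` is duplicated and `value` has one entry per duplicate."""
    is_list_like = isinstance(value, (list, tuple))  # simplification for this sketch
    return (
        is_list_like
        and not labels.is_unique  # cheap guard; short-circuits for unique columns
        and 1 < len(labels.positions(key)) == len(value)
    )


unique = Labels(["a", "b", "c"])
dup = Labels(["a", "b", "a"])

unique_result = is_dup_row_assignment(unique, "a", (1, 2))
dup_result = is_dup_row_assignment(dup, "a", (1, 2))
print(unique_result, unique.lookups)  # False 0
print(dup_result, dup.lookups)        # True 1
```

Because `and` short-circuits, frames with unique columns never reach the position lookup, so the common case pays only for the cached flag.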

BUG: DataFrame.__setitem__ raising ValueError when setting multiple values to dup columns

4dfd328

phofl added the Indexing label Jan 19, 2021

Fix bug in condition

09a59e0

jbrockmendel reviewed Jan 19, 2021

jreback added this to the 1.3 milestone Jan 21, 2021

jreback added the Bug label Jan 21, 2021

jreback approved these changes Jan 21, 2021

jreback merged commit c3012b1 into pandas-dev:master Jan 21, 2021

phofl deleted the 15695 branch January 21, 2021 16:04

jbrockmendel mentioned this pull request Jan 23, 2021

WIP/REF: BlockManager.setitem_blockwise #39302

Closed

debnathshoham mentioned this pull request Aug 28, 2021

PERF: indexing #43274

Merged
