BUG: loc.setitem raising ValueError when df has duplicate columns #39278

phofl · 2021-01-19T20:36:45Z

closes BUG: Setting values to slice fails with duplicated column name #38521
tests added / passed
Ensure all linting tests pass
whatsnew entry
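
For context, the scenario named in the linked issue title can be sketched roughly as below. The frame contents and assigned values are illustrative assumptions, not the reproducer from #38521. On a pandas version that includes this fix, the assignment is expected to update only the uniquely named column.

```python
import pandas as pd

# Illustrative frame (an assumption): "b" is duplicated, "a" is unique.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "b"])

# Set every row of the uniquely named column through .loc.
df.loc[:, "a"] = [10, 20]

print(df["a"].tolist())  # values of the unique column after assignment
print(df["b"].shape)     # selecting a duplicated label returns a DataFrame
```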

phofl · 2021-01-19T22:03:19Z

Is there a more elegant way to check if an element occurs only once in an index?

jbrockmendel · 2021-01-19T23:58:03Z

pandas/tests/frame/indexing/test_setitem.py

@@ -373,6 +373,15 @@ def test_setitem_string_column_numpy_dtype_raising(self):
        expected = DataFrame([[1, 2, 5], [3, 4, 6]], columns=[0, 1, "0 - Name"])
        tm.assert_frame_equal(df, expected)

+    def test_setitem_empty_df_duplicate_columns(self):


Do we need tests for Series?

I don't think so. The indexer has to be a tuple to land there, which is only valid for DataFrames, I think?

It's weird, but you could do series.loc[("foo",)]

Yep, definitely weird :)

But this is already converted before reaching that point.

    df = Series(index=["a", "b", "b"], dtype="float64")
    df.loc[("a", )] = 1

This results in indexer=0. I tested the same with a MultiIndex Series, and the indexer is also 0 there.

jbrockmendel · 2021-01-19T23:58:58Z

pandas/core/indexing.py

@@ -1850,7 +1850,8 @@ def _setitem_single_block(self, indexer, value, name: str):
                    for i, idx in enumerate(indexer)
                    if i != info_axis
                )
-                and item_labels.is_unique
+                and len(item_labels.get_indexer_for([item_labels[indexer[info_axis]]]))
+                == 1


Let's separate this out into a nested condition and define item_labels[indexer[info_axis]] once, then reuse it on L1856.
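
A standalone sketch of what such a nested condition could look like. The function `single_unique_column` and its signature are hypothetical, written only to illustrate the suggested shape; it is not the code that was merged into `pandas/core/indexing.py`.

```python
import pandas as pd
from pandas.api.types import is_integer


def single_unique_column(item_labels: pd.Index, indexer: tuple, info_axis: int = 1):
    """Hypothetical helper: return the targeted column label, or None."""
    # Outer condition: an integer position on the info axis and a full
    # slice on every other axis.
    if (
        len(indexer) > info_axis
        and is_integer(indexer[info_axis])
        and all(
            isinstance(idx, slice) and idx == slice(None)
            for i, idx in enumerate(indexer)
            if i != info_axis
        )
    ):
        col = item_labels[indexer[info_axis]]  # looked up once, then reused
        # Nested condition: the label must occur exactly once in the index.
        if len(item_labels.get_indexer_for([col])) == 1:
            return col
    return None


cols = pd.Index(["a", "b", "b"])
print(single_unique_column(cols, (slice(None), 0)))
print(single_unique_column(cols, (slice(None), 1)))
```

Compared with `item_labels.is_unique`, the nested check only requires the targeted label to be unique, so other duplicated columns no longer rule out the fast path.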

phofl · 2021-01-20T08:49:28Z

The CI failure is unrelated.

jreback · 2021-01-21T15:53:38Z

thanks @phofl

BUG: loc.setitem raising ValueError when df has duplicate columns

64101f8

phofl added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Jan 19, 2021

phofl added 2 commits January 19, 2021 22:10

Use is_scalar instead of unique

6c9f245

Change condition

9bc5fc9

phofl mentioned this pull request Jan 19, 2021

BUG: DataFrame.__setitem__ raising ValueError when setting multiple values to dup columns #39280

Merged

4 tasks

jbrockmendel reviewed Jan 19, 2021


Make nested condition

74c1dd0

jreback added this to the 1.3 milestone Jan 21, 2021

jreback added the Bug label Jan 21, 2021

jreback merged commit 0270b23 into pandas-dev:master Jan 21, 2021

phofl deleted the 38521 branch January 21, 2021 17:28
