Clean up DataFrame.setitem behavior for duplicate columns #39403

phofl · 2021-01-25T20:30:25Z

tests added / passed
Ensure all linting tests pass, see here for how to run them

cc @jbrockmendel This cleans up the edge-case behavior for duplicate columns.

Previous test was wrong.

pandas/core/frame.py

pandas/tests/frame/indexing/test_setitem.py

jbrockmendel · 2021-01-27T00:57:10Z

pandas/core/indexers.py

@@ -376,6 +376,25 @@ def unpack_1tuple(tup):
    return tup


+def check_key_length(columns, key, value):
+    """Checks if a key used as indexer has the same length as the columns it is


newline after """

jbrockmendel · 2021-01-27T00:57:39Z

pandas/core/indexers.py

+
+    Parameters
+    ----------
+    columns: The columns of the DataFrame to index.


columns : Index

pandas/core/indexers.py

jbrockmendel · 2021-01-27T04:20:15Z

If you're interested, there might be something worth salvaging from _setitem_array here. It avoids going through iloc, so we don't have to worry about inplace vs. not-inplace there (xref #38896).

If/when I get #39302 in shape to be a real PR, something like that branch will have to be a part of it, but I'm optimistic you'll get around to it before I do.

phofl · 2021-01-27T20:23:33Z

That looks good. I will try to incorporate this after this is merged.

pandas/core/indexers.py

jreback · 2021-01-28T01:04:33Z

pandas/core/indexers.py

@@ -376,6 +376,27 @@ def unpack_1tuple(tup):
    return tup


+def check_key_length(columns, key, value):


Can you type as much as you can here?

We would have to type with Index and DataFrame, which causes circular imports. Is there a way around this?

Do the import inside an if TYPE_CHECKING: block.

Sorry, I should have said that I have already tried this. When using TYPE_CHECKING and typing the signature (not the return value), this raises
NameError: name 'DataFrame' is not defined

I did not know about the future import. Thanks for the help.
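For readers following the typing exchange above, here is a minimal sketch of the pattern being discussed: an import guarded by TYPE_CHECKING combined with `from __future__ import annotations`. The function name `annotated_check_sketch` and its body are hypothetical and only illustrate the annotation mechanics; they are not the code merged in this PR.

```python
# Must be the first statement in the module: annotations are then stored
# as strings and are not evaluated when the function is defined.
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers, never executed at runtime,
    # so it cannot trigger a circular import.
    from pandas import DataFrame, Index


def annotated_check_sketch(columns: Index, key: list, value: DataFrame) -> None:
    """Hypothetical stand-in; the body is illustrative only."""
    if len(key) != len(value.columns):
        raise ValueError("Columns must be same length as key")
```

Without the future import, Python evaluates `Index` and `DataFrame` while defining the function, and because the guarded import never ran, that lookup fails with the NameError quoted above.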

jbrockmendel · 2021-01-28T16:36:08Z

pandas/core/indexers.py

+
+    Parameters
+    ----------
+    columns: Index The columns of the DataFrame to index.


space between "columns" and colon, same on the next two lines
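The docstring comments in this review (a newline after the opening quotes, `name : type` with a space before the colon) follow the numpydoc layout. A minimal sketch of that layout follows; the function name `documented_check_sketch`, the summary line, and the `key` and `value` descriptions are assumptions made for illustration, not the docstring that was merged.

```python
def documented_check_sketch(columns, key, value):
    """
    Check that the key selects as many columns as value provides.

    Parameters
    ----------
    columns : Index
        The columns of the DataFrame to index.
    key : list-like
        Keys to index with (description is an assumption).
    value : DataFrame
        The value to set for the keys (description is an assumption).
    """
```

The type sits on the same line as the parameter name, and the description goes on an indented line below it.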

jreback · 2021-01-29T02:13:14Z

thanks @phofl

was this technically a bug fix? or just an incorrect test?

phofl · 2021-01-29T21:03:00Z

Bug fix, yes, but it is covered by the whatsnew entry from #39341; this is a follow-up to that one.

phofl added 2 commits January 25, 2021 21:29

Clean up DataFrame.setitem behavior for duplicate columns

9ce8713

Add gh reference

d9c8026

jbrockmendel reviewed Jan 25, 2021

pandas/core/frame.py (outdated)

jbrockmendel reviewed Jan 25, 2021

pandas/tests/frame/indexing/test_setitem.py

jreback added the Indexing and Refactor labels Jan 26, 2021

phofl added 2 commits January 26, 2021 21:16

Move function

4d57974

Remove type

d0bdc78

jbrockmendel reviewed Jan 27, 2021

pandas/core/indexers.py

phofl added 2 commits January 27, 2021 19:43

Add note

6f5f419

Adjust test

e0c047e

jreback requested changes Jan 28, 2021

jreback added this to the 1.3 milestone Jan 28, 2021

jbrockmendel reviewed Jan 28, 2021

phofl added 3 commits January 28, 2021 19:57

Add spaces

9a82e6c

Add raises section

2dad847

Add types

ee4a849

jreback approved these changes Jan 29, 2021

jreback merged commit 7b795b2 into pandas-dev:master Jan 29, 2021

phofl deleted the cln_setitem_dup_columns branch January 29, 2021 21:03

jreback mentioned this pull request Feb 2, 2021

Warn on boolean frame indexer #39373

Closed

jorisvandenbossche mentioned this pull request May 9, 2022

API: DataFrame setitem: setting columns with a DataFrame RHS doesn't align column names? #46974

Open
