Clean up DataFrame.setitem behavior for duplicate columns #39403

Merged
merged 9 commits into from
Jan 29, 2021

phofl
Member

@phofl phofl commented Jan 25, 2021

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them

cc @jbrockmendel This cleans up the edge-case behavior for duplicate columns.

The previous test was wrong.

@jreback jreback added the labels Indexing (Related to indexing on series/frames, not to indexes themselves) and Refactor (Internal refactoring of code) Jan 26, 2021
@@ -376,6 +376,25 @@ def unpack_1tuple(tup):
    return tup


def check_key_length(columns, key, value):
    """Checks if a key used as indexer has the same length as the columns it is
Member

newline after """

Member Author

Thx


Parameters
----------
columns: The columns of the DataFrame to index.
Member

columns : Index

Member Author

changed
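
As an illustration of the kind of check discussed in this thread, here is a minimal pure-Python sketch. It is not the code from this PR: the name `check_key_length_sketch`, the `n_value_columns` parameter, the error message, and the handling of labels that are not yet present are all assumptions made for the example, and plain sequences stand in for `Index` and `DataFrame`.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def check_key_length_sketch(
    columns: Sequence[Hashable], key: Sequence[Hashable], n_value_columns: int
) -> None:
    """
    Raise ValueError if `key` does not select exactly `n_value_columns` columns.

    Hypothetical stand-in for the helper under review; not the pandas code.
    """
    if len(set(columns)) == len(columns):
        # Unique labels: every key entry refers to exactly one column.
        selected = len(key)
    else:
        # Duplicate labels: a key entry refers to every column with that label.
        # Assumption: a label that is not present counts as one (new) column.
        selected = sum(max(columns.count(label), 1) for label in key)

    if selected != n_value_columns:
        raise ValueError("Columns must be same length as key")


# With columns ["a", "a", "b"], the key ["a"] refers to two columns.
check_key_length_sketch(["a", "a", "b"], ["a"], n_value_columns=2)
```

Under these assumptions, passing `n_value_columns=1` for the same key raises `ValueError`, because the single label `"a"` covers two columns.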

@jbrockmendel
Member

If you're interested, there might be something worth salvaging from _setitem_array here. It avoids going through iloc, so we don't have to worry about inplace vs not-inplace there (xref #38896).

If/when I get #39302 in shape to be a real PR, something like that branch will have to be a part of it, but I'm optimistic you'll get around to it before I do.

@phofl
Member Author

phofl commented Jan 27, 2021

That looks good. I will try to incorporate this after this is merged.

@@ -376,6 +376,27 @@ def unpack_1tuple(tup):
    return tup


def check_key_length(columns, key, value):
Contributor

can you add type annotations for as much as you can here

Member Author

We would have to type with Index and DataFrame, which causes circular imports. Is there a way around this?

Member

do the import inside an if TYPE_CHECKING: block

Member Author

Sorry, I should have said that I already tried this. With the import under TYPE_CHECKING and the signature typed (not the return value), this raises
NameError: name 'DataFrame' is not defined

Member Author

Did not know about the future import. Thx for the help
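
For context, the pattern this thread converges on can be sketched as follows. This is a generic illustration, not the PR's code: `Fraction` from the standard library stands in for `Index` and `DataFrame`, and `describe` is a made-up function. The future import stores annotations as strings, so a name imported only under `TYPE_CHECKING` is never looked up at runtime.

```python
from __future__ import annotations  # annotations are kept as strings, not evaluated

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this import never runs,
    # so it cannot take part in a circular import.
    from fractions import Fraction  # stand-in for Index / DataFrame


def describe(value: Fraction) -> str:
    """Return a short description; `Fraction` is not resolved at runtime."""
    return f"{type(value).__name__}: {value}"


# The annotation is stored as the plain string "Fraction".
print(describe.__annotations__["value"])
print(describe(3))
```

On Python versions that evaluate annotations eagerly at definition time, leaving out the future import makes the `def` line fail with a `NameError` like the one quoted above, because `Fraction` was never imported at runtime.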

@jreback jreback added this to the 1.3 milestone Jan 28, 2021

Parameters
----------
columns: Index The columns of the DataFrame to index.
Member

space between "columns" and colon, same on the next two lines

Member Author

Thx
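
The docstring remarks in this review (a newline after the opening quotes, and a space on both sides of the colon between a parameter name and its type) follow the numpydoc layout. A small sketch of that layout, using a made-up function `count_labels` rather than the helper from this PR:

```python
def count_labels(labels, target):
    """
    Count how often ``target`` occurs in ``labels``.

    Parameters
    ----------
    labels : list
        Labels to search; duplicates are allowed.
    target : object
        The label to count.

    Returns
    -------
    int
        Number of occurrences of ``target``.
    """
    return labels.count(target)


print(count_labels(["a", "a", "b"], "a"))
```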

@jreback jreback merged commit 7b795b2 into pandas-dev:master Jan 29, 2021
@jreback
Contributor

jreback commented Jan 29, 2021

thanks @phofl

was this technically a bug fix? or just an incorrect test?

@phofl
Member Author

phofl commented Jan 29, 2021

Bug fix, yes, but it is covered by the whatsnew entry from #39341; this is a follow-up to that one.
