-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/CLN: Cleanup sanitize column and abstract broadcasting #5341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1417,6 +1417,18 @@ def test_getitem_boolean_missing(self): | |||
def test_setitem_boolean_missing(self): | |||
pass | |||
|
|||
def test_setitem_duplicate_columns_with_index(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this test with the other duplicate method tests (even though its a long method), test_columns_dups_operations
, or at least right after it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely - was searching for 'duplicate' so I missed it :)
@jreback A few points here:
|
regarding your last comment.... yeh..perf may not be a problem....I am always wary of heavily called functions that have to deal with unique/non-unique...almost always I separate these cases (because of perf) if its a multi-index it doesn't matter(so |
@jreback just thinking about it again, there's no reason to have a separate branch for Index, just needs to do an instance check for Index and ndarray first, then can use is_sequence. |
Is there a reason to prefer |
@jtratner that's fine...but then need to fix you can try |
@jreback wha? Timestamp is true for |
ah, I see, is_scalar is False for timestamp, got it. |
yep strings/Timestamp looks like lists and isscalr fails |
@jtratner if perf checks out ok...this looks good! |
I ran the perf check 4 times on base commit and tip of this PR, everything came out about even on average, and the few that were particularly different were because 1 run of that particular benchmark took much more time than the other three, so I think it should be fine. |
And only try to broadcast if columns aren't unique or are MI. (also in the process fix small bug from pandas-dev#5321)
ok good to go then |
BUG/CLN: Cleanup sanitize column and abstract broadcasting
Also fixes small bug from #5321