ENH: anti joins #49328
Is it common to still pull the columns from the right table in an anti join? These columns will just always be NA right?
Thanks @WillAyd for the review.
That is true, but I am not sure. I had taken the example from a comment in the bug report.
Linking back to the other PR you had, there is concern about side effects of running this code. Can you do this without assignment back to self?
In the previous PR, I had changed `_MergeOperation` so that anti joins are converted into left/right/outer and the relevant changes are made in the initial dataframes. But Jeff had suggested in #43056 (review) that we should keep the validation and the merge operation separate (as it currently is for the other joins).
The final result is the same as before. I think I tweaked a couple of test results, but I guess those were bugs that were present earlier.
I think I missed that, so the side effect should not be in the validation. This is in the operation.
Curious if you've tried something closer to a left join with a filter on subsequent NA values compared to this. At first glance this seems like it could be a pretty expensive way of calculating this
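For illustration, the left-join-plus-filter idea could be sketched roughly as below. This is only a sketch under assumptions, not the code in this PR: `anti_left_join` is a hypothetical helper name, and it filters on the `_merge` indicator column rather than on NA values so that genuine NAs in the right table are not mistaken for unmatched rows. Only the key column of `right` is merged, so no all-NA right-hand columns end up in the result.

```python
import pandas as pd


def anti_left_join(left: pd.DataFrame, right: pd.DataFrame, on: str) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` whose `on` key has no match in `right`."""
    # Deduplicate the right keys so the left join cannot multiply left rows.
    right_keys = right[[on]].drop_duplicates()
    merged = left.merge(right_keys, on=on, how="left", indicator=True)
    # "left_only" marks rows without a match in `right`.
    unmatched = merged[merged["_merge"] == "left_only"]
    return unmatched.drop(columns="_merge").reset_index(drop=True)


left = pd.DataFrame({"key": [1, 2, 3, 3], "a": ["w", "x", "y", "z"]})
right = pd.DataFrame({"key": [2, 2, 4], "b": [10.0, 11.0, 20.0]})
result = anti_left_join(left, right, on="key")
```

Whether this is cheaper than the approach in the PR would still need to be measured with asv.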
honestly I haven't tried that, I just went with #42916 (comment) from @attack68
For reference, below is the asv comparison
IIUC this is the same operation as np.setdiff1d. Generally, can you look at using the numpy functions here?
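As a rough sketch of what the `np.setdiff1d` route might look like on a single join key (an illustration with made-up frames, not the code under review): `np.setdiff1d` returns the sorted unique values of its first argument that are absent from the second, so the result has to be mapped back onto the rows of the left frame. The sketch assumes plain integer keys; behavior with extension or Categorical dtypes is not explored here.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": [3, 1, 2, 3], "a": ["w", "x", "y", "z"]})
right = pd.DataFrame({"key": [2, 4]})

# Sorted unique keys that appear in `left` but not in `right`.
left_only_keys = np.setdiff1d(left["key"].to_numpy(), right["key"].to_numpy())

# setdiff1d deduplicates and sorts, so select the matching rows from `left`
# to keep duplicates and the original row order.
result = left[left["key"].isin(left_only_keys)]
```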
Are you suggesting this `numpy` thing specifically here in `_get_join_info`? I will have to try how that works out with something like EA / Categorical dtypes.
This looks a bit more complicated than what you had in the other PR - was that not working before?
Can you address this comment?
Hey, sorry. I thought the comment above covered this.
Yes, the previous PR was working. But the validation and the actual operation were not separate.