Remove cross-dataframe comparisons #242
I did a quick search only for I guess the main questions I have are:
|
Thanks for taking a look. The pandas example is just in the release notes. Do we have any examples of downstream libraries using this in their core code?
They do a join beforehand. The reason not to do a join under the hood for them is that, this way, they only do the join once and can then do all the comparisons and maths they want.
If the columns come from the same dataframe, then no, it'll all work fine. Try expressing, using just SQL syntax, the sum between
select col1 + col2 as 'col1_col2_sum'
from df
If they're from different tables, then a join is required. |
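To make the SQL point above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table other, the column col3, the id key, and all values are hypothetical and only serve the illustration; this is not code from the PR or from any dataframe library.

```python
import sqlite3

# In-memory database with two hypothetical tables sharing an "id" key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE df (id INTEGER, col1 INTEGER, col2 INTEGER);
    INSERT INTO df VALUES (1, 10, 100), (2, 20, 200);
    CREATE TABLE other (id INTEGER, col3 INTEGER);
    INSERT INTO other VALUES (2, 2000), (1, 1000);
    """
)

# Same table: plain column arithmetic, no join needed.
same_table = conn.execute(
    "SELECT col1 + col2 AS col1_col2_sum FROM df ORDER BY id"
).fetchall()

# Different tables: rows must first be matched up by an explicit join.
cross_table = conn.execute(
    """
    SELECT df.col1 + other.col3 AS col1_col3_sum
    FROM df JOIN other ON df.id = other.id
    ORDER BY df.id
    """
).fetchall()

print(same_table)
print(cross_table)
conn.close()
```

Once the join has been done, any number of further sums or comparisons can be written against the joined result, which is the "join once" argument made earlier in the thread.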
I'm generally a +1 removing
I think we need to work through how we handle Columns more generally, because our current design doesn't have this kind of information. |
Agree that we need to work through how we handle columns more generally, and that needs doing independently of this PR. Are we OK to start with this one? |
One thing I came to realize is that there are two types of dataframe usage:
Element-wise operations like
@MarcoGorelli and I discussed this a bit yesterday, and concluded that this would be okay to drop as a mode of operation (and hence, dropping
Probably this PR can be merged after tomorrow's meeting. |
This got more thumbs-ups in the meeting yesterday, and it seems that in general we'd like to revisit or remove the APIs that have now been identified as problematic for lazy implementations. So in it goes. Thanks @MarcoGorelli and @kkraus14!
Just out of interest, has anyone here ever needed to sum two entire dataframes? I haven't, and I don't think there's much use for these comparison ops when other is a dataframe.
It also avoids some of the discussion in #224: if it can't be done, there's no need to document that in some cases it might not be allowed.