Skip to content

Conditional Join (merge) in pandas #7480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ywhcuhk opened this issue Jun 17, 2014 · 2 comments
Closed

Conditional Join (merge) in pandas #7480

ywhcuhk opened this issue Jun 17, 2014 · 2 comments
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@ywhcuhk
Copy link

ywhcuhk commented Jun 17, 2014

I wonder if it possible to implement conditional join (merge) between pandas dataframes. Basically, I am thinking some conditional SQL-like joins:

  select a.id, a.date, a.var1, a.var2, 
      b.var3 
      from data1 as a  left join data2 as b
      on (a.id<b.key+2 and a.id>b.key-3) and (a.date>b.date-10 and a.date<b.date+10);

Above code is just a made up example to show the purpose.


http://stackoverflow.com/questions/23508351/how-to-do-a-conditional-join-in-python-pandas

This question on stackoverflow is ask for the same purpose. The answer provided in the question solves the problem, but it's quite ad hoc.

Can we have something like:

conditions= " (a.id<b.key+2 and a.id>b.key-3) and (a.date>b.date-10 and a.date<b.date+10)"
df=pd.merge(left, right, on= "conditions", how="left (or right...) ")

It seems until now, the SQL statements are still way more flexible than pandas "merge" or "join". It would be nice to have it in pandas itself.

@jreback jreback modified the milestones: 0.15.0, Someday Jun 17, 2014
@jreback
Copy link
Contributor

jreback commented Jun 17, 2014

the linked SO soln seems quite reasonable to me. I think adding a where to merge could be done, but would make the syntax even more cumbersome and add even more options. This is makes this it more non-pythonic / pandonic IMHO.

Python/pandas is all about doing relatively simple, chained operations, in a pipeline. Easier for the user to grok what is going on and examine. A where followed by a merge, is IMHO, by a black box (like SQL). Yes power is important, but I think most would prefer to have a statement that can be expressed in a short statement.

That said I marked it as an issue. If you'd like to attempt an impl. Pls do so.

@mroeschke
Copy link
Member

I think #8962 is the generalize version of this request. Will link this issue over on that issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

3 participants