ENH:Create Set Operations #42177

tlplayer · 2021-06-21T19:36:59Z

Is your feature request related to a problem?

I wish to take the AND, OR, XOR, and NOT of dataframes. Of course I could do this manually, but a built-in way would be far cleaner and more elegant.

Possible Solution

Example 1:

A = pd.DataFrame({'a': [1, 2, 3]})
B = pd.DataFrame({'a': [1, 3, 4]})

pandas.xor(A,B, on = "a")

{'a': 2 4}

Example 2:

A.or( B, on= "a")

{'a': 1 2 3 4}

Example 3:

A.and(B,on='a')

{'a': 1 3}

API breaking implications

It will not break the existing API; it is just a convenience feature.

Additional context

Typically, when comparing data from two sources, the fields do not line up and need to be cleaned with basic AND, OR, NOR, NOT, and XOR operations. Built-in set operations would greatly speed up those tasks.

Code to come later; I wanted to hear your thoughts on implementation first.

AlexKirko · 2021-06-22T05:44:30Z

I believe this is mostly a duplicate of #4480, which is the same thing but for Series, and there was plenty of discussion and attempts to contribute something that would be better than just accessing Series.values and then using numpy-level set operations. Considering that there was nothing implemented for that issue in the end, I do not believe we'll get anywhere here, since performing set operations on a DataFrame is more arbitrary than on a Series (multi-level indices and such come to mind).

I'll leave this issue open for now, since it's not technically a duplicate and I may be missing something.
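For reference, the NumPy-level fallback mentioned above looks roughly like the sketch below. The Series `a` and `b` are illustrative stand-ins for the column in the original example; note that these NumPy functions return sorted, de-duplicated arrays and discard the pandas index.

```python
import numpy as np
import pandas as pd

# Illustrative data mirroring column 'a' of the example frames.
a = pd.Series([1, 2, 3])
b = pd.Series([1, 3, 4])

# Each call works on the raw values and returns a sorted array of unique elements.
union = np.union1d(a.values, b.values)             # OR
intersection = np.intersect1d(a.values, b.values)  # AND
sym_diff = np.setxor1d(a.values, b.values)         # XOR
difference = np.setdiff1d(a.values, b.values)      # in a but not in b

print(union, intersection, sym_diff, difference)
```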

parkdj1 · 2021-06-24T15:07:12Z

You can do most of these with a single line already (perhaps not as intuitive but possible)!

Using your example of
A = pd.DataFrame({'a': [1, 2, 3]})
B = pd.DataFrame({'a': [1, 3, 4]})

The 'OR' function to get {'a': 1 2 3 4} can be accomplished with
A.merge(B,how='outer',on='a')

The 'AND' function to get {'a': 1 3} can be accomplished with
A.merge(B,how='inner',on='a')

The 'XOR' function to get {'a': 2 4} would require a bit more manipulation, but still simple enough with a different approach imho
A.append(B).drop_duplicates(keep=False)

Not sure what taking the NOT of a dataframe would mean, but perhaps the .ne function would help? Unless you're talking about doing something like A.OR(NOT B).
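A runnable version of the three one-liners above, as a sketch. It assumes neither frame contains duplicate rows of its own (otherwise the `drop_duplicates(keep=False)` trick would also drop values that appear twice in a single frame). It uses `pd.concat` in place of `DataFrame.append`, since `append` was deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

A = pd.DataFrame({"a": [1, 2, 3]})
B = pd.DataFrame({"a": [1, 3, 4]})

# OR: every key that appears in either frame.
union = A.merge(B, how="outer", on="a")

# AND: only keys that appear in both frames.
intersection = A.merge(B, how="inner", on="a")

# XOR: stack both frames, then drop every value that occurs more than once.
sym_diff = (
    pd.concat([A, B])
    .drop_duplicates(keep=False)
    .reset_index(drop=True)
)

print(union, intersection, sym_diff, sep="\n\n")
```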

tlplayer · 2021-06-25T16:26:11Z

I was thinking more A.not(B) would be like A[~A.isin(B)]. I just think it would be more intuitive that way.

parkdj1 · 2021-06-28T16:50:55Z

Hmm yeah I see what you mean. I would probably just do A[~A['a'].isin(B['a'])] which is basically what you already said.

+1: it would be nice to have a more intuitive way of subsetting dataframes that doesn't require index/column names.
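The two spellings in this exchange are not interchangeable, which the following sketch illustrates. When `DataFrame.isin` is given another DataFrame, it compares values only where both index and column labels match, so `A[~A.isin(B)]` masks cells positionally instead of computing a set difference. The column-level `isin` is the one that ignores position.

```python
import pandas as pd

A = pd.DataFrame({"a": [1, 2, 3]})
B = pd.DataFrame({"a": [1, 3, 4]})

# Label-aligned comparison: only row 0 matches (1 == 1), so rows 1 and 2
# survive the mask and row 0 becomes NaN.
aligned = A[~A.isin(B)]

# Value-based membership test on the column: keeps rows of A whose 'a'
# does not occur anywhere in B['a'].
difference = A[~A["a"].isin(B["a"])]

print(aligned)
print(difference)
```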

AlexKirko · 2021-07-01T15:25:31Z

I don't think that the pain of using less readable syntax is worth the legwork this will take to implement, but that's for the contributor to decide.

Delengowski · 2021-07-31T04:24:51Z

Quoting parkdj1:

"Not sure what taking the NOT of a dataframe would mean, but perhaps the .ne function would help? Unless you're talking about doing something like A.OR(NOT B)."

'NOT' wouldn't work for the same reason it doesn't work on a set: NOT here would be the absolute complement, but we have no universal set U from which to take the set difference.

Delengowski · 2021-07-31T04:27:20Z

Quoting tlplayer:

"I was thinking more A.not(B) would be like A[~A.isin(B)]. I just think it would be more intuitive that way."

This is set difference: the elements of A that are not shared with B, i.e. the relative complement.
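For frames with more than one column, the relative complement can be sketched as an anti-join using `merge` with `indicator=True`. The two-column data below is hypothetical and only serves to show row-wise comparison; it is one possible approach, not something proposed in this thread.

```python
import pandas as pd

# Hypothetical two-column frames; rows are compared as (a, b) pairs.
A = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
B = pd.DataFrame({"a": [1, 3, 4], "b": ["x", "q", "w"]})

# Left-join A against the distinct rows of B; the '_merge' column records
# whether each row of A found a match in B.
merged = A.merge(B.drop_duplicates(), how="left", on=["a", "b"], indicator=True)

# Keep only the rows of A that had no match in B (A minus B).
difference = merged.loc[merged["_merge"] == "left_only", ["a", "b"]]

print(difference)
```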

tlplayer added the Enhancement and Needs Triage labels · Jun 21, 2021

jbrockmendel added the setops label · Jun 21, 2021

jbrockmendel removed the Needs Triage label · Jun 24, 2021

mroeschke added the Needs Discussion label · Aug 21, 2021
