ENH/API: check for full rows of DataFrame with isin? #7258

Open
jorisvandenbossche opened this issue May 28, 2014 · 9 comments
Labels
Enhancement; isin (isin method); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@jorisvandenbossche
Member

Seeing this SO question: http://stackoverflow.com/questions/23896088/dropping-rows-in-dataframe-if-found-in-another-one/, I was wondering if this is something that could be provided as functionality to the .isin() method.

The problem at the moment with using isin to check for the occurrence of a full row in a DataFrame is that a) isin checks the values in each column separately, independently of the values in other columns (so you cannot check whether the values occur together in the same row), and b) isin also checks whether the index label matches.

Or are there better ways to check for the occurrence of a full row in a DataFrame?

@jreback
Contributor

jreback commented May 28, 2014

Wasn't there discussion of an ignore_index=False kw for isin? @TomAugspurger

@jorisvandenbossche
Member Author

I think that would be a useful addition. It solves my point b), but it is not enough for this row checking, since the columns are still handled separately.

With an example:

In [250]: df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})
In [251]: df
Out[251]: 
   A  B
0  1  a
1  2  b
2  3  c
3  4  d

In [252]: other = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'c', 'd']})
In [253]: other
Out[253]: 
   A  B
0  1  a
1  2  c
2  3  d

isin now also checks for the index label, so this gives:

In [254]: df.isin(other)
Out[254]: 
       A      B
0   True   True
1   True  False
2   True  False
3  False  False

The effect of something like ignore_index=True can be achieved at the moment with to_dict, which gives something like this:

In [258]: df.isin(other.to_dict('list'))   # or df.isin(other, ignore_index=True)
Out[258]: 
       A      B
0   True   True
1   True  False
2   True   True
3  False   True


In [259]: df.isin(other.to_dict('list')).all(1)
Out[259]: 
0     True
1    False
2     True
3    False
dtype: bool

However, if you want to check for entire rows, the result should be something like this:

In [259]: df.isin(other, check_entire_row=True, ignore_index=True)
Out[259]: 
0     True
1    False
2    False
3    False
dtype: bool

The problem with this is that it also changes the output shape.
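
For reference, the row-wise result described above can be approximated today with a left merge and its indicator column. This is only a minimal sketch, not a proposed implementation: the helper name isin_rows is hypothetical, and it assumes both frames share the same column names with compatible dtypes.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})
other = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'c', 'd']})


def isin_rows(left, right):
    """Hypothetical helper: True where a full row of `left` also occurs in `right`."""
    cols = list(left.columns)
    # Drop duplicate rows on the right so the left merge keeps one output row
    # per row of `left`, in the original order.
    merged = left.merge(
        right[cols].drop_duplicates(), on=cols, how='left', indicator=True
    )
    # '_merge' is 'both' where the whole row was found in `right`.
    # The merge discards the index of `left`, so restore it here.
    return pd.Series((merged['_merge'] == 'both').to_numpy(), index=left.index)


mask = isin_rows(df, other)
print(mask.tolist())  # [True, False, False, False]
```

Selecting with df[~mask] then drops the rows that also occur in other, which is what the linked SO question asks for.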

@hayd added this to the 0.15.0 milestone May 29, 2014
@TomAugspurger
Contributor

There was a discussion about an ignore_index kwarg. I'll look back at the PR, but I think I just left it as a todo, nothing against it in principle.

This seems useful, I could look into doing a PR next week maybe.

@hayd
Contributor

hayd commented May 29, 2014

Yes, this was a todo, but I think we were trying for only one kwarg (and struggling). I like your argument that there should be two.

@jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@pemontto

This would be very useful to have, the workaround isn't entirely obvious.

@JurijsNazarovs

Any updates?

@jorisvandenbossche jorisvandenbossche added Enhancement and removed Ideas Long-Term Enhancement Discussions labels Mar 5, 2018
@jorisvandenbossche
Member Author

@JurijsNazarovs This is an open issue, so code contributions to make this actually happen are very welcome.

@fingertap

fingertap commented Aug 9, 2018

My workaround (Python 3):

import pandas as pd
from functools import reduce

a = pd.DataFrame([[1, 2], [1, 2], [3,4], [3, 4]])
b = a.sample(1)

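def isin_row(a, b, cols=None):
    # `cols or a.columns` is ambiguous if cols is an Index or array, so test for None
    cols = a.columns if cols is None else cols
    # Each column is tested on its own, so this is guaranteed to be a full-row
    # match only when b has a single row (as with a.sample(1) here)
    return reduce(lambda x, y: x & y, [a[f].isin(b[f]) for f in cols])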

print(isin_row(a, b))

The result is something like this:

0    False
1    False
2     True
3     True
dtype: bool

which can be used to select rows in the original dataframe.
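
When b holds more than one row, a column-by-column test like the one above can report rows whose values never occur together in b. The sketch below illustrates the difference on made-up data (the frames and column names are assumptions for illustration only) and shows one possible row-wise alternative that compares the rows as tuples through MultiIndex.isin.

```python
import pandas as pd

# Illustrative data: b contains the rows (1, 2) and (3, 4)
a = pd.DataFrame({'x': [1, 1, 3, 3], 'y': [2, 4, 4, 2]})
b = pd.DataFrame({'x': [1, 3], 'y': [2, 4]})

# Column-by-column test: each value only has to occur somewhere in its column of b
columnwise = a['x'].isin(b['x']) & a['y'].isin(b['y'])

# Row-wise test: each row of a is treated as one tuple and looked up among the rows of b
rowwise = pd.Series(
    pd.MultiIndex.from_frame(a).isin(pd.MultiIndex.from_frame(b)),
    index=a.index,
)

print(columnwise.tolist())  # [True, True, True, True]
print(rowwise.tolist())     # [True, False, True, False]
```

The rows (1, 4) and (3, 2) pass the column-by-column test even though neither is a row of b.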

@jbrockmendel added the isin (isin method) label Oct 30, 2020
@RyuuOujiXS

DataFrame.isin returns a DataFrame with every value replaced by True or False. DataFrame.all and DataFrame.any (with axis=1) then return True or False for EACH INDEX/LABEL, depending on whether all or some of its values are True. Combine isin with either one to get a Series of True/False for EACH INDEX/LABEL and do with that what you will.
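
A short sketch of that combination, reusing the df and other frames from the example earlier in this thread. The variable names are illustrative, and other is passed as a dict of lists so that the index labels are not compared. Note that this still tests each column separately, so it does not check that the values occur together in one row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})
other = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'c', 'd']})

# Element-wise membership per column, ignoring the index labels
flags = df.isin(other.to_dict(orient='list'))

# One boolean per row: all values found, or at least one value found
every = flags.all(axis=1)
some = flags.any(axis=1)

print(every.tolist())  # [True, False, True, False]
print(some.tolist())   # [True, True, True, True]
```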

@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
10 participants