ENH/API: check for full rows of DataFrame with isin? #7258
Comments
wasn't there discussion of a |
I think that would be a useful addition. It solves my point b), but it is not enough for this row checking, since the columns are still handled separately. With an example:
With something like
However, if you want to check for entire rows, the result should be something like this:
The problem with this is that it also changes the output shape |
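To make the difference concrete, here is a minimal sketch with made-up data (`df`, `wanted`, and the variable names are illustrative assumptions, not taken from the thread). It contrasts the column-wise result, which keeps the shape of the frame, with a whole-row check, which yields one boolean per row:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 1], "B": ["a", "b", "c", "c"]})
# Hypothetical rows to look for: (1, "a") and (3, "c")
wanted = pd.DataFrame({"A": [1, 3], "B": ["a", "c"]})

# Column-wise membership: each column is tested on its own, so the
# result is a DataFrame with the same shape as df.
colwise = df.isin(wanted.to_dict(orient="list"))

# Row-wise membership: compare whole rows as tuples, so the result is
# a Series with one boolean per row.
wanted_rows = set(wanted.itertuples(index=False, name=None))
rowwise = pd.Series(
    [row in wanted_rows for row in df.itertuples(index=False, name=None)],
    index=df.index,
)

print(colwise)
print(rowwise)
```

Note that the last row, (1, "c"), passes both column-wise tests although it is not one of the wanted rows; only the tuple comparison rejects it.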
There was a discussion about an This seems useful, I could look into doing a PR next week maybe. |
Yes, this was a todo, but I think we were trying for only one kwarg (and struggling). I like your argument that there should be two. |
This would be very useful to have, the workaround isn't entirely obvious. |
Any updates? |
@JurijsNazarovs This is an open issue, so code contributions to make this actually happen are very welcome. |
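My workaround (Python 3):

    import pandas as pd
    from functools import reduce

    a = pd.DataFrame([[1, 2], [1, 2], [3, 4], [3, 4]])
    b = a.sample(1)  # pick one row to look up

    def isin_row(a, b, cols=None):
        # Default to all columns of a (an explicit None check, because
        # the truth value of an Index passed as cols is ambiguous).
        cols = a.columns if cols is None else cols
        # AND together the per-column membership tests.
        return reduce(lambda x, y: x & y, [a[f].isin(b[f]) for f in cols])

    print(isin_row(a, b))

The result is something like this: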
which can be used to select rows in the original dataframe. |
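One caveat with the per-column approach: when `b` holds more than one row, a row of `a` can pass even though its values come from different rows of `b`. A possible alternative, sketched here under the assumption that a left merge with `indicator=True` is acceptable for the use case (the helper name `isin_rows` and the sample frames are hypothetical):

```python
import pandas as pd

def isin_rows(a, b, cols=None):
    """Return a boolean Series: True where a's row (over cols) occurs in b."""
    cols = list(a.columns) if cols is None else list(cols)
    # Deduplicate b so the left merge cannot multiply rows of a.
    lookup = b[cols].drop_duplicates()
    merged = a[cols].merge(lookup, on=cols, how="left", indicator=True)
    # With a deduplicated right side, the left merge keeps one output row
    # per row of a, in order, so the positions line up with a.
    return pd.Series((merged["_merge"] == "both").to_numpy(), index=a.index)

a = pd.DataFrame({"x": [1, 1, 3, 3], "y": [2, 4, 2, 4]})
b = pd.DataFrame({"x": [1, 3], "y": [2, 4]})

mask = isin_rows(a, b)
print(a[mask])
```

Here the per-column test would mark every row of `a`, since 1 and 3 both occur in `x` and 2 and 4 both occur in `y`; the merge only marks (1, 2) and (3, 4).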
DataFrame.isin returns a DataFrame with every value replaced by True or False. DataFrame.all and DataFrame.any (with axis=1) then return True or False for each index label, depending on whether all or any of the values in that row are True. Combine them to get a Series of True/False per index label and do with that what you will. |
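A short sketch of that combination, using invented data (the frame and the lookup values are assumptions for illustration only):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "f"]})

# Element-wise membership, one set of allowed values per column.
elementwise = df.isin({"A": [1, 3], "B": ["a", "b"]})

# Reduce across the columns to get one boolean per index label.
all_match = elementwise.all(axis=1)  # every column matched
any_match = elementwise.any(axis=1)  # at least one column matched

# The boolean Series can be used directly to select rows.
selected = df[all_match]
print(selected)
```

Keep in mind that this still tests each column independently, so it does not by itself guarantee that the matched values occurred together in one lookup row.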
Seeing this SO question: http://stackoverflow.com/questions/23896088/dropping-rows-in-dataframe-if-found-in-another-one/, I was wondering if this is something that could be provided as functionality of the .isin() method.
The problem at the moment with using isin to check for the occurrence of a full row in a DataFrame is that a) isin checks the values in each column separately, independently of the values in other columns (so you cannot check whether the values occur together in the same row), and b) isin also checks whether the index label matches.
Or are there better ways to check for the occurrence of a full row in a DataFrame?
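Point b) can be seen in a small sketch (the frames below are made up for illustration): when a DataFrame is passed to isin, a cell only counts as a match if the index and column labels agree as well, whereas a dict of column values ignores the index.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["a", "b"]})
# Same rows as df, but stored under different index labels (hypothetical data).
other = pd.DataFrame({"A": [1, 2], "B": ["a", "b"]}, index=[10, 11])

# Passing a DataFrame: values are aligned on index and column labels first,
# so nothing matches here even though the rows are identical.
by_frame = df.isin(other)

# Passing a dict of column -> values: the index is ignored, but each
# column is still tested separately (point a).
by_dict = df.isin(other.to_dict(orient="list"))

print(by_frame)
print(by_dict)
```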