ENH: Dataframe should have a .isin() method #4211
Comments
I like this idea, but it should return a bool, not the frame indexed by and-ing or or-ing the
Absolutely on returning the bool. Mistake on my part. The most obvious uses are how='and' and how='or', which will return a 1-d bool that can be indexed into. Would it also be useful to have some way to get just the locations where the value is in the array?
# pseudocode
df.isin(['a', 'b', 'c'], how='any')  # 'any' is probably not the right argument name.
     ids   ids2   vals
0   True  False  False
1   True  False  False
2  False   True  False
3  False  False  False
This can be achieved by applying
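A minimal sketch of the element-wise mask shown above, using `DataFrame.isin` as it exists in released pandas. The `how=` argument from the pseudocode is a proposal in this thread and is not a parameter of the released method; the data below is hypothetical, chosen only so the result matches the table above.

```python
import pandas as pd

# Hypothetical data, picked so the mask reproduces the table in the thread.
df = pd.DataFrame({
    "ids": ["a", "b", "f", "n"],
    "ids2": ["e", "f", "c", "h"],
    "vals": [1, 2, 3, 4],
})

# Element-wise membership test: one boolean per cell, same shape as df.
mask = df.isin(["a", "b", "c"])
print(mask)
```

The result has the same index and columns as `df`, so it can be reduced along either axis afterwards.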
If no one else minds, I can try to take a crack at this. I haven't done much with Cython so you might want to label it someday. It looks like the Is there any preference for a default on Also for the examples I've been giving it should be
You can probably just do this in Python; if it's not fast enough we could change it (but it's also good to get the algo right first).
Yep. Should be much easier than I was thinking. I'll try to get this done today. Going to (possibly) be without internet until Sunday, but I should have it done by then.
So right now I'm thinking about going with @jreback's idea and calling isin on each column. I'm getting one failure locally... (unrelated to what I've changed, I think.)
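The per-column idea could look roughly like the sketch below. `frame_isin` is a hypothetical helper name, not the code from the branch mentioned in this thread, and it assumes unique column names.

```python
import pandas as pd


def frame_isin(df, values):
    """Hypothetical helper: call Series.isin on each column and reassemble."""
    # Assumes unique column labels; duplicated labels would need extra care.
    return pd.DataFrame(
        {col: df[col].isin(values) for col in df.columns},
        index=df.index,
        columns=df.columns,
    )


df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": [1, 2, 3, 4]})
result = frame_isin(df, ["a", "c"])
print(result)
```

Doing this in plain Python first keeps the algorithm easy to check before any optimization is considered.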
I've got it pushed to my branch here. I'll do a proper PR in a sec, but I forgot to add the changes to the release notes. Can I still update the release notes and get everything into the same commit without screwing up the world? I know you aren't supposed to rebase changes pushed to a remote, but in this case is remote
I get this but only when I use tox. Weird.
I think I'm also going to add an
Design question for you all. If we have
In [43]: df = DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': [1, 2, 3, 4]})
In [44]: df
Out[44]:
   A  B
0  a  1
1  b  2
2  c  3
3  d  4
with something like FWIW,
Hm, do you mean if it's a scalar then make it a list and it will be fine? OTOH it might be useful to make
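One reading of the scalar suggestion above, as a sketch only: wrap a scalar in a list before handing it to `isin`. `as_list_like` is a hypothetical helper name; `pandas.api.types.is_list_like` is a real pandas utility and treats strings as scalars.

```python
import pandas as pd
from pandas.api.types import is_list_like


def as_list_like(values):
    """Hypothetical helper: wrap a scalar so it can be passed to isin."""
    return values if is_list_like(values) else [values]


s = pd.Series(["a", "b", "c", "d"])
scalar_mask = s.isin(as_list_like("a"))        # scalar becomes ["a"]
list_mask = s.isin(as_list_like(["a", "b"]))   # lists pass through unchanged
```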
@hayd I see u waited a long time to merge after 0.12 :)
@jreback I figured best to get in early on the 0.13 merge-storm...
@hayd @TomAugspurger I think this should have an example in v0.13.0.txt; you can put the same example in the isin docs section.
I'll write up a quick example. Should I put a warning about issue #4421, where the value passed to
Should that just raise if the index is not identical? (Or maybe have a keyword `index=False` or something to control it?)
Can we hold off on adding to What's New until 4421 is fixed, and do both at the same time?
@hayd are you saying just keep pushing it to the next release until someone, ok fine me :), gets around to fixing 4421 in a reasonable way? If so, what section should I put it under in
I think we should fix 4421 before 0.13 (so it'll still be in the same What's New), not sure how long we have til release (?), but will have a look at it soon.
Any reason not to give DataFrame a .isin() method like Series?
The new wrinkle is that the user needs to specify if they want a logical OR or AND.
e.g.
See this SO post maybe.
If someone else wants to take this, feel free. Can't promise a PR any time soon, but maybe in the fall :)
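For the OR/AND wrinkle in the request above, a sketch of how the two reductions can be expressed with released pandas: take the element-wise mask from `DataFrame.isin` and reduce it across columns with `any` or `all`. The data is hypothetical and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "ids": ["a", "b", "f", "n"],
    "ids2": ["a", "n", "c", "n"],
})

mask = df.isin(["a", "b", "c"])

# Logical OR across columns: True if any cell in the row matches.
any_match = mask.any(axis=1)
# Logical AND across columns: True only if every cell in the row matches.
all_match = mask.all(axis=1)

# Either 1-d boolean Series can be used to index back into the frame.
rows_or = df[any_match]
rows_and = df[all_match]
```

Keeping the element-wise mask and the reduction as separate steps lets the caller choose OR or AND without an extra argument.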