ENH: Dataframe isin2 #4258

Merged: 3 commits, Jul 24, 2013

@hayd
Contributor

hayd commented Jul 16, 2013

Fixes #4211. This is an alternative to (and built on top of) #4237.

DataFrame isin method:

In [11]: df = pd.DataFrame([['a', 'a', 'c'], ['b', 'e', 'a'], ['c', 'a', 'f']], columns=['A', 'A', 'B'])

In [12]: df
Out[12]:
   A  A  B
0  a  a  c
1  b  e  a
2  c  a  f

In [13]: df.isin(['a'])
Out[13]:
       A      A      B
0   True   True  False
1  False  False   True
2  False   True  False

In [14]: df.isin({'A': ['a']})
Out[14]:
       A      A      B
0   True   True  False
1  False  False  False
2  False   True  False

In [15]: df.isin({0: ['a']}, iloc=True)
Out[15]:
       A      A      B
0   True  False  False
1  False  False  False
2  False  False  False

cc @TomAugspurger
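
For anyone who wants to try the label-based calls outside an IPython session, here is a minimal script sketch. It assumes a pandas version where `DataFrame.isin` accepts a list or a dict, as in the session above. The `iloc=True` call is left out on the assumption that the keyword may not exist outside this branch. The names `list_mask` and `dict_mask` are only for illustration.

```python
import pandas as pd

# Same frame as in the session above; note the duplicate column label 'A'.
df = pd.DataFrame(
    [['a', 'a', 'c'], ['b', 'e', 'a'], ['c', 'a', 'f']],
    columns=['A', 'A', 'B'],
)

# A list is matched against every cell in the frame.
list_mask = df.isin(['a'])

# A dict is matched per column label; columns without a key match nothing.
dict_mask = df.isin({'A': ['a']})

print(list_mask)
print(dict_mask)
```

Both results keep the original column order, including the duplicated 'A'.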

TomAugspurger and others added 2 commits July 15, 2013 21:08
docs. to be rebased

ENH: Add isin method to DataFrame

Basic tests.

Added method and fixed tests.

ENH: Add ordered argument to df.isin()

Expects a sequence of arrays.

Updated release notes for df.isin()

CLN: cleanup

going to remove ordered argument.

Using a dict for ordered matching. Docs

BUG: fixed subselection length check issues.

Updated release notes for df.isin()

remove merge conflict note
@TomAugspurger
Contributor

Looks good! The for i, ind in enumerate(self.columns) loop is what keeps the columns in order when using a dict, right?

@hayd
Contributor Author

hayd commented Jul 16, 2013

Yes, we grab out the columns (as a DataFrame!) by integer location, and when they're concat-ed back they stay in order :) I slightly changed the notation in the second commit (to be less ambiguous / ix-like):

concat((self.iloc[:, [i]].isin(values[i]) for i, col in enumerate(self.columns)), axis=1)  # iloc
concat((self.iloc[:, [i]].isin(values[col]) for i, col in enumerate(self.columns)), axis=1) # label
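
A standalone sketch of the same idea, written against the public API rather than inside the method. This is an illustration under assumptions, not the code from the commit: `self` becomes a plain `df`, the dicts `label_values` and `position_values` are made up for the example, and the `.get(..., [])` fallback for missing keys is my assumption (the lines above index `values` directly).

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'a', 'c'], ['b', 'e', 'a'], ['c', 'a', 'f']],
    columns=['A', 'A', 'B'],
)

# Label-keyed lookup: every column whose label is a key gets checked.
label_values = {'A': ['a'], 'B': ['f']}
by_label = pd.concat(
    (df.iloc[:, [i]].isin(label_values.get(col, []))
     for i, col in enumerate(df.columns)),
    axis=1,
)

# Position-keyed lookup: only the column at that integer location is checked.
position_values = {0: ['a']}
by_position = pd.concat(
    (df.iloc[:, [i]].isin(position_values.get(i, []))
     for i in range(df.shape[1])),
    axis=1,
)

print(by_label)
print(by_position)
```

Each piece is a one-column DataFrame selected by position, so concatenating along axis=1 puts the columns back in their original order even with the duplicate label.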

@jreback
Contributor

jreback commented Jul 16, 2013

yep... @hayd's method is also correct

a selector like .loc/[] will get you more than one column (in the order you specify) if there are dups (which is what you generally want), but in this case you are just doing a transform, so you need to address the columns by .iloc
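
A small sketch of the difference being described, using the duplicate-label frame from the PR description. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'a', 'c'], ['b', 'e', 'a'], ['c', 'a', 'f']],
    columns=['A', 'A', 'B'],
)

# Selecting by label returns every column named 'A' (two here).
selected_by_label = df['A']

# Selecting by integer location returns exactly one column.
selected_by_position = df.iloc[:, [0]]

print(selected_by_label.shape)
print(selected_by_position.shape)
```

A per-column transform built on the label selector would process both 'A' columns at once, which is why the positional selector is the one used here.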

@TomAugspurger
Contributor

Cool. Thanks for all the hand-holding through this.

@hayd
Contributor Author

hayd commented Jul 16, 2013

@jreback 0.13 ?

@jreback
Contributor

jreback commented Jul 16, 2013

if u want to shove in 0.12 ok
trying to stop adding but seems like a nice feature
@cpcloud ?


.. ipython:: python

values = values = {0: ['a', 'b']}
Member

the double assignment isn't necessary here

Contributor Author

whoops!

@cpcloud
Member

cpcloud commented Jul 16, 2013

i really like this, but i think we should maybe just focus on getting the rest of the bugs fixed for 0.12 and add this very soon after 0.12 is released.

@cpcloud
Member

cpcloud commented Jul 16, 2013

OTOH i haven't had a chance to play with this so if @jreback and @hayd think it's ok and won't delay the release at all, it's ok by me

@jreback
Contributor

jreback commented Jul 16, 2013

here's the issue

once you release it the API shouldn't change
if it's good then that is fine

you could mark it experimental if u want

@hayd
Contributor Author

hayd commented Jul 16, 2013

I think the api is pretty good (and will be stable)... but I'm biased!

@jreback jreback mentioned this pull request Jul 20, 2013
hayd added a commit that referenced this pull request Jul 24, 2013
@hayd hayd merged commit 0df80fb into pandas-dev:master Jul 24, 2013
@hayd hayd deleted the dataframe_isin2 branch July 24, 2013 21:45
@hayd
Contributor Author

hayd commented Jul 24, 2013

merged into 0.13 :)

but oops should have moved the release note...

@hayd
Contributor Author

hayd commented Jul 24, 2013

(pushed release note fix for that direct to master.)

Successfully merging this pull request may close these issues.

ENH: Dataframe should have a .isin() method
4 participants