diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
index bc9dcdccfc2e1..dc8c42fd11989 100644
--- a/doc/source/indexing.rst
+++ b/doc/source/indexing.rst
@@ -519,21 +519,14 @@ of the DataFrame):
 
    df[df['A'] > 0]
 
-Consider the ``isin`` method of Series, which returns a boolean vector that is
-true wherever the Series elements exist in the passed list. This allows you to
-select rows where one or more columns have values you want:
+List comprehensions and ``map`` method of Series can also be used to produce
+more complex criteria:
 
 .. ipython:: python
 
    df2 = DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
                     'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
                     'c' : randn(7)})
-   df2[df2['a'].isin(['one', 'two'])]
-
-List comprehensions and ``map`` method of Series can also be used to produce
-more complex criteria:
-
-.. ipython:: python
 
    # only want 'two' or 'three'
    criterion = df2['a'].map(lambda x: x.startswith('t'))
@@ -553,6 +546,26 @@ and :ref:`Advanced Indexing ` you may select along more than
 
    df2.loc[criterion & (df2['b'] == 'x'),'b':'c']
 
+.. _indexing.basics.indexing_isin:
+
+Indexing with isin
+~~~~~~~~~~~~~~~~~~
+
+Consider the ``isin`` method of Series, which returns a boolean vector that is
+true wherever the Series elements exist in the passed list. This allows you to
+select rows where one or more columns have values you want:
+
+.. ipython:: python
+
+   s = Series(np.arange(5),index=np.arange(5)[::-1],dtype='int64')
+
+   s
+
+   s.isin([2, 4])
+
+   s[s.isin([2, 4])]
+
+
 DataFrame also has an ``isin`` method. When calling ``isin``, pass a set of
 values as either an array or dict. If values is an array, ``isin`` returns
 a DataFrame of booleans that is the same shape as the original DataFrame, with True
@@ -585,6 +598,17 @@ You can also describe columns using integer location:
 
    df.isin(values, iloc=True)
 
+Combine DataFrame's ``isin`` with the ``any()`` and ``all()`` methods to
+quickly select subsets of your data that meet a given criteria.
+To select a row where each column meets its own criterion:
+
+.. ipython:: python
+
+   values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}
+
+   row_mask = df.isin(values).all(1)
+
+   df[row_mask]
 
 The :meth:`~pandas.DataFrame.where` Method and Masking
 ------------------------------------------------------
diff --git a/doc/source/v0.13.0.txt b/doc/source/v0.13.0.txt
index 1f9dc8d7dad81..090be81a3ee7c 100644
--- a/doc/source/v0.13.0.txt
+++ b/doc/source/v0.13.0.txt
@@ -516,6 +516,22 @@ Experimental
 
     For more details see the :ref:`indexing documentation on query `.
 
+  - DataFrame now has an ``isin`` method that can be used to easily check whether the DataFrame's values are contained in an iterable. Use a dictionary if you'd like to check specific iterables for specific columns or rows.
+
+    .. ipython:: python
+
+       df = pd.DataFrame({'A': [1, 2, 3], 'B': ['d', 'e', 'f']})
+       df.isin({'A': [1, 2], 'B': ['e', 'f']})
+
+    The ``isin`` method plays nicely with boolean indexing. To get the rows where each condition is met:
+
+    .. ipython:: python
+
+       mask = df.isin({'A': [1, 2], 'B': ['e', 'f']})
+       df[mask.all(1)]
+
+    See the :ref:`documentation` for more.
+
 .. _whatsnew_0130.refactoring:
 
 Internal Refactoring
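
The behavior documented in this patch can be tried outside the docs build with a short standalone script. This is only a minimal sketch, assuming an installed pandas in which ``DataFrame.isin`` accepts a dict keyed by column name. It reuses the small frame from the ``v0.13.0.txt`` entry and the Series from the ``indexing.rst`` hunk. The names ``all_rows``, ``any_rows`` and ``selected`` are illustrative and do not appear in the patch, and ``axis=1`` is written out as a keyword where the patch uses the positional ``all(1)``.

```python
import numpy as np
import pandas as pd

# Same small frame as in the v0.13.0.txt entry of the patch.
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['d', 'e', 'f']})

# Dict form: each column is checked against its own list of values.
values = {'A': [1, 2], 'B': ['e', 'f']}
mask = df.isin(values)

# Rows where every column meets its own criterion.
all_rows = df[mask.all(axis=1)]

# Rows where at least one column meets its criterion.
any_rows = df[mask.any(axis=1)]

# Series form, mirroring the indexing.rst example: the index runs 4..0
# while the values run 0..4, so matches are located by value, not by label.
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
selected = s[s.isin([2, 4])]

print(mask)
print(all_rows)
print(any_rows)
print(selected)
```

With this data only row ``1`` satisfies both column criteria, while every row satisfies at least one, which is the difference between reducing the mask with ``all`` and with ``any``.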