ENH: Allow where/mask/Indexers to accept callable #12539

Closed
wants to merge 1 commit into from
93 changes: 79 additions & 14 deletions doc/source/indexing.rst
@@ -79,6 +79,10 @@ of multi-axis indexing.
- A slice object with labels ``'a':'f'``, (note that contrary to usual python
slices, **both** the start and the stop are included!)
- A boolean array
- A ``callable`` function with one argument (the calling Series, DataFrame or Panel) that
returns valid output for indexing (one of the above)

.. versionadded:: 0.18.1

See more at :ref:`Selection by Label <indexing.label>`

@@ -93,6 +97,10 @@ of multi-axis indexing.
- A list or array of integers ``[4, 3, 0]``
- A slice object with ints ``1:7``
- A boolean array
- A ``callable`` function with one argument (the calling Series, DataFrame or Panel) that
returns valid output for indexing (one of the above)

.. versionadded:: 0.18.1

See more at :ref:`Selection by Position <indexing.integer>`

@@ -110,6 +118,8 @@ of multi-axis indexing.
See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced
Hierarchical <advanced.advanced_hierarchical>`.

- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as an indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
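A minimal sketch of the rule above, assuming a recent pandas release (``.ix`` and ``Panel`` have since been removed, so only ``.loc``, ``.iloc`` and ``[]`` appear). The helper ``pick_positive_a`` and the ``seen`` list are hypothetical, used only to show that the callable receives the object being indexed and that its return value is then used as an ordinary indexer:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3], 'B': [4, 5, -6]},
                  index=['a', 'b', 'c'])

seen = []  # hypothetical: records what the callable was called with


def pick_positive_a(obj):
    # The callable receives the object being indexed (df itself here)...
    seen.append(obj)
    # ...and returns something .loc already accepts: a boolean Series.
    return obj['A'] > 0


by_label = df.loc[pick_positive_a]       # rows 'a' and 'c'
by_position = df.iloc[lambda d: [0, 2]]  # the same rows, by position
column_b = df[lambda d: 'B']             # [] with a callable returning a label

print(by_label)
```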

Getting values from an object with multi-axes selection uses the following
notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as
well). Any of the axes accessors may be the null slice ``:``. Axes left out of
@@ -317,6 +327,7 @@ The ``.loc`` attribute is the primary access method. The following are valid inp
- A list or array of labels ``['a', 'b', 'c']``
- A slice object with labels ``'a':'f'`` (note that contrary to usual python slices, **both** the start and the stop are included!)
- A boolean array
- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`

.. ipython:: python

@@ -340,13 +351,13 @@ With a DataFrame
index=list('abcdef'),
columns=list('ABCD'))
df1
df1.loc[['a', 'b', 'd'], :]

Accessing via label slices

.. ipython:: python

df1.loc['d':, 'A':'C']

For getting a cross section using a label (equiv to ``df.xs('a')``)

@@ -358,15 +369,15 @@ For getting values with a boolean array

.. ipython:: python

df1.loc['a'] > 0
df1.loc[:, df1.loc['a'] > 0]

For getting a value explicitly (equiv to deprecated ``df.get_value('a','A')``)

.. ipython:: python

# this is also equivalent to ``df1.at['a','A']``
df1.loc['a', 'A']

.. _indexing.integer:

@@ -387,6 +398,7 @@ The ``.iloc`` attribute is the primary access method. The following are valid in
- A list or array of integers ``[4, 3, 0]``
- A slice object with ints ``1:7``
- A boolean array
- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`

.. ipython:: python

@@ -416,26 +428,26 @@ Select via integer slicing
.. ipython:: python

df1.iloc[:3]
df1.iloc[1:5, 2:4]

Select via integer list

.. ipython:: python

df1.iloc[[1, 3, 5], [1, 3]]

.. ipython:: python

df1.iloc[1:3, :]

.. ipython:: python

df1.iloc[:, 1:3]

.. ipython:: python

# this is also equivalent to ``df1.iat[1,1]``
df1.iloc[1, 1]

For getting a cross section using an integer position (equiv to ``df.xs(1)``)

@@ -471,8 +483,8 @@ returned)

dfl = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))
dfl
dfl.iloc[:, 2:3]
dfl.iloc[:, 1:3]
dfl.iloc[4:6]

A single indexer that is out of bounds will raise an ``IndexError``.
@@ -481,12 +493,52 @@ A list of indexers where any element is out of bounds will raise an

.. code-block:: python

dfl.iloc[[4, 5, 6]]
IndexError: positional indexers are out-of-bounds

dfl.iloc[:, 4]
IndexError: single positional indexer is out-of-bounds

.. _indexing.callable:

Selection By Callable
---------------------

.. versionadded:: 0.18.1

``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as an indexer.
The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing.

.. ipython:: python

df1 = pd.DataFrame(np.random.randn(6, 4),
index=list('abcdef'),
columns=list('ABCD'))
df1

df1.loc[lambda df: df.A > 0, :]
df1.loc[:, lambda df: ['A', 'B']]

df1.iloc[:, lambda df: [0, 1]]

df1[lambda df: df.columns[0]]


Callable indexing also works on a ``Series``.

.. ipython:: python

df1.A.loc[lambda s: s > 0]

Using these methods and indexers, you can chain data selection operations
without using a temporary variable.

.. ipython:: python

bb = pd.read_csv('data/baseball.csv', index_col='id')
(bb.groupby(['year', 'team']).sum()
.loc[lambda df: df.r > 100])
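A self-contained version of the chain above, assuming a recent pandas. The small ``bb`` frame is a hypothetical stand-in for ``data/baseball.csv`` with only ``year``, ``team`` and ``r`` columns; it shows that the callable receives the intermediate grouped result, so the named temporary becomes unnecessary:

```python
import pandas as pd

# Hypothetical stand-in for data/baseball.csv (year, team, runs)
bb = pd.DataFrame({'year': [2006, 2006, 2006, 2007, 2007],
                   'team': ['CHN', 'CHN', 'NYA', 'CHN', 'NYA'],
                   'r': [60, 50, 40, 30, 120]})

# Without callables: the intermediate result needs a name
totals = bb.groupby(['year', 'team']).sum()
with_temp = totals.loc[totals.r > 100]

# With a callable: the lambda receives the grouped sum directly
chained = (bb.groupby(['year', 'team']).sum()
             .loc[lambda df: df.r > 100])

print(chained)
```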

.. _indexing.basics.partial_setting:

Selecting Random Samples
@@ -848,6 +900,19 @@ This is equivalent (but faster than) the following.
df2 = df.copy()
df.apply(lambda x, y: x.where(x>0,y), y=df['A'])

.. versionadded:: 0.18.1

``where`` can accept a callable as the ``cond`` and ``other`` arguments. The function must
take one argument (the calling Series or DataFrame) and return valid output
for the ``cond`` or ``other`` argument.

.. ipython:: python

df3 = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df3.where(lambda x: x > 4, lambda x: x + 10)
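A short check, assuming a recent pandas, that passing callables is equivalent to computing the arguments up front: each callable is evaluated on ``df3`` itself before any replacement happens.

```python
import pandas as pd

df3 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6],
                    'C': [7, 8, 9]})

# Both callables are called with df3
via_callables = df3.where(lambda x: x > 4, lambda x: x + 10)

# Equivalent call with the arguments evaluated beforehand
via_values = df3.where(df3 > 4, df3 + 10)

print(via_callables)
```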

**mask**

``mask`` is the inverse boolean operation of ``where``.
62 changes: 62 additions & 0 deletions doc/source/whatsnew/v0.18.1.txt
@@ -13,6 +13,8 @@ Highlights include:
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.


.. contents:: What's new in v0.18.1
:local:
@@ -94,6 +96,66 @@ Now you can do:

df.groupby('group').resample('1D').ffill()

.. _whatsnew_0181.enhancements.method_chain:

Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following methods and indexers now accept a ``callable``. This is intended to make
them more useful in method chains; see :ref:`Selection By Callable <indexing.callable>`.
(:issue:`11485`, :issue:`12533`)

- ``.where()`` and ``.mask()``
- ``.loc[]``, ``.iloc[]`` and ``.ix[]``
- ``[]`` indexing

``.where()`` and ``.mask()``
""""""""""""""""""""""""""""

These can accept a callable as the ``cond`` and ``other``
arguments.

.. ipython:: python

df = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df.where(lambda x: x > 4, lambda x: x + 10)

Review comment (Member): I would add this example somewhere in the where docs as well (http://pandas-docs.github.io/pandas-docs-travis/indexing.html#the-where-method-and-masking)


``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""

These can accept a callable, or a tuple of callables, as a slicer. The callable
can return a valid boolean indexer or anything that is valid input for these indexers.

.. ipython:: python

# callable returns bool indexer
df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]

# callable returns list of labels
df.loc[lambda x: [1, 2], lambda x: ['A', 'B']]
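A minimal sketch, assuming a recent pandas and rebuilding the ``df`` from the ``.where()`` example above: with a tuple of callables, each one receives the whole frame and resolves the indexer for its own axis. The column callable returns a boolean Series indexed by column label (the column sums are 6, 15 and 24), so the result matches an explicit label selection:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Rows where A >= 2; columns whose sum exceeds 10
sub = df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]

# The same selection written with explicit labels
same = df.loc[[1, 2], ['B', 'C']]

print(sub)
```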

``[]`` indexing
"""""""""""""""

Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return valid input for ``[]`` indexing, depending on the object's
class and index type.

.. ipython:: python

df[lambda x: 'A']
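How a callable in ``[]`` is interpreted depends on what it returns, exactly as if that value had been passed directly. A sketch assuming a recent pandas (``Panel`` is no longer available, so only DataFrame and Series are shown):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

col = df[lambda x: 'A']           # column label -> Series
cols = df[lambda x: ['A', 'B']]   # list of labels -> DataFrame
rows = df[lambda x: x.A > 1]      # boolean Series -> row filter

s = df['B']
large = s[lambda v: v > 4]        # Series [] also accepts a callable

print(rows)
```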

Using these methods and indexers, you can chain data selection operations
without using a temporary variable.

.. ipython:: python

bb = pd.read_csv('data/baseball.csv', index_col='id')
(bb.groupby(['year', 'team']).sum()
.loc[lambda df: df.r > 100])

.. _whatsnew_0181.partial_string_indexing:

Partial string indexing on ``DateTimeIndex`` when part of a ``MultiIndex``
10 changes: 10 additions & 0 deletions pandas/core/common.py
@@ -1843,6 +1843,16 @@ def _get_callable_name(obj):
    return None


def _apply_if_callable(maybe_callable, obj, **kwargs):
    """
    Evaluate possibly callable input using obj and kwargs if it is callable,
    otherwise return as it is
    """
    if callable(maybe_callable):
        return maybe_callable(obj, **kwargs)
    return maybe_callable
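The helper's behaviour can be exercised without pandas. The sketch below is a hypothetical standalone copy named ``apply_if_callable`` (not the pandas function itself): callables are invoked with the object plus any keyword arguments, and every other value passes through unchanged.

```python
def apply_if_callable(maybe_callable, obj, **kwargs):
    """Hypothetical standalone copy of the helper shown above."""
    if callable(maybe_callable):
        # Callables are evaluated against obj, forwarding extra kwargs
        return maybe_callable(obj, **kwargs)
    # Anything else (labels, lists, masks, scalars) is returned as is
    return maybe_callable


data = [3, 1, 2]

print(apply_if_callable(sorted, data))                       # [1, 2, 3]
print(apply_if_callable('A', data))                          # A
print(apply_if_callable(lambda seq, n: seq[:n], data, n=2))  # [3, 1]
```

Because the test is the built-in ``callable()``, any callable object is invoked, including classes and instances defining ``__call__``; such an object can no longer be passed through unchanged as an indexer value.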


_string_dtypes = frozenset(map(_get_dtype_from_object, (compat.binary_type,
compat.text_type)))

16 changes: 9 additions & 7 deletions pandas/core/frame.py
@@ -1970,6 +1970,7 @@ def iget_value(self, i, j):
        return self.iat[i, j]

    def __getitem__(self, key):
        key = com._apply_if_callable(key, self)

        # shortcut if we are an actual column
        is_mi_columns = isinstance(self.columns, MultiIndex)
@@ -2138,6 +2139,9 @@ def query(self, expr, inplace=False, **kwargs):
        >>> df.query('a > b')
        >>> df[df.a > df.b] # same result as the previous expression

Review comment (Contributor): can you add an Example as well

        """
        if not isinstance(expr, compat.string_types):
            msg = "expr must be a string to be evaluated, {0} given"
            raise ValueError(msg.format(type(expr)))
        kwargs['level'] = kwargs.pop('level', 0) + 1
        kwargs['target'] = None
        res = self.eval(expr, **kwargs)
@@ -2336,6 +2340,7 @@ def _box_col_values(self, values, items):
name=items, fastpath=True)

    def __setitem__(self, key, value):
        key = com._apply_if_callable(key, self)

        # see if we can slice the rows
        indexer = convert_to_index_sliceable(self, key)
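Because ``__setitem__`` now resolves the key with ``_apply_if_callable`` as well, a callable also works on the left-hand side of an assignment. A sketch assuming a recent pandas; ``.loc`` assignment is included too, since recent pandas resolves callables there in the same manner:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# [] assignment: the callable returns the name of the column to create
df[lambda x: 'B'] = df['A'] * 10

# .loc assignment: callable row selector combined with a plain column label
df.loc[lambda x: x.A > 1, 'B'] = 0

print(df)
```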
@@ -2454,8 +2459,9 @@ def assign(self, **kwargs):
        kwargs : keyword, value pairs
            keywords are the column names. If the values are
            callable, they are computed on the DataFrame and
            assigned to the new columns. The callable must not
            change the input DataFrame (though pandas doesn't check it).
            If the values are not callable, (e.g. a Series, scalar, or array),
            they are simply assigned.

Review comment (Contributor): add a versionadded tag here (and other doc-strings)

Reply (sinhrks, Member Author, Apr 29, 2016): assign impl is unchanged.

        Returns
@@ -2513,11 +2519,7 @@ def assign(self, **kwargs):
        # do all calculations first...
        results = {}
        for k, v in kwargs.items():
            results[k] = com._apply_if_callable(v, data)

        # ... and then assign
        for k, v in sorted(results.items()):
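With the loop delegating to ``_apply_if_callable``, ``assign`` treats its values exactly as the docstring above describes. A sketch assuming a recent pandas, where the callable receives a copy of the frame and the original is left untouched:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

out = df.assign(
    B=lambda d: d['A'] * 2,  # callable: evaluated on the (copied) frame
    C=7,                     # scalar: assigned as-is and broadcast
)

print(out)
```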
28 changes: 26 additions & 2 deletions pandas/core/generic.py
@@ -4283,8 +4283,26 @@ def _align_series(self, other, join='outer', axis=None, level=None,

        Parameters
        ----------
        cond : boolean %(klass)s, array or callable
            If cond is callable, it is computed on the %(klass)s and
            should return a boolean %(klass)s or array.
            The callable must not change the input %(klass)s
            (though pandas doesn't check it).

            .. versionadded:: 0.18.1

            A callable can be used as cond.

        other : scalar, %(klass)s, or callable
            If other is callable, it is computed on the %(klass)s and
            should return a scalar or %(klass)s.
            The callable must not change the input %(klass)s
            (though pandas doesn't check it).

            .. versionadded:: 0.18.1

            A callable can be used as other.

        inplace : boolean, default False
            Whether to perform the operation in place on the data
        axis : alignment axis if needed, default None

Review comment (Contributor): same

Reply (Member Author): Do you know a format to use versionadded as a part of an argument description? Adding it under cond looks cond argument itself is added in the specified version.

Reply (Contributor): you can do

    <blank line>
    <indent here>versionadded: 0.18.1
    <blank line>
    <indent here>comment here

I think

@jorisvandenbossche

Reply (Member Author): Added tags, and currently rendered like below.

[screenshot of the rendered docstring]

Reply (Contributor): ok looks fine. thxs. going to merge.
@@ -4304,6 +4322,9 @@ def _align_series(self, other, join='outer', axis=None, level=None,
    def where(self, cond, other=np.nan, inplace=False, axis=None, level=None,
              try_cast=False, raise_on_error=True):

        cond = com._apply_if_callable(cond, self)
        other = com._apply_if_callable(other, self)

        if isinstance(cond, NDFrame):
            cond, _ = cond.align(self, join='right', broadcast_axis=1)
        else:
@@ -4461,6 +4482,9 @@ def where(self, cond, other=np.nan, inplace=False, axis=None, level=None,
    @Appender(_shared_docs['where'] % dict(_shared_doc_kwargs, cond="False"))
    def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
             try_cast=False, raise_on_error=True):

        cond = com._apply_if_callable(cond, self)

        return self.where(~cond, other=other, inplace=inplace, axis=axis,
                          level=level, try_cast=try_cast,
                          raise_on_error=raise_on_error)
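Since ``mask`` resolves a callable ``cond`` and then delegates to ``where`` with the negated condition (which in turn resolves a callable ``other``), the two spellings below should agree. A sketch assuming a recent pandas:

```python
import pandas as pd

s = pd.Series([1, 5, 3, 8])

# mask replaces the values where the condition is True
masked = s.mask(lambda x: x > 4, lambda x: -x)

# where keeps the values where its condition is True, so negate it
via_where = s.where(~(s > 4), -s)

print(masked)
```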