DOC: section in indexing user guide to show use of np.where #37839

Merged 3 commits on Nov 18, 2020
34 changes: 34 additions & 0 deletions doc/source/user_guide/indexing.rst
@@ -1158,6 +1158,40 @@ Mask
s.mask(s >= 0)
df.mask(df >= 0)

.. _indexing.np_where:

Setting with enlargement conditionally using :func:`numpy.where`
----------------------------------------------------------------

An alternative to :meth:`~pandas.DataFrame.where` is to use :func:`numpy.where`.
Member: What is the difference compared to using DataFrame.where?

Contributor Author: Hi Joris, with np.where, enlarging the dataframe by adding a column is straightforward. Also, it naturally extends to multiple conditions with np.select.

Member: @suvayu thanks for the answer! Looking at the example now, I think the main difference is that with np.where, your x/y values to take from don't need to be a Series to start with (e.g. Series.where always takes the values from the calling Series where the condition is True, and so indeed you wouldn't be able to achieve the actual example here).

Combined with setting a new column, :func:`numpy.where` can be used to enlarge a
DataFrame with values that are determined conditionally.

Suppose there are two choices to pick from in the following DataFrame, and you want to
set a new column ``color`` to 'green' where the second column is 'Z' and to 'red'
otherwise. You can do the following:

.. ipython:: python

   df = pd.DataFrame({'col1': list('ABBC'), 'col2': list('ZZXY')})
   df['color'] = np.where(df['col2'] == 'Z', 'green', 'red')
   df
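
The difference from the ``where`` method raised in the review thread can be sketched as
follows. This is only an illustration, not part of the PR: the names ``mask``, ``kept``
and ``colors`` are made up for the example, and the DataFrame is the one from the snippet
above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("ABBC"), "col2": list("ZZXY")})
mask = df["col2"] == "Z"  # hypothetical helper name

# Series.where keeps the calling Series' own values where the mask is True
# and only substitutes the replacement where it is False.
kept = df["col2"].where(mask, "red")

# numpy.where chooses between two independent values, so neither of them
# has to come from an existing column.
colors = np.where(mask, "green", "red")

print(kept.tolist())    # ['Z', 'Z', 'red', 'red']
print(colors.tolist())  # ['green', 'green', 'red', 'red']
```

With the method form, the True branch is always the original data, which is why it cannot
produce the 'green'/'red' column directly.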

If you have multiple conditions, you can use :func:`numpy.select` instead. Say there are
three conditions, each with a corresponding choice of color, and a fourth color as a
fallback. You can do the following:

.. ipython:: python

   conditions = [
       (df['col2'] == 'Z') & (df['col1'] == 'A'),
       (df['col2'] == 'Z') & (df['col1'] == 'B'),
       (df['col1'] == 'B')
   ]
   choices = ['yellow', 'blue', 'purple']
   df['color'] = np.select(conditions, choices, default='black')
   df
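
Because the conditions above overlap (the row with 'B' and 'Z' satisfies both the second
and the third), the order of the list matters: :func:`numpy.select` takes the choice of
the first condition that is True for each row. A minimal sketch of that behaviour, using
illustrative names (``is_b``, ``is_z_and_b``) that are not part of the PR:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("ABBC"), "col2": list("ZZXY")})

is_b = df["col1"] == "B"                 # general condition
is_z_and_b = (df["col2"] == "Z") & is_b  # more specific condition

# Specific condition listed first: the (B, Z) row gets 'blue'.
specific_first = np.select([is_z_and_b, is_b], ["blue", "purple"], default="black")

# General condition listed first: it shadows the specific one,
# so the (B, Z) row gets 'purple' and 'blue' is never used.
general_first = np.select([is_b, is_z_and_b], ["purple", "blue"], default="black")

print(specific_first.tolist())  # ['black', 'blue', 'purple', 'black']
print(general_first.tolist())   # ['black', 'purple', 'purple', 'black']
```

Listing the most specific conditions first, as the example above does, avoids this kind of
shadowing.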

.. _indexing.query:

The :meth:`~pandas.DataFrame.query` Method