ENH: Add support for multiple conditions assign statement #46286

Closed
wants to merge 7 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -44,6 +44,7 @@ Other enhancements
 - Implemented a complex-dtype :class:`Index`, passing a complex-dtype array-like to ``pd.Index`` will now retain complex dtype instead of casting to ``object`` (:issue:`45845`)
 - Improved error message in :class:`~pandas.core.window.Rolling` when ``window`` is a frequency and ``NaT`` is in the rolling axis (:issue:`46087`)
 - :class:`Series` and :class:`DataFrame` with ``IntegerDtype`` now supports bitwise operations (:issue:`34463`)
+- :meth:`DataFrame.assign` now accepts a dict of conditions as a column value, assigning each key to the rows where its condition holds (:issue:`46285`)
 -

 .. ---------------------------------------------------------------------------
49 changes: 43 additions & 6 deletions pandas/core/frame.py
@@ -4468,13 +4468,25 @@ def assign(self, **kwargs) -> DataFrame:

         Parameters
         ----------
-        **kwargs : dict of {str: callable or Series}
+        **kwargs : dict of {str: callable or Series or dict}
             The column names are keywords. If the values are
             callable, they are computed on the DataFrame and
-            assigned to the new columns. The callable must not
-            change input DataFrame (though pandas doesn't check it).
-            If the values are not callable, (e.g. a Series, scalar, or array),
-            they are simply assigned.
+            assigned to the new columns.
+
+            The value can also be a dict of conditions: each key is
+            a value to assign, and each dict value is a bool Series,
+            array-like, or callable marking the rows that receive it
+            (see examples). A callable condition is computed on the
+            DataFrame and should return a boolean Series or array.
+            Where several conditions hold, the first one wins; rows
+            matching none are assigned ``None``. This is based on
+            ``np.select``.
+
+            Callables must not modify the input DataFrame (though pandas
+            doesn't check it).
+
+            If a value is neither callable nor a dict (e.g. a Series,
+            scalar, or array), it is simply assigned.

         Returns
         -------
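The docstring leans on `np.select` for its semantics, so a small NumPy-only sketch may help show what "first condition wins" and "uncovered rows get `None`" mean in practice. The array `a` and the labels are made up for illustration and are not part of the PR.

```python
import numpy as np

a = np.array([1, 4, 9])

# Conditions are checked in order: `a < 5` is also true for 1,
# but `a < 3` comes first, so 1 is labelled "small".
# 9 matches neither condition and falls back to the default.
labels = np.select([a < 3, a < 5], ["small", "medium"], default=None)

print(labels.tolist())  # ['small', 'medium', None]
```

Because the default is `None`, the result is an object-dtype array rather than a fixed-width string array.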
@@ -4520,11 +4532,36 @@ def assign(self, **kwargs) -> DataFrame:
                   temp_c  temp_f  temp_k
         Portland    17.0    62.6  290.15
         Berkeley    25.0    77.0  298.15
+
+        To assign a column based on multiple conditions, pass a dict that
+        maps each value to its condition:
+
+        >>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 4, 6]})
+        >>> df
+           a  b
+        0  1  5
+        1  2  4
+        2  3  6
+        >>> df.assign(
+        ...     new_column={
+        ...         "case 1": lambda x: (x.a < 2) & (x.b == 5),
+        ...         "case 2": lambda x: (x.a == 2) & (x.b < 5),
+        ...         "case 3": lambda x: (x.a > 2) & (x.b > 5),
+        ...     }
+        ... )
+           a  b new_column
+        0  1  5     case 1
+        1  2  4     case 2
+        2  3  6     case 3
         """
         data = self.copy()

         for k, v in kwargs.items():
-            data[k] = com.apply_if_callable(v, data)
+            if isinstance(v, dict):
+                conds = [com.apply_if_callable(c, data) for c in v.values()]
+                data[k] = np.select(conds, list(v), default=None)
+            else:
+                data[k] = com.apply_if_callable(v, data)
         return data

     def _sanitize_column(self, value) -> ArrayLike:
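The dict branch proposed for `assign` can also be sketched outside pandas as a standalone function, which makes it easier to try without patching `frame.py`. This is a minimal sketch under stated assumptions: the helper name `assign_by_conditions` and its signature are hypothetical and not pandas API, and it accepts both callables and ready-made boolean masks, as the docstring describes.

```python
import numpy as np
import pandas as pd


def assign_by_conditions(df, column, cases):
    """Hypothetical helper (not pandas API): add `column` from a dict of conditions.

    `cases` maps each value to assign to a boolean mask, or to a callable
    that takes the DataFrame and returns one.
    """
    out = df.copy()  # leave the caller's frame untouched, like DataFrame.assign
    # Evaluate callables against the copy; pass ready-made masks through.
    masks = [
        np.asarray(cond(out) if callable(cond) else cond, dtype=bool)
        for cond in cases.values()
    ]
    # np.select takes the first matching condition per row; unmatched rows get None.
    out[column] = np.select(masks, list(cases), default=None)
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 4, 6]})
result = assign_by_conditions(
    df,
    "new_column",
    {
        "case 1": lambda x: (x.a < 2) & (x.b == 5),  # callable condition
        "case 2": (df.a == 2) & (df.b < 5),  # precomputed boolean Series
    },
)
print(result)
```

The third row matches neither condition, so its `new_column` entry is missing; the original `df` is left without the new column.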
28 changes: 28 additions & 0 deletions pandas/tests/frame/methods/test_assign.py
@@ -82,3 +82,31 @@ def test_assign_dependent(self):
         result = df.assign(C=lambda df: df.A, D=lambda df: df["A"] + df["C"])
         expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]], columns=list("ABCD"))
         tm.assert_frame_equal(result, expected)
+
+    def test_assign_multiple_conditions_lambda(self):
+        df = DataFrame({"A": [1, 2, 3]})
+
+        # conditions cover all cases
+        result = df.assign(
+            A_status={
+                "less than 2": lambda x: x["A"] < 2,
+                "equals 2": lambda x: x["A"] == 2,
+                "bigger than 2": lambda x: x["A"] > 2,
+            }
+        )
+        expected = DataFrame(
+            {"A": [1, 2, 3], "A_status": ["less than 2", "equals 2", "bigger than 2"]}
+        )
+        tm.assert_frame_equal(result, expected)
+
+        # conditions do not cover all cases
+        result = df.assign(
+            A_status={
+                "less than 2": lambda x: x["A"] < 2,
+                "equals 2": lambda x: x["A"] == 2,
+            }
+        )
+        expected = DataFrame(
+            {"A": [1, 2, 3], "A_status": ["less than 2", "equals 2", None]}
+        )
+        tm.assert_frame_equal(result, expected)
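For comparison, the uncovered-case scenario from the test can be reproduced on a pandas release without this change by handing `assign` a callable that calls `np.select` itself. A minimal sketch, assuming only the public `DataFrame.assign` and `numpy.select` APIs; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Only two of the three rows are covered, so the last row falls to the default.
result = df.assign(
    A_status=lambda x: np.select(
        [x["A"] < 2, x["A"] == 2],
        ["less than 2", "equals 2"],
        default=None,
    )
)
print(result)
```

This form is more verbose than the proposed dict syntax, but it keeps the conditions and the values in two parallel lists, which is the shape `np.select` expects.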