Is your feature request related to a problem?
Assigning values to a DataFrame column based on a different condition for each value has been missing from pandas for a while. You can achieve this with np.select and list comprehensions, but this functionality is fundamental enough in data analysis that pandas should cover it without the need for "workarounds".
Describe the solution you'd like
Allow DataFrame.assign to accept a multiple-condition dict whose keys are the values to be assigned and whose values are boolean Series, array-likes, or callables. If a value in the dict is callable, it is computed on the DataFrame and should return a boolean Series or array. Cases not covered by any condition will be assigned None. This is based on np.select.
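To make the requested semantics concrete, here is a minimal sketch that emulates them on top of np.select. The helper name assign_by_conditions and its signature are hypothetical and are not part of pandas; it only illustrates the behavior described above (keys as assigned values, masks or callables as dict values, None for uncovered rows).

```python
import numpy as np
import pandas as pd


def assign_by_conditions(df, column, conditions):
    """Hypothetical helper (not a pandas API) emulating the proposed dict form."""
    masks = []
    for mask in conditions.values():
        # A callable is computed on the DataFrame and should return a boolean mask.
        if callable(mask):
            mask = mask(df)
        masks.append(np.asarray(mask, dtype=bool))
    # np.select takes the first matching key per row; uncovered rows get None.
    values = np.select(masks, list(conditions.keys()), default=None)
    return df.assign(**{column: values})


df = pd.DataFrame({"a": [1, 2, 3, 4]})
result = assign_by_conditions(
    df,
    "a_status",
    {
        "less than 2": lambda d: d["a"] < 2,      # callable
        "equals 2": df["a"] == 2,                 # boolean Series
        "equals 3": [False, False, True, False],  # array-like
    },
)
```

In this sketch the last row matches no condition, so it is left as a missing value. When several conditions match the same row, np.select keeps the first one in dict order.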
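API breaking implications
N/A
Describe alternatives you've considered
Just using np.select as follows:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3]})
    # Keys are the values to assign; each callable returns a boolean Series.
    multiple_conditions_dict = {
        "less than 2": lambda x: x["a"] < 2,
        "equals 2": lambda x: x["a"] == 2,
        "bigger than 2": lambda x: x["a"] > 2,
    }
    df.assign(
        a_status=np.select(
            [cond(df) for cond in multiple_conditions_dict.values()],
            list(multiple_conditions_dict.keys()),
            default=None,  # rows matching no condition get None
        )
    )
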
However, based on community feedback on Stack Overflow and other platforms, the lack of "out-of-the-box" pandas support for such a fundamental operation in data analysis can be cumbersome.
Additional context
N/A
Note that using the apply method here allows for a much more general design and for user-defined functions than the suggested DataFrame.assign change, which is narrowly focused in comparison.
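The apply-based code this comment refers to is not shown in the thread. As a rough sketch of what such an approach might look like, the hypothetical user-defined function a_status below labels each value of column a and is passed to Series.apply:

```python
import pandas as pd


def a_status(value):
    # Hypothetical user-defined function; any Python logic could go here.
    if value < 2:
        return "less than 2"
    if value == 2:
        return "equals 2"
    return "bigger than 2"


df = pd.DataFrame({"a": [1, 2, 3]})
# apply calls a_status once per element of column "a".
labelled = df.assign(a_status=df["a"].apply(a_status))
```

The trade-off is that apply runs the Python function element by element, so it is more flexible than a dict of boolean masks but typically slower than a vectorized np.select on large frames.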