ENH: Add support for multiple conditions assign statement #46285

Closed
ELHoussineT opened this issue Mar 8, 2022 · 3 comments
Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@ELHoussineT

ELHoussineT commented Mar 8, 2022

Is your feature request related to a problem?

Assigning values to a DataFrame column based on a different condition for each value has been missing for a while. It is true that you can achieve this with np.select and list comprehensions, but it is believed that this functionality is fundamental enough in data analysis that pandas should cover it without the need for "workarounds".

Describe the solution you'd like

Allow DataFrame.assign to accept a multiple-condition dict whose keys are the values to assign and whose values are boolean Series, array-likes, or callables. If a value in the dict is callable, it is computed on the DataFrame and should return a boolean Series or array. Rows not covered by any case are assigned None. This is based on np.select.

Example expected usage:

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 4, 6]})
>>> df
   a  b
0  1  5
1  2  4
2  3  6
>>> df.assign(
...     new_column={
...         "case 1": lambda x: (x.a < 2) & (x.b == 5),
...         "case 2": lambda x: (x.a == 2) & (x.b < 5),
...         "case 3": lambda x: (x.a > 2) & (x.b > 5),
...     }
... )
   a  b new_column
0  1  5     case 1
1  2  4     case 2
2  3  6     case 3
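
The proposed behaviour can be approximated today with a small wrapper around np.select. The following is only a sketch: `assign_cases` and its parameters are hypothetical names made up for illustration, not part of pandas, and it assumes that the first matching case should win when several conditions are true for the same row.

```python
import numpy as np
import pandas as pd


def assign_cases(df, column, cases, default=None):
    """Hypothetical helper: add `column`, labelling each row by its first matching case.

    `cases` maps a label to a boolean Series, an array-like, or a callable
    that takes the DataFrame and returns a boolean Series or array.
    """
    conditions = [
        np.asarray(cond(df) if callable(cond) else cond, dtype=bool)
        for cond in cases.values()
    ]
    # np.select picks, per row, the label of the first condition that is True;
    # rows matching no condition receive `default`.
    labels = np.select(conditions, list(cases.keys()), default=default)
    return df.assign(**{column: labels})


df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 4, 6]})
result = assign_cases(
    df,
    "new_column",
    {
        "case 1": lambda x: (x.a < 2) & (x.b == 5),
        "case 2": lambda x: (x.a == 2) & (x.b < 5),
        "case 3": lambda x: (x.a > 2) & (x.b > 5),
    },
)
print(result)
```

Keeping the column name as an explicit argument, rather than overloading the keyword arguments of DataFrame.assign, avoids any ambiguity with dict values that assign might otherwise try to interpret as column data.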

API breaking implications

N/A

Describe alternatives you've considered

Just using np.select as follows:

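import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

multiple_conditions_dict = {
    "less than 2": lambda x: x < 2,
    "equals 2": lambda x: x == 2,
    "bigger than 2": lambda x: x > 2,
}

df.assign(
    a_status=np.select(
        # Evaluate each condition on column 'a' so every mask is one-dimensional.
        [cond(df['a']) for cond in multiple_conditions_dict.values()],
        list(multiple_conditions_dict.keys()),
        # The default fill value is 0, which does not share a dtype with the
        # string labels; use None for rows that match no condition.
        default=None,
    )
)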

However, based on community feedback on Stack Overflow and other platforms, the lack of "out-of-the-box" pandas support for such a fundamental operation in data analysis makes this cumbersome.

Additional context

N/A

ELHoussineT added the Enhancement and Needs Triage labels Mar 8, 2022
@attack68
Contributor

attack68 commented Mar 9, 2022

This is already available, albeit using the apply method:

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 4, 6]})
>>> def assign(s):
...     if s["a"] < 2 and s["b"] == 5:
...         return "case 1"
...     elif s["a"] == 2 and s["b"] < 5:
...         return "case 2"
...     else:
...         return "case 3"
...
>>> df["new_column"] = df.apply(assign, axis=1)
>>> df
   a  b new_column
0  1  5     case 1
1  2  4     case 2
2  3  6     case 3

Note that using the apply method here allows for a much more general design and for user-defined functions than the suggested DataFrame.assign extension, which is narrowly focused in comparison.
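
For comparison, the row-wise function above can also be written as a vectorised np.select call. This is a sketch rather than a benchmark: it only checks that the two approaches agree on the example frame. Note that the final `else` branch maps every remaining row to "case 3", which corresponds to the `default` argument of np.select and not to a third explicit condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 4, 6]})


def assign(s):
    # Row-wise version: called once per row with that row as a Series.
    if s["a"] < 2 and s["b"] == 5:
        return "case 1"
    elif s["a"] == 2 and s["b"] < 5:
        return "case 2"
    else:
        return "case 3"


row_wise = df.apply(assign, axis=1)

# Vectorised version: each condition is evaluated once on whole columns.
vectorised = np.select(
    [(df["a"] < 2) & (df["b"] == 5), (df["a"] == 2) & (df["b"] < 5)],
    ["case 1", "case 2"],
    default="case 3",
)

assert row_wise.tolist() == vectorised.tolist()
```

The apply version accepts arbitrary Python logic per row, while the np.select version is restricted to conditions that can be expressed as boolean masks; that restriction is the trade-off being discussed in this thread.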

@samukweku
Contributor

#39154 addresses this. Not sure what the consensus is though

@mroeschke
Member

Closing as a duplicate or covered by #39154
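
For readers arriving at this issue later: to my knowledge, pandas 2.2 and newer include Series.case_when, which takes a list of (condition, replacement) tuples and covers this kind of multi-condition assignment. The sketch below assumes that method is available and otherwise falls back to chained Series.mask calls; the "no match" label and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 4, 6]})

# Rows that match no condition keep the value of this starting Series.
base = pd.Series("no match", index=df.index)

caselist = [
    ((df["a"] < 2) & (df["b"] == 5), "case 1"),
    ((df["a"] == 2) & (df["b"] < 5), "case 2"),
    ((df["a"] > 2) & (df["b"] > 5), "case 3"),
]

if hasattr(pd.Series, "case_when"):
    labels = base.case_when(caselist)
else:
    # Older pandas: apply the cases in reverse order with Series.mask,
    # so that earlier cases overwrite later ones and the first match wins.
    labels = base
    for condition, label in reversed(caselist):
        labels = labels.mask(condition, label)

df = df.assign(new_column=labels)
print(df)
```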
