Skip to content

ENH: Implement cummax and cummin in _accumulate() for ordered Categorical arrays #58360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 23, 2024

Conversation

bdwzhangumich
Copy link
Contributor

We implemented the operations cummax and cummin in the _accumulate() function for the Categorical array class. The implementation includes a check that the Categorical array is ordered and is based on other implementations of _accumulate(). Tests for the implementation have also been written and included.

Coauthored with @xiangchris

import numpy as np


def _cum_func(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just include all this logic in pandas/core/arrays/categorical.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic into pandas/core/arrays/categorical.py

@mroeschke mroeschke added Categorical Categorical Data Type Reduction Operations sum, mean, min, max, etc. labels Apr 22, 2024
@bdwzhangumich bdwzhangumich requested a review from mroeschke April 22, 2024 19:26
["a", np.nan, "b", "a", "c"],
dtype=pd.CategoricalDtype(list(order), ordered=True),
)
result = getattr(ser, method)(skipna=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use pytest.mark.parametrize with skipna=True|False and the expected result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a pytest.mark.parametrize to test_cummax_cummin_ordered_categorical_nan with skipna and expected data

Comment on lines 205 to 207
pd.Series(
["a", np.nan, "b", "b", "c"],
dtype=pd.CategoricalDtype(list(order), ordered=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you assign this to an expected variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigned all expected series to expected variables

…ametrize to test_cummax_cummin_ordered_categorical_nan with skipna and expected data
@bdwzhangumich bdwzhangumich requested a review from mroeschke April 23, 2024 19:13
@mroeschke mroeschke added this to the 3.0 milestone Apr 23, 2024
@mroeschke mroeschke merged commit 9d5c88e into pandas-dev:main Apr 23, 2024
45 of 46 checks passed
@mroeschke
Copy link
Member

Thanks @bdwzhangumich

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…ical arrays (pandas-dev#58360)

* Added tests with and without np.nan

* Added tests for cummin and cummax

* Fixed series tests expected series, rewrote categorical arrays to use pd.Categorical

* Fixed cat not defined error and misspelling

* Implement _accumulate for Categorical

* fixed misspellings in tests

* fixed expected categories on tests

* Updated whatsnew

* Update doc/source/whatsnew/v3.0.0.rst

Co-authored-by: Matthew Roeschke <[email protected]>

* Removed testing for _accumulate.

* Moved categorical_accumulations.py logic to categorical.py

* Assigned expected results to expected variable; Added pytest.mark.parametrize to test_cummax_cummin_ordered_categorical_nan with skipna and expected data

---------

Co-authored-by: Christopher Xiang <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: Chris Xiang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Can't compute cummin/cummax on ordered categorical
3 participants