PERF: Series.any/all #52381

Merged
merged 4 commits into from
Apr 4, 2023

jbrockmendel
Member

Example based on OP from #26032

import numpy as np
import pandas as pd
ser = pd.Series(np.random.randint(0, 2, 100000)).astype(bool)

In [2]: %timeit ser.any(skipna=True)
5.79 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  # <- PR
8.44 µs ± 473 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  # <- main

We'll be hard-pressed to trim any more overhead from this, so I'm declaring this as closing #26032.
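For anyone wanting to reproduce the comparison outside IPython, here is a minimal sketch using the standard-library `timeit` module. It builds the same data as the example above and also times `ser.values.any()`, the baseline named in #26032. The helper name `time_per_call` and the loop counts are assumptions made for illustration; absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same data as the example above: 100,000 random booleans.
ser = pd.Series(np.random.randint(0, 2, 100_000)).astype(bool)


def time_per_call(func, number=2_000, repeat=5):
    """Best observed per-call time in microseconds (illustrative helper)."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6


# Time the Series method and the raw ndarray method on the same data.
series_us = time_per_call(lambda: ser.any(skipna=True))
values_us = time_per_call(lambda: ser.values.any())

print(f"Series.any:        {series_us:.2f} µs per call")
print(f"Series.values.any: {values_us:.2f} µs per call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it is not the same statistic as the mean ± std. dev. that `%timeit` reports.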

@@ -13023,3 +13023,41 @@ def _doc_params(cls):
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.
"""


def make_doc(name: str, ndim: int) -> str:
Member

This seems only really needed for the Series case so can it be moved & simplified there?

Member Author

Doing it in series.py requires importing a bunch of currently-private items that the linter complains about. Also, before long I'd like to do this with more of the reductions in both Series/DataFrame.
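To illustrate the idea of a shared docstring factory keyed on `name` and `ndim`, here is a rough sketch. Only the signature `(name: str, ndim: int) -> str` comes from the diff; the function body, the name `make_doc_sketch`, the docstring wording, and the `MiniSeries` class are all hypothetical and are not the pandas implementation.

```python
def make_doc_sketch(name: str, ndim: int) -> str:
    """Hypothetical stand-in for a docstring factory; not the pandas code."""
    if name not in ("any", "all"):
        # Other reductions could be added later, as the comment above suggests.
        raise NotImplementedError(name)
    obj = "Series" if ndim == 1 else "DataFrame"
    quantifier = "any element is" if name == "any" else "all elements are"
    return f"Return whether {quantifier} True in the {obj}."


class MiniSeries:
    """Toy one-dimensional container, used only to show doc attachment."""

    def __init__(self, values):
        self._values = list(values)

    def any(self) -> bool:
        return any(self._values)

    def all(self) -> bool:
        return all(self._values)


# Attach generated docstrings after the class body is defined.
MiniSeries.any.__doc__ = make_doc_sketch("any", ndim=1)
MiniSeries.all.__doc__ = make_doc_sketch("all", ndim=1)

print(MiniSeries.any.__doc__)
```

Keeping the factory in one module means a two-dimensional class could reuse it with `ndim=2` without duplicating the template text.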

@mroeschke mroeschke added Performance Memory or execution speed performance Reduction Operations sum, mean, min, max, etc. labels Apr 4, 2023
@mroeschke mroeschke added this to the 2.1 milestone Apr 4, 2023
@mroeschke mroeschke merged commit 3ce07cb into pandas-dev:main Apr 4, 2023
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the perf-series-reduce branch April 4, 2023 17:29
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 6, 2023
* PERF: Series.any/all

* mypy fixup

* different mypy i guess
Successfully merging this pull request may close these issues.

Series.all much slower than Series.values.all