BUG/PERF: sparse min/max don't densify #43527

Merged
merged 7 commits into from
Sep 12, 2021

mzeitlin11
Member

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

This now follows the structure of other SparseArray reductions like sum more closely, though there is some added complexity here to avoid taking the min/max of an empty array, which would raise when we want to return the corresponding missing value instead.
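The idea described above can be sketched in plain Python. This is only an illustrative stand-in, not the pandas implementation: the function name `sparse_min_max`, the helper `_is_missing`, and the `n_gaps` parameter are hypothetical, while `sp_min_max`, `fill_value`, and `kind` follow the names in the reviewed snippet. It assumes float data with NaN as the missing value.

```python
import math


def _is_missing(value):
    # Hypothetical helper: treat float NaN as the missing value.
    return isinstance(value, float) and math.isnan(value)


def sparse_min_max(sp_values, fill_value, n_gaps, kind):
    """Sketch of a min/max reduction that never builds the dense array.

    sp_values  -- the explicitly stored values
    fill_value -- the value implied at every position that is not stored
    n_gaps     -- how many positions are not stored (assumed parameter)
    kind       -- "min" or "max"
    """
    func = max if kind == "max" else min

    # Missing stored values are skipped so they cannot become the result.
    valid_vals = [v for v in sp_values if not _is_missing(v)]

    # The fill value only competes if it is non-missing and actually occurs.
    has_nonnull_fill_vals = not _is_missing(fill_value) and n_gaps > 0

    if valid_vals:
        sp_min_max = func(valid_vals)
        if has_nonnull_fill_vals:
            return func(sp_min_max, fill_value)
        return sp_min_max
    elif has_nonnull_fill_vals:
        # No valid stored values: avoid calling min/max on an empty list.
        return fill_value
    else:
        # Nothing valid anywhere: return the missing value instead of raising.
        return math.nan
```

The work is proportional to the number of stored values rather than the full array length, which is consistent with the kind of speedup a non-densifying reduction is expected to give.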

Benchmarks:

       before           after         ratio
     [5deec13d]       [f3eea313]
     <master>         <perf_sparse_min_max>
-      4.05±0.3ms       11.6±0.3μs     0.00  sparse.MinMax.time_min_max('max', 0.0)
-      4.49±0.4ms       11.6±0.2μs     0.00  sparse.MinMax.time_min_max('min', 0.0)
-      4.62±0.7ms       10.6±0.2μs     0.00  sparse.MinMax.time_min_max('max', nan)
-      5.37±0.7ms       10.3±0.3μs     0.00  sparse.MinMax.time_min_max('min', nan)

@mzeitlin11 mzeitlin11 added the Bug, Performance (memory or execution speed performance), Reduction Operations (sum, mean, min, max, etc.), and Sparse (Sparse Data Type) labels Sep 12, 2021
Contributor

@jreback jreback left a comment

small comment, ping on green.

        func = max if kind == "max" else min
        return func(sp_min_max, self.fill_value)
    else:
        return sp_min_max
else:
Contributor

might as well just do elif / else

Member Author

thanks, updated

@jreback jreback added this to the 1.4 milestone Sep 12, 2021
@jreback jreback merged commit d47f845 into pandas-dev:master Sep 12, 2021
Contributor

jreback commented Sep 12, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the perf_sparse_min_max branch September 12, 2021 19:19
AlexeyGy pushed a commit to AlexeyGy/pandas that referenced this pull request Sep 13, 2021