PERF: nanops #43311


Merged · 2 commits · Aug 31, 2021

Conversation

jbrockmendel (Member)

from asv_bench.benchmarks.stat_ops import *
self = FrameOps()
self.setup("skew", "float", 0)

%timeit self.time_op("skew", "float", 0)
3.22 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
2.06 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR

%timeit self.time_op("skew", "int", 0)
3.81 ms ± 294 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
2.3 ms ± 79.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR

%timeit self.time_op("kurt", "float", 0)
2.79 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
2.2 ms ± 85.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <- PR

%timeit self.time_op("kurt", "int", 0)
3.29 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
2.35 ms ± 69.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR

%timeit self.time_op("prod", "float", 0)
2.9 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
2.23 ms ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR

%timeit self.time_op("sum", "float", 0)
2.9 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
2.14 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR
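For context, a minimal sketch of what a skipna reduction like nansum does conceptually. This is illustrative only; the actual pandas nanops code paths being optimized here handle dtypes, min_count, bottleneck dispatch, and more:

```python
import numpy as np

def nansum_sketch(values, axis=0, skipna=True):
    """Illustrative mask-based NaN-aware sum; NOT the real pandas
    nanops.nansum, just the core idea of a skipna reduction."""
    if skipna:
        mask = np.isnan(values)
        # Replace NaNs with the additive identity, then reduce normally.
        return np.where(mask, 0.0, values).sum(axis=axis)
    return values.sum(axis=axis)

values = np.array([[1.0, np.nan], [2.0, 3.0]])
print(nansum_sketch(values, axis=0))  # same result as np.nansum(values, axis=0)
```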

@alimcmaster1 alimcmaster1 added this to the 1.4 milestone Aug 31, 2021
@alimcmaster1 alimcmaster1 added the Performance Memory or execution speed performance label Aug 31, 2021
@alimcmaster1 (Member)

LGTM. One question: how come we only apply this to certain nanops? Why not nanstd, for example?

@jbrockmendel (Member, Author)

Question how come we only apply this to certain nanops

These are the only ones that showed a big difference in the ASV benchmarks.

@jreback jreback merged commit 1633e32 into pandas-dev:master Aug 31, 2021
@jbrockmendel jbrockmendel deleted the perf-reductions branch August 31, 2021 18:32
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
@jorisvandenbossche (Member)

This seems to give a slowdown in the nan* methods when axis 1 is small compared to axis 0:

values = np.random.randn(1000000, 4)

In [9]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
47.9 ms ± 554 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <-- pandas 1.3
18.4 s ± 808 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- master

This is a gigantic difference, which I noticed in an ArrayManager case. But it also gives a slowdown for the BlockManager with a relatively wide DataFrame:

values = np.random.randn(1000, 4)
df = pd.DataFrame(values).copy()

In [12]: %timeit df.sum(axis=1)
138 µs ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)  # <-- pandas 1.3
1.83 ms ± 80.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

Since the performance improvements in the examples in the top post were only small compared to the slowdowns shown above, I would either 1) revert the optimization, or 2) add a threshold on the shape (e.g. only take this custom path if values.shape[1] < (values.shape[0] / 10)).
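Option 2 could be sketched as a small guard in front of the custom path. The helper name is hypothetical, and the exact condition and threshold would need to be settled by benchmarking:

```python
import numpy as np

def use_custom_path(values, axis):
    """Hypothetical shape heuristic for option 2: only take the custom
    axis-1 reduction path when the suggested condition holds."""
    if values.ndim != 2 or axis != 1:
        return False
    return values.shape[1] < values.shape[0] / 10  # threshold from the suggestion above

print(use_custom_path(np.empty((1000, 4)), axis=1))   # True
print(use_custom_path(np.empty((10, 1000)), axis=1))  # False
```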

@jbrockmendel (Member, Author)

add some threshold for the shape

Seems reasonable.

@jorisvandenbossche (Member) commented Nov 21, 2021

Some very rough comparisons:

values = np.random.randn(10000, 100)

In [5]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
2.05 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- pandas 1.3
174 ms ± 15.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <-- master

values = np.random.randn(1000, 1000)

In [7]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
2.02 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- pandas 1.3
17 ms ± 2.5 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master

values = np.random.randn(100, 10000)

In [9]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
1.87 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- pandas 1.3
2.4 ms ± 38.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master

values = np.random.randn(10, 100000)

In [11]: %timeit pd.core.nanops.nansum(values, axis=1, skipna=True)
2.15 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- pandas 1.3
1.23 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

Based on this, something like shape[1] / shape[0] > 1000 might be a good criterion.
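Checked against the shapes measured above, that criterion selects only the last case, which is the only one where master was faster. This is a sketch; the cutoff that actually lands may differ:

```python
def wide_enough(shape, cutoff=1000):
    # Hypothetical criterion from the comment above: take the new
    # axis-1 path only for arrays much wider than they are tall.
    nrows, ncols = shape
    return ncols / nrows > cutoff

for shape in [(10_000, 100), (1_000, 1_000), (100, 10_000), (10, 100_000)]:
    print(shape, wide_enough(shape))
# Only (10, 100000) passes, matching the one timing where master won.
```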

@jorisvandenbossche (Member)

Added that in #44566
