
PERF: passthrough mask to take_1d if already known #44666


Merged


jorisvandenbossche
Member

In take_1d (with allow_fill=True), the mask is computed as indexer == -1. But in the places where this function is used, we have either already computed this mask in advance, or can compute it once instead of recomputing it inside every take_1d call (repeated calls with the same mask). Passing the mask through improves performance in both cases.
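To illustrate the idea, here is a minimal NumPy-only sketch. The helper `take_1d_sketch` and its `mask` parameter are hypothetical stand-ins written for this example and are not the pandas implementation; the sketch only shows how a caller that already knows the mask can skip the `indexer == -1` comparison on each call.

```python
import numpy as np


def take_1d_sketch(arr, indexer, fill_value=np.nan, mask=None):
    """Hypothetical fill-aware 1D take (illustration only, not pandas code)."""
    if mask is None:
        # Default path: derive the mask from the indexer on every call.
        mask = indexer == -1
    needs_fill = bool(mask.any())
    # Promote to a float dtype only if a fill value has to be written.
    out_dtype = np.result_type(arr.dtype, np.float64) if needs_fill else arr.dtype
    # Positions with -1 pick up the last element here and are overwritten below.
    out = arr.take(indexer).astype(out_dtype, copy=False)
    if needs_fill:
        out[mask] = fill_value
    return out


# Several 1D columns are reindexed with the same indexer.
columns = [np.arange(5, dtype=np.int64) * k for k in range(1, 4)]
indexer = np.array([0, 2, -1, 4, -1])

# Compute the mask once and pass it through to every call.
mask = indexer == -1
results = [take_1d_sketch(col, indexer, mask=mask) for col in columns]

# Same result as letting each call recompute the mask itself.
recomputed = [take_1d_sketch(col, indexer) for col in columns]
```

With many columns sharing one indexer, the comparison runs once instead of once per column, which is the saving described above.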

Here are two benchmarks where ArrayManager was quite a bit slower because of this:

Reindex.time_reindex_upcast benchmark:

import numpy as np
import pandas as pd

N = 10 ** 3
df2 = pd.DataFrame(
    {
        c: {
            0: np.random.randint(0, 2, N).astype(np.bool_),
            1: np.random.randint(0, N, N).astype(np.int16),
            2: np.random.randint(0, N, N).astype(np.int32),
            3: np.random.randint(0, N, N).astype(np.int64),
        }[np.random.randint(0, 4)]
        for c in range(N)
    }
)
idx2 = np.random.permutation(range(1200))

df2_am = df2._as_manager("array").copy()


In [3]: %timeit df2_am.reindex(idx2)
16.7 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master
9.73 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- PR
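As a rough illustration of why this benchmark exercises the fill path: reindexing 1000 rows to 1200 labels leaves 200 labels without a match, so the indexer contains -1 entries and integer columns are upcast to hold the missing values. The sketch below uses a single small Series rather than the benchmark frame; the variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

old_index = pd.RangeIndex(1000)
new_labels = np.random.permutation(1200)

# get_indexer returns -1 for every label that is absent from old_index.
indexer = old_index.get_indexer(new_labels)
mask = indexer == -1
n_missing = int(mask.sum())

# An integer column cannot hold NaN, so reindexing upcasts it.
s = pd.Series(np.arange(1000, dtype=np.int16), index=old_index)
out = s.reindex(new_labels)
```

The same indexer, and therefore the same mask, applies to every column of the frame, which is why computing it once pays off for a column-wise manager.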

Unstack.time_without_last_row benchmark:

import numpy as np
import pandas as pd

dtype = 'int'
m = 100
n = 1000

levels = np.arange(m)
index = pd.MultiIndex.from_product([levels] * 2)
columns = np.arange(n)
values = np.arange(m * m * n).reshape(m * m, n)

df = pd.DataFrame(values, index, columns)
df2 = df.iloc[:-1]
df2_am = df2._as_manager("array").copy()

In [3]: %timeit df2_am.unstack()
594 ms ± 8.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- master
183 ms ± 21.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <-- PR

@jorisvandenbossche jorisvandenbossche added the ArrayManager and Performance (memory or execution speed performance) labels Nov 29, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.4 milestone Nov 29, 2021
@jorisvandenbossche jorisvandenbossche changed the title PERF: passthrough mask to take_1d is already known PERF: passthrough mask to take_1d if already known Nov 29, 2021
@jreback
Contributor

jreback commented Nov 29, 2021

lgtm cc @jbrockmendel

maybe more cases that could use this (for a follow-up, for sure)

@jbrockmendel
Member

nice perf bump. any interest in applying it to BM while you're at it

@jorisvandenbossche
Member Author

> any interest in applying it to BM while you're at it

I don't think the same optimization is directly applicable to BlockManager: 1) in the one place where a Block uses take_nd directly (in reindex_indexer), the mask is not precomputed, and precomputing it wouldn't help either, since take_nd already computes it only once for the full 2D array, which is just as efficient; and 2) unstack doesn't use take (and, being block-based, I think it generally wouldn't make repeated take calls with the same indexer).
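The difference can be sketched with plain NumPy. This is a simplified illustration, not the pandas code path: the array layout, the variable names, and the `mask_computations` counter are assumptions made for the example. A single take on a 2D array needs the mask once, whereas a column-by-column take recomputes it per column unless the caller passes it in.

```python
import numpy as np

values_2d = np.arange(12, dtype=np.float64).reshape(3, 4)  # 3 rows x 4 columns
indexer = np.array([2, -1, 0, -1])

# Block-style: one take over the full 2D array, mask computed a single time.
mask = indexer == -1
block_result = values_2d.take(indexer, axis=0)
block_result[mask, :] = np.nan

# Column-by-column style without a precomputed mask: one comparison per column.
mask_computations = 0
column_results = []
for j in range(values_2d.shape[1]):
    col = values_2d[:, j]
    col_mask = indexer == -1  # repeated work, identical for every column
    mask_computations += 1
    out = col.take(indexer)
    out[col_mask] = np.nan
    column_results.append(out)
columnwise_result = np.column_stack(column_results)
```

Both approaches give the same values; only the column-wise one repeats the mask computation, so only that one benefits from passing the mask through.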

@jorisvandenbossche jorisvandenbossche merged commit 3e1cfc5 into pandas-dev:master Dec 3, 2021
@jorisvandenbossche jorisvandenbossche deleted the am-perf-take-mask branch December 6, 2021 08:19