PERF: passthrough mask to take_1d if already known #44666

jorisvandenbossche · 2021-11-29T07:40:44Z

In take_1d (with allow_fill=True), the mask is calculated as indexer == -1. But in the places where this function is used, we have either already calculated this mask in advance, or can calculate it once instead of recalculating it in every take_1d call (repeated calls with the same indexer). Passing the mask through improves performance in both cases.
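To illustrate the idea, here is a minimal sketch of a 1D take that accepts an optional precomputed mask. The names take_1d_sketch and reindex_columns are hypothetical and this is not the actual pandas implementation; the always-float64 upcast is a simplifying assumption.

```python
import numpy as np


def take_1d_sketch(arr, indexer, fill_value=np.nan, mask=None):
    """Hypothetical 1D take with fill (illustration only, not pandas code).

    Positions where ``indexer == -1`` receive ``fill_value``. A caller that
    already knows the mask can pass it in to skip the comparison.
    """
    if mask is None:
        # Default path: derive the mask from the indexer on every call.
        mask = indexer == -1
    # With NumPy's default mode, -1 picks the last element; those positions
    # are overwritten below.
    out = arr.take(indexer)
    if mask.any():
        # Simplifying assumption: always upcast to float64 so NaN fits.
        out = out.astype("float64")
        out[mask] = fill_value
    return out


def reindex_columns(columns, indexer):
    """Apply the same indexer to many 1D columns, computing the mask once."""
    mask = indexer == -1  # computed once, reused for every column
    return [take_1d_sketch(col, indexer, mask=mask) for col in columns]


cols = [np.array([10, 20, 30], dtype=np.int16), np.array([1.5, 2.5, 3.5])]
indexer = np.array([2, -1, 0])
result = reindex_columns(cols, indexer)
```

With one array per column, as in ArrayManager, the saving grows with the number of columns, since every column shares the same indexer.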

Here are two benchmarks where ArrayManager was quite a bit slower because of this:

Reindex.time_reindex_upcast benchmark:

import numpy as np
import pandas as pd

N = 10 ** 3
df2 = pd.DataFrame(
    {
        c: {
            0: np.random.randint(0, 2, N).astype(np.bool_),
            1: np.random.randint(0, N, N).astype(np.int16),
            2: np.random.randint(0, N, N).astype(np.int32),
            3: np.random.randint(0, N, N).astype(np.int64),
        }[np.random.randint(0, 4)]
        for c in range(N)
    }
)
idx2 = np.random.permutation(range(1200))

df2_am = df2._as_manager("array").copy()


In [3]: %timeit df2_am.reindex(idx2)
16.7 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master
9.73 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- PR

Unstack.time_without_last_row benchmark:

import numpy as np
import pandas as pd

dtype = 'int'
m = 100
n = 1000

levels = np.arange(m)
index = pd.MultiIndex.from_product([levels] * 2)
columns = np.arange(n)
values = np.arange(m * m * n).reshape(m * m, n)

df = pd.DataFrame(values, index, columns)
df2 = df.iloc[:-1]
df2_am = df2._as_manager("array").copy()

In [3]: %timeit df2_am.unstack()
594 ms ± 8.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- master
183 ms ± 21.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <-- PR

jreback · 2021-11-29T15:13:34Z

lgtm cc @jbrockmendel

maybe there are more cases that could use this (for a follow-on, for sure)

pandas/core/array_algos/take.py

jbrockmendel · 2021-11-29T15:55:18Z

nice perf bump. any interest in applying it to BM while you're at it?

jorisvandenbossche · 2021-11-29T21:17:17Z

> any interest in applying it to BM while you're at it

I don't think the same optimization is directly applicable to BlockManager: 1) in the one place where Block uses take_nd directly (in reindex_indexer), the mask is not precomputed, and precomputing it wouldn't help either, because computing it inside take_nd is just as efficient when it is done once for the full 2D array; and 2) unstack doesn't use take (and in general wouldn't do repeated take calls with the same indexer, I think, since it is block-based).
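A rough sketch of point 1, assuming a simplified 2D take along axis 0 (take_2d_sketch is a hypothetical name, not pandas code, and the actual block layout in pandas may differ): the mask comparison already happens once per block regardless of how many columns the block holds, so there is nothing to share across calls.

```python
import numpy as np


def take_2d_sketch(values, indexer, fill_value=np.nan):
    """Hypothetical 2D take along axis 0 with fill (illustration only)."""
    # One comparison covers every column of the block.
    mask = indexer == -1
    out = values.take(indexer, axis=0)
    if mask.any():
        # Simplifying assumption: upcast to float64 so NaN fits.
        out = out.astype("float64")
        out[mask, :] = fill_value
    return out


block = np.arange(6).reshape(3, 2)  # 3 rows, 2 columns in a single block
indexer = np.array([1, -1, 0])
out = take_2d_sketch(block, indexer)
```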

PERF: passthrough mask to take_1d is already known

50891d3

jorisvandenbossche added the ArrayManager and Performance (Memory or execution speed performance) labels Nov 29, 2021

jorisvandenbossche added this to the 1.4 milestone Nov 29, 2021

fixup import

c766b94

jorisvandenbossche changed the title ~~PERF: passthrough mask to take_1d is already known~~ PERF: passthrough mask to take_1d if already known Nov 29, 2021

try to please mypy

1393a35

jbrockmendel reviewed Nov 29, 2021

pandas/core/array_algos/take.py (outdated, resolved)

feedback

62726ad

jorisvandenbossche merged commit 3e1cfc5 into pandas-dev:master Dec 3, 2021

jorisvandenbossche deleted the am-perf-take-mask branch December 6, 2021 08:19
