Skip to content

PERF: lazify consolidation check #32224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 26, 2020

Conversation

jbrockmendel
Copy link
Member

The benchmark I'm using for this (and upcoming related PRs) is based on the asv that is most affected by removing fast_apply (see #32086).

import numpy as np
from pandas import *
%load_ext line_profiler


def get_df():
    N = 10 ** 4
    labels = np.random.randint(0, 2000, size=N)
    labels2 = np.random.randint(0, 3, size=N)
    df = DataFrame(
        {
            "key": labels,
            "key2": labels2,
            "value1": np.random.randn(N),
            "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
        }
    )
    return df

df = get_df()

gb = df.groupby("key")

%prun -s cumulative gb.apply(lambda x: 1)

If we disable fast_apply on master, this gives:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.278    0.278 groupby.py:701(apply)
        1    0.000    0.000    0.278    0.278 groupby.py:750(_python_apply_general)
        1    0.009    0.009    0.275    0.275 ops.py:151(apply)
     1987    0.003    0.000    0.257    0.000 ops.py:858(__iter__)
     1986    0.003    0.000    0.251    0.000 ops.py:889(_chop)
     1986    0.003    0.000    0.247    0.000 indexing.py:814(__getitem__)
     1986    0.001    0.000    0.243    0.000 indexing.py:1462(_getitem_axis)
     1986    0.003    0.000    0.242    0.000 indexing.py:1488(_get_slice_axis)
     1986    0.007    0.000    0.230    0.000 generic.py:3470(_slice)
     1986    0.008    0.000    0.203    0.000 managers.py:713(get_slice)
     1987    0.005    0.000    0.129    0.000 managers.py:125(__init__)
     1987    0.003    0.000    0.060    0.000 managers.py:634(_consolidate_check)
     1987    0.026    0.000    0.059    0.000 managers.py:215(_rebuild_blknos_and_blklocs)
     1987    0.003    0.000    0.056    0.000 managers.py:635(<listcomp>)
     5961    0.015    0.000    0.053    0.000 blocks.py:335(ftype)

If we disable fast_apply on this PR:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.198    0.198 groupby.py:701(apply)
        1    0.000    0.000    0.198    0.198 groupby.py:750(_python_apply_general)
        1    0.008    0.008    0.195    0.195 ops.py:151(apply)
     1979    0.002    0.000    0.176    0.000 ops.py:903(__iter__)
     1978    0.002    0.000    0.172    0.000 ops.py:934(_chop)
     1978    0.003    0.000    0.169    0.000 indexing.py:814(__getitem__)
     1978    0.001    0.000    0.165    0.000 indexing.py:1462(_getitem_axis)
     1978    0.003    0.000    0.164    0.000 indexing.py:1488(_get_slice_axis)
     1978    0.006    0.000    0.153    0.000 generic.py:3470(_slice)
     1978    0.007    0.000    0.129    0.000 managers.py:713(get_slice)
     1980    0.004    0.000    0.061    0.000 managers.py:125(__init__)
     1980    0.021    0.000    0.052    0.000 managers.py:215(_rebuild_blknos_and_blklocs)
     1978    0.002    0.000    0.048    0.000 managers.py:723(<listcomp>)
     5934    0.010    0.000    0.045    0.000 blocks.py:310(getitem_block)
     5942    0.003    0.000    0.031    0.000 blocks.py:275(make_block_same_class)

We save almost 30% by lazifying the consolidation check and consolidate on _slice.

@WillAyd
Copy link
Member

WillAyd commented Feb 25, 2020

Where does this leave us in the interim? I think a slight preference to just include along the PR(s) it is supposed to support

@jbrockmendel
Copy link
Member Author

Where does this leave us in the interim? I think a slight preference to just include along the PR(s) it is supposed to support

It leaves us with improved perf for DataFrame slicing. The other optimizations im looking at are mostly orthogonal to this, so this makes for a reasonable scope. Also none of it gets even close to fast_apply, so this is a for-its-own-sake thing.

@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Feb 25, 2020
@jreback jreback added this to the 1.1 milestone Feb 26, 2020
@jreback jreback merged commit a152c30 into pandas-dev:master Feb 26, 2020
@jreback
Copy link
Contributor

jreback commented Feb 26, 2020

sure

@jbrockmendel jbrockmendel deleted the just-consolidate-less branch February 26, 2020 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants