PERF: Using Numpy C-API arange #32681

ShaharNaveh · 2020-03-13T14:39:29Z

This PR was opened as @jbrockmendel suggested (ref #32177 (comment))

Benchmarks:

Master:

In [1]: import pandas._libs.internals as internals

In [2]: %timeit internals.BlockPlacement(slice(1_000_000)).as_array
1.55 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

PR:

In [1]: import pandas._libs.internals as internals

In [2]: %timeit internals.BlockPlacement(slice(1_000_000)).as_array
1.46 ms ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

jbrockmendel · 2020-03-13T15:46:10Z

Since the perf numbers are pretty similar, can you report a few rounds of each

ShaharNaveh · 2020-03-13T16:20:56Z

Since the perf numbers are pretty similar, can you report a few rounds of each

Sure!

In [1]: import pandas._libs.internals as internals                                                            

In [2]: %timeit internals.BlockPlacement(slice(1)).as_array                                                   
1.46 µs ± 45.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # master
586 ns ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [3]: %timeit internals.BlockPlacement(slice(10)).as_array                                                  
1.65 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # master
832 ns ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [4]: %timeit internals.BlockPlacement(slice(100)).as_array                                                 
1.74 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # master
879 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [5]: %timeit internals.BlockPlacement(slice(1_000)).as_array                                               
2.94 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # master
2.1 µs ± 77.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [6]: %timeit internals.BlockPlacement(slice(10_000)).as_array                                              
10.1 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # master
9.39 µs ± 170 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # PR

In [7]: %timeit internals.BlockPlacement(slice(100_000)).as_array                                             
81 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) # master
83 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) # PR

In [8]: %timeit internals.BlockPlacement(slice(1_000_000)).as_array                                           
1.72 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # master
1.56 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # PR
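
To make the rounds above easier to compare, here is a minimal sketch that turns the reported means into speedup ratios and absolute savings. The numbers are copied from the timings in this comment; the names `timings_ns` and `speedup` are made up for this illustration and are not part of pandas. The standard deviations are ignored, so the largest sizes (where the master run reports ± 165 µs) should be read with care.

```python
# Mean timings copied from the benchmark rounds above, in nanoseconds per loop.
# Keys are the n passed to BlockPlacement(slice(n)); values are (master, PR).
timings_ns = {
    1: (1_460, 586),
    10: (1_650, 832),
    100: (1_740, 879),
    1_000: (2_940, 2_100),
    10_000: (10_100, 9_390),
    100_000: (81_000, 83_000),
    1_000_000: (1_720_000, 1_560_000),
}


def speedup(master_ns, pr_ns):
    """Return how many times faster the PR is than master (values > 1 favor the PR)."""
    return master_ns / pr_ns


for n, (master_ns, pr_ns) in timings_ns.items():
    ratio = speedup(master_ns, pr_ns)
    saved = master_ns - pr_ns  # absolute time saved per call, in ns
    print(f"n={n:>9,}  master/PR = {ratio:.2f}x  saved = {saved:,} ns")
```

For sizes up to 10_000 the absolute saving stays in the same range (roughly 0.7 to 0.9 µs per call) while the ratio shrinks as n grows, which would be consistent with a fixed per-call overhead being removed rather than the fill loop itself getting faster.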

jbrockmendel · 2020-03-13T16:24:33Z

that's a nice improvement for the small cases, thanks

jbrockmendel · 2020-03-13T16:58:01Z

pandas/_libs/internals.pyx

@@ -105,7 +107,9 @@ cdef class BlockPlacement:
            Py_ssize_t start, stop, end, _
        if not self._has_array:
            start, stop, step, _ = slice_get_indices_ex(self._as_slice)
-            self._as_array = np.arange(start, stop, step, dtype=np.int64)
+            # NOTE: this is the C-optimized equivalent of
+            # np.arange(start, stop, step, dtype=np.int64)


nitpick: i like to add an extra space at the beginning of multi-line comments specifically to avoid confusing this for commented-out code. quotation marks or backticks or ... are also good

jbrockmendel · 2020-03-13T18:17:18Z

lgtm, thanks for accommodating my nitpicking

ShaharNaveh · 2020-03-13T18:18:35Z

@jbrockmendel I have a list of your nitpicks and requests, I try to address them when I have the time :)

WillAyd · 2020-03-13T21:47:55Z

Why does this generate anything different from Cython? Shouldn't it generate the same result?

jbrockmendel · 2020-03-13T22:34:25Z

Why does this generate anything different from Cython? Shouldn't it generate the same result?

The generated C code has to call np.arange in python-space (and lookup np.int64, and ...)
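
As a rough illustration of the lookups mentioned here, the sketch below inspects the CPython code object for the same expression. This is only an analogy: Cython emits C rather than bytecode, so this is not the generated code, but an untyped call to `np.arange` still has to resolve the same names at runtime. The function name `as_array_python` is hypothetical, and the function is never called, so NumPy does not need to be installed to run this.

```python
import dis


def as_array_python(start, stop, step):
    # Hypothetical stand-in for the line removed in the diff above.
    # It is only compiled, never executed, so `np` is not required here.
    return np.arange(start, stop, step, dtype=np.int64)


# Names that must be resolved dynamically every time the function runs:
# the module global plus each attribute accessed on it.
runtime_lookups = as_array_python.__code__.co_names
print(runtime_lookups)

# Full disassembly, showing the global load, the attribute loads,
# and the keyword-argument call.
dis.dis(as_array_python)
```

Each of those names is a dictionary or attribute lookup on every call, on top of argument packing for the keyword call, which is plausibly where the fixed overhead in the small-slice benchmarks comes from.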

WillAyd · 2020-03-13T22:43:36Z

So it doesn’t work like the built-in range call then, huh? I was expecting it would automatically convert to C without the Python lookup

jreback · 2020-03-14T15:55:56Z

thanks. I agree we should not just wholesale convert calls to use the C-API, but in specific cases where we are actually returning a numpy array (as opposed to iterating), ok

PERF: Using Numpy C-API arange

193f5c0

jbrockmendel reviewed Mar 13, 2020

Added note

addde1e

REF: pandas-dev#32681 (comment)

jbrockmendel reviewed Mar 13, 2020

Added extra space at the beginning of the comment block

c1f6688

REF: pandas-dev#32681 (comment)

topper-123 added the Performance (memory or execution speed performance) and Internals (related to non-user accessible pandas implementation) labels Mar 14, 2020

topper-123 added this to the 1.1 milestone Mar 14, 2020

jreback merged commit 5c7a901 into pandas-dev:master Mar 14, 2020

ShaharNaveh deleted the PERF-arange branch March 18, 2020 12:47

ShaharNaveh mentioned this pull request Mar 18, 2020

PERF: Using Numpy C-API when calling np.arange #32804

Merged

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020

PERF: Using Numpy C-API arange (pandas-dev#32681)

fce14ce
