
DOC: Replace .values with .to_numpy() in enhancingperf #26313


Merged: 2 commits, May 8, 2019
19 changes: 13 additions & 6 deletions doc/source/user_guide/enhancingperf.rst
@@ -234,14 +234,18 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros array

.. code-block:: ipython

-   In [4]: %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f(df['a'].to_numpy(),
+                                     df['b'].to_numpy(),
+                                     df['N'].to_numpy())
1000 loops, best of 3: 1.25 ms per loop

We've gotten another big improvement. Let's check again where the time is spent:

.. ipython:: python

-   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   %prun -l 4 apply_integrate_f(df['a'].to_numpy(),
+                                df['b'].to_numpy(),
+                                df['N'].to_numpy())

As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
so if we wanted to squeeze out any more efficiency we must continue to concentrate our
efforts here.
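
For background on the swap this hunk makes: for extension dtypes, ``Series.values`` can return a pandas ``ExtensionArray`` rather than the plain NumPy ``ndarray`` that the typed Cython signature expects, while ``Series.to_numpy()`` always hands back an ``ndarray``. A minimal sketch of the difference (the nullable ``Int64`` dtype here is an illustrative choice, not something from the diff):

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 2, 3], dtype="Int64")  # nullable extension dtype

   type(s.values)      # pandas IntegerArray, not a plain ndarray
   type(s.to_numpy())  # numpy.ndarray, safe to pass to typed Cython code

For the plain float64 columns used on this page the two calls return the same array; the point of the change is that ``.to_numpy()`` keeps that guarantee for every dtype.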
@@ -286,7 +290,9 @@ advanced Cython techniques:

.. code-block:: ipython

-   In [4]: %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f_wrap(df['a'].to_numpy(),
+                                          df['b'].to_numpy(),
+                                          df['N'].to_numpy())
1000 loops, best of 3: 987 us per loop

Even faster, with the caveat that a bug in our Cython code (an off-by-one error,
for example) might cause a segfault because memory access isn't checked.
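
The unchecked memory access comes from the Cython compiler directives this part of the page applies to ``apply_integrate_f_wrap``; they sit outside this hunk, so the snippet below is a sketch of that pattern, written in Cython's pure-Python mode with a hypothetical function body:

.. code-block:: python

   import cython

   # Dropping these runtime checks is what buys the extra speed, and also
   # what turns an off-by-one index bug into a potential segfault.
   @cython.boundscheck(False)  # assume every index is in bounds
   @cython.wraparound(False)   # assume no negative indices
   def double_all(col):
       for i in range(len(col)):
           col[i] = col[i] * 2.0
       return col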
@@ -349,8 +355,9 @@ take the plain Python code from above and annotate with the ``@jit`` decorator.


    def compute_numba(df):
-       result = apply_integrate_f_numba(df['a'].values, df['b'].values,
-                                        df['N'].values)
+       result = apply_integrate_f_numba(df['a'].to_numpy(),
+                                        df['b'].to_numpy(),
+                                        df['N'].to_numpy())
        return pd.Series(result, index=df.index, name='result')

Note that we directly pass NumPy arrays to the Numba function. ``compute_numba`` is just a wrapper that provides a
nicer interface by passing/returning pandas objects.
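
The jitted functions that ``compute_numba`` calls are defined just above this hunk in the page; a minimal sketch of the pattern, reconstructed from the integrate example used throughout rather than quoted verbatim:

.. code-block:: python

   import numba
   import numpy as np

   @numba.jit
   def integrate_f_numba(a, b, N):
       s = 0.0
       dx = (b - a) / N
       for i in range(N):
           x = a + i * dx
           s += x * (x - 1)
       return s * dx

   @numba.jit
   def apply_integrate_f_numba(col_a, col_b, col_N):
       # plain NumPy arrays in, a plain NumPy array out
       result = np.empty(len(col_N), dtype="float64")
       for i in range(len(col_N)):
           result[i] = integrate_f_numba(col_a[i], col_b[i], col_N[i])
       return result

Because Numba compiles against NumPy arrays, passing ``.to_numpy()`` output keeps the wrapper's inputs in exactly the form the jitted code expects.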
@@ -394,7 +401,7 @@ Consider the following toy example of doubling each observation:
1000 loops, best of 3: 233 us per loop

# Custom function with numba
-   In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+   In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
1000 loops, best of 3: 145 us per loop
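
The definition of ``double_every_value_withnumba`` lies outside this hunk; a plausible minimal sketch, assuming the element-wise ufunc its name suggests:

.. code-block:: python

   import numba

   @numba.vectorize
   def double_every_value_withnumba(x):
       # compiled element-wise; NumPy handles the looping and broadcasting
       return x * 2

Fed the array from ``df.a.to_numpy()``, the compiled ufunc operates directly on the ndarray, avoiding per-element Python overhead.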

Caveats