@@ -234,14 +234,18 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra
 
 .. code-block:: ipython
 
-   In [4]: %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f(df['a'].to_numpy(),
+                                     df['b'].to_numpy(),
+                                     df['N'].to_numpy())
    1000 loops, best of 3: 1.25 ms per loop
 
 We've gotten another big improvement. Let's check again where the time is spent:
 
 .. ipython:: python
 
-   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   %prun -l 4 apply_integrate_f(df['a'].to_numpy(),
+                                df['b'].to_numpy(),
+                                df['N'].to_numpy())
 
 As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
 so if we wanted to make anymore efficiencies we must continue to concentrate our
@@ -286,7 +290,9 @@ advanced Cython techniques:
 
 .. code-block:: ipython
 
-   In [4]: %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f_wrap(df['a'].to_numpy(),
+                                          df['b'].to_numpy(),
+                                          df['N'].to_numpy())
    1000 loops, best of 3: 987 us per loop
 
 Even faster, with the caveat that a bug in our Cython code (an off-by-one error,
@@ -349,8 +355,9 @@ take the plain Python code from above and annotate with the ``@jit`` decorator.
 
 
     def compute_numba(df):
-        result = apply_integrate_f_numba(df['a'].values, df['b'].values,
-                                         df['N'].values)
+        result = apply_integrate_f_numba(df['a'].to_numpy(),
+                                         df['b'].to_numpy(),
+                                         df['N'].to_numpy())
         return pd.Series(result, index=df.index, name='result')
 
 Note that we directly pass NumPy arrays to the Numba function. ``compute_numba`` is just a wrapper that provides a
@@ -394,7 +401,7 @@ Consider the following toy example of doubling each observation:
    1000 loops, best of 3: 233 us per loop
 
    # Custom function with numba
-   In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+   In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
    1000 loops, best of 3: 145 us per loop
 
 Caveats
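For readers of this diff, the following is a minimal sketch of what the `.values` to `.to_numpy()` substitution amounts to. The small DataFrame and the variable names are made up for illustration and are not part of the patch. For plain numeric columns like the ones passed to `apply_integrate_f`, both accessors yield the same NumPy array, so the timings above should not be expected to change. The difference shows up with extension dtypes, where `.values` may return a pandas array type instead of an `ndarray`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame used in the docs.
df = pd.DataFrame({"a": [0.0, 1.0, 2.0],
                   "b": [1.0, 2.0, 3.0],
                   "N": [10, 20, 30]})

# For numeric columns, both accessors give an equal ndarray.
a_old = df["a"].values
a_new = df["a"].to_numpy()
print(type(a_new).__name__, np.array_equal(a_old, a_new))

# to_numpy() also accepts dtype and copy arguments.
n_float = df["N"].to_numpy(dtype="float64", copy=True)
n_float[0] = -1.0  # modifies the copy only, df["N"] is unchanged

# With an extension dtype, .values is not an ndarray, but to_numpy() is.
cat = pd.Series(["x", "y", "x"], dtype="category")
print(type(cat.values).__name__, type(cat.to_numpy()).__name__)
```

Code that hands arrays to Cython or Numba needs an actual `ndarray`, which is what `.to_numpy()` always returns.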