doc/source/enhancingperf.rst (+41 -2)
@@ -307,6 +307,10 @@ Numba works by generating optimized machine code using the LLVM compiler infrast

 You will need to install ``numba``. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda<install.miniconda>`.

+.. note::
+
+    As of ``numba`` version 0.20, pandas objects cannot be passed directly to numba-compiled functions. Instead, one must pass the ``numpy`` array underlying the ``pandas`` object to the numba-compiled function as demonstrated below.
+
 We simply take the plain python code from above and annotate with the ``@jit`` decorator.

 .. code-block:: python
@@ -338,14 +342,49 @@ We simply take the plain python code from above and annotate with the ``@jit`` d
result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)

-Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
-we are wrapping the results to provide a nice interface by passing/returning pandas objects.
+Note that we directly pass ``numpy`` arrays to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by passing/returning pandas objects.

 .. code-block:: python

     In [4]: %timeit compute_numba(df)
     1000 loops, best of 3: 798 us per loop

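The wrapper pattern described above (pass the underlying ``numpy`` arrays in, wrap the result back into a pandas object) can be sketched roughly as follows. This is a minimal illustration, not the code from the pandas docs: ``row_sums`` and ``row_sums_wrapper`` are hypothetical names, and the ``try``/``except`` fallback is an assumption added so the sketch still runs, uncompiled, where ``numba`` is not installed.

```python
import numpy as np
import pandas as pd

try:
    import numba
    jit = numba.jit(nopython=True)
except ImportError:
    # Assumption: without numba, fall back to the plain Python function.
    def jit(func):
        return func


@jit
def row_sums(a, b):
    # Hypothetical kernel: an explicit loop over plain NumPy arrays.
    out = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out


def row_sums_wrapper(df):
    # Pass the underlying NumPy arrays, not the pandas objects themselves.
    result = row_sums(df['a'].values, df['b'].values)
    # Re-wrap the raw array so callers get a pandas object with the original index.
    return pd.Series(result, index=df.index, name='result')


df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})
print(row_sums_wrapper(df))
```

Keeping the compiled function free of pandas types and doing the wrapping in ordinary Python is what lets callers keep working with DataFrames and Series.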
+``numba`` can also be used to write vectorized functions that do not require the user to explicitly
+loop over the observations of a vector; a vectorized function will be applied to each row automatically.
+Consider the following toy example of doubling each observation:
+
+.. code-block:: python
+
+    import numba
+
+    def double_every_value_nonumba(x):
+        return x*2
+
+    @numba.vectorize
+    def double_every_value_withnumba(x):
+        return x*2
+
+
+    # Custom function without numba
+    In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)
+    1000 loops, best of 3: 797 us per loop
+
+    # Standard implementation (faster than a custom function)
+    In [6]: %timeit df['col1_doubled'] = df.a*2
+    1000 loops, best of 3: 233 us per loop
+
+    # Custom function with numba
+    In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+    1000 loops, best of 3: 145 us per loop
+
+.. note::
+
+    ``numba`` will execute on any function, but can only accelerate certain classes of functions.
+
+    ``numba`` is best at accelerating functions that apply numerical functions to numpy arrays. When passed a function that only uses operations it knows how to accelerate, it will execute in ``nopython`` mode.
+
+    If ``numba`` is passed a function that includes something it doesn't know how to work with (a category that currently includes sets, lists, dictionaries, and string functions), it will fall back to ``object mode``. In ``object mode``, numba will execute but your code will not speed up significantly. If you would prefer that ``numba`` raise an error when it cannot compile a function in a way that speeds up your code, pass numba the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page <http://numba.pydata.org/numba-doc/0.20.0/user/troubleshoot.html#the-compiled-code-is-too-slow>`__.
+
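As a hedged illustration of the note above, the sketch below compiles a purely numerical loop with ``nopython=True``. The function name ``sum_of_squares`` is hypothetical, and the fallback for a missing ``numba`` install is an assumption added only so the sketch runs everywhere. Because the function uses nothing but arithmetic on a ``numpy`` array, it is the kind of code the note says ``numba`` can accelerate.

```python
import numpy as np

try:
    import numba
    # nopython=True asks numba to fail loudly instead of using object mode.
    jit_nopython = numba.jit(nopython=True)
except ImportError:
    # Assumption: without numba, run the plain Python function unchanged.
    def jit_nopython(func):
        return func


@jit_nopython
def sum_of_squares(arr):
    # Only scalar arithmetic and array indexing: no sets, dicts, or strings.
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i] * arr[i]
    return total


print(sum_of_squares(np.arange(4.0)))
```

If the body instead relied on something ``numba`` cannot type, the ``nopython=True`` request would be expected to surface that as an error at compile time rather than silently producing slow code.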
 Read more in the `numba docs <http://numba.pydata.org/>`__.