# Parallelism

CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing. There is an experimental "free-threaded" version of CPython 3.13 that does not have a GIL; see the PyO3 docs on [free-threaded Python](./free-threading.md) for more information.
In PyO3 parallelism can be easily achieved in Rust-only code. Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [rayon](https://github.com/rayon-rs/rayon) crate to count words in parallel.

```rust,no_run
// ... (the `search` function and the benchmark results are elided here) ...
```
|
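The elided `search` function splits the input text and counts matching words across several threads. As a rough, std-only sketch of the same idea (this is not the actual example code; the real example uses a rayon parallel iterator, which is replaced here by manually scoped threads):

```rust
use std::thread;

// Count how many whitespace-separated words on one line equal `needle`.
fn count_line(line: &str, needle: &str) -> usize {
    line.split_whitespace().filter(|word| *word == needle).count()
}

fn search(contents: &str, needle: &str) -> usize {
    let lines: Vec<&str> = contents.lines().collect();
    // Split the lines across up to four scoped threads and sum the partial
    // counts, mirroring what the rayon parallel iterator does for us.
    let chunk_size = ((lines.len() + 3) / 4).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|part| {
                s.spawn(move || part.iter().map(|l| count_line(l, needle)).sum::<usize>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let text = "I love to write in rust\nrust is fun\n";
    assert_eq!(search(text, "rust"), 2);
    println!("count = {}", search(text, "rust")); // prints "count = 2"
}
```

Because this sketch is pure Rust with no Python objects involved, no GIL handling is needed; that changes in the next section.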
You can see that the Python threaded version is not much slower than the Rust sequential version, which means the speed roughly doubled compared to execution on a single CPU core.

## Sharing Python objects between Rust threads

In the example above we made a Python interface to a low-level Rust function, and then leveraged the Python `threading` module to run the low-level function in parallel. It is also possible to spawn threads in Rust that acquire the GIL and operate on Python objects. However, care must be taken to avoid writing code that deadlocks with the GIL in these cases.

* Note: This example is meant to illustrate how to drop and re-acquire the GIL
  to avoid deadlocks. Unless the spawned threads subsequently release the GIL,
  or you are using the free-threaded build of CPython, you will not see any
  speedup from using `rayon` to parallelize code that acquires and holds the
  GIL for the entire execution of each spawned thread.

In the example below, we share a `Vec` of User ID objects defined using the `pyclass` macro and spawn threads to process the collection of data into a `Vec` of booleans based on a predicate, using a rayon parallel iterator:

```rust,no_run
use pyo3::prelude::*;

// These traits let us use par_iter and map
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

#[pyclass]
struct UserID {
    id: i64,
}

let allowed_ids: Vec<bool> = Python::with_gil(|outer_py| {
    let instances: Vec<Py<UserID>> = (0..10).map(|x| Py::new(outer_py, UserID { id: x }).unwrap()).collect();
    outer_py.allow_threads(|| {
        instances.par_iter().map(|instance| {
            Python::with_gil(|inner_py| {
                instance.borrow(inner_py).id > 5
            })
        }).collect()
    })
});
assert!(allowed_ids.into_iter().filter(|b| *b).count() == 4);
```

It's important to note that there is an `outer_py` GIL lifetime token as well as an `inner_py` token. Sharing GIL lifetime tokens between threads is not allowed: each thread must individually acquire the GIL to access data wrapped by a Python object.

It's also important to note that this example uses [`Python::allow_threads`] to wrap the code that spawns OS threads via `rayon`. If this example didn't use `allow_threads`, a rayon worker thread would block on acquiring the GIL while the thread that owns the GIL spins forever waiting for the result of the rayon thread. Calling `allow_threads` releases the GIL in the thread collecting the results from the worker threads. You should always call `allow_threads` in situations that spawn worker threads, especially when those worker threads need to acquire the GIL, to prevent deadlocks.

[`Python::allow_threads`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.allow_threads