
Commit 325218f

ngoldbaum authored and mejrs committed
docs: Expand docs on when and why allow_threads is necessary (#4767)
* Expand docs on when and why allow_threads is necessary
* spelling
* simplify example a little
* use less indirection in the example
* Update guide/src/parallelism.md
* Add note about the GIL preventing parallelism
* Update guide/src/free-threading.md

Co-authored-by: Bruno Kolenbrander <[email protected]>

* pared down text about need to use with_gil
* rearrange slightly

---------

Co-authored-by: Bruno Kolenbrander <[email protected]>
1 parent 869a25b commit 325218f


2 files changed, +89 -12 lines changed


guide/src/free-threading.md

+31 -11
@@ -156,20 +156,40 @@ freethreaded build, holding a `'py` lifetime means only that the thread is
 currently attached to the Python interpreter -- other threads can be
 simultaneously interacting with the interpreter.

-The main reason for obtaining a `'py` lifetime is to interact with Python
+You still need to obtain a `'py` lifetime to interact with Python
 objects or call into the CPython C API. If you are not yet attached to the
 Python runtime, you can register a thread using the [`Python::with_gil`]
 function. Threads created via the Python [`threading`] module do not need to
-do this, but all other OS threads that interact with the Python runtime must
-explicitly attach using `with_gil` and obtain a `'py` lifetime.
-
-Since there is no GIL in the free-threaded build, releasing the GIL for
-long-running tasks is no longer necessary to ensure other threads run, but you
-should still detach from the interpreter runtime using [`Python::allow_threads`]
-when doing long-running tasks that do not require the CPython runtime. The
-garbage collector can only run if all threads are detached from the runtime (in
-a stop-the-world state), so detaching from the runtime allows freeing unused
-memory.
+do this, and PyO3 will handle setting up the [`Python<'py>`] token when CPython
+calls into your extension.
+
+### Global synchronization events can cause hangs and deadlocks
+
+The free-threaded build triggers global synchronization events in the following
+situations:
+
+* During garbage collection, in order to get a globally consistent view of
+  reference counts and references between objects
+* In Python 3.13, when the first background thread is started, in order to
+  mark certain objects as immortal
+* When either `sys.settrace` or `sys.setprofile` is called, in order to
+  instrument running code objects and threads
+* Before `os.fork()` is called
+
+This is a non-exhaustive list and there may be other situations in future Python
+versions that can trigger global synchronization events.
+
+This means that you should detach from the interpreter runtime using
+[`Python::allow_threads`] in exactly the same situations as you should detach
+from the runtime in the GIL-enabled build: when doing long-running tasks that do
+not require the CPython runtime, or when waiting on work in other threads that
+needs to attach to the runtime (see the [guide
+section](parallelism.md#sharing-python-objects-between-rust-threads) that
+covers this). In the former case, other threads could hang until the
+long-running task completes, and in the latter case you would see a deadlock: a
+worker thread cannot attach while a global synchronization event is pending,
+and the spawning thread, which is still attached, prevents the synchronization
+event from completing.

 ### Exceptions and panics for multithreaded access of mutable `pyclass` instances

guide/src/parallelism.md

+58 -1
@@ -1,6 +1,6 @@
 # Parallelism

-CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing.
+CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing. There is an experimental "free-threaded" version of CPython 3.13 that does not have a GIL; see the PyO3 docs on [free-threaded Python](./free-threading.md) for more information.

 In PyO3 parallelism can be easily achieved in Rust-only code. Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [rayon](https://github.com/rayon-rs/rayon) crate to count words in parallel.
 ```rust,no_run
@@ -117,4 +117,61 @@ test_word_count_python_sequential 27.3985 (15.82) 45.452

 You can see that the Python threaded version is not much slower than the Rust sequential version, which means that, compared to execution on a single CPU core, the speed has doubled.

+## Sharing Python objects between Rust threads
+
+In the example above we made a Python interface to a low-level Rust function,
+and then leveraged the Python `threading` module to run the low-level function
+in parallel. It is also possible to spawn threads in Rust that acquire the GIL
+and operate on Python objects. However, care must be taken to avoid writing code
+that deadlocks with the GIL in these cases.
+
+* Note: This example is meant to illustrate how to drop and re-acquire the GIL
+  to avoid creating deadlocks. Unless the spawned threads subsequently
+  release the GIL or you are using the free-threaded build of CPython, you
+  will not see any speedup from using `rayon` to parallelize code that
+  acquires and holds the GIL for the entire execution of each spawned
+  thread.
+
+In the example below, we share a `Vec` of `UserID` objects defined using the
+`pyclass` macro and spawn threads to process the collection of data into a `Vec`
+of booleans based on a predicate using a `rayon` parallel iterator:
+
+```rust,no_run
+use pyo3::prelude::*;
+
+// These traits let us use par_iter and map
+use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
+
+#[pyclass]
+struct UserID {
+    id: i64,
+}
+
+let allowed_ids: Vec<bool> = Python::with_gil(|outer_py| {
+    let instances: Vec<Py<UserID>> = (0..10).map(|x| Py::new(outer_py, UserID { id: x }).unwrap()).collect();
+    outer_py.allow_threads(|| {
+        instances.par_iter().map(|instance| {
+            Python::with_gil(|inner_py| {
+                instance.borrow(inner_py).id > 5
+            })
+        }).collect()
+    })
+});
+assert!(allowed_ids.into_iter().filter(|b| *b).count() == 4);
+```
+
+It's important to note that there is an `outer_py` GIL lifetime token as well as
+an `inner_py` token. Sharing GIL lifetime tokens between threads is not allowed
+and threads must individually acquire the GIL to access data wrapped by a Python
+object.
+
+Note also that this example uses [`Python::allow_threads`] to wrap the code
+that spawns OS threads via `rayon`. If this example didn't use
+`allow_threads`, a rayon worker thread would block on acquiring the GIL while
+the thread that owns the GIL waits forever for the result of the rayon
+thread. Calling `allow_threads` releases the GIL in the thread collecting the
+results from the worker threads. You should always call `allow_threads` in
+situations that spawn worker threads, and especially in cases where worker
+threads need to acquire the GIL, to prevent deadlocks.
+
 [`Python::allow_threads`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.allow_threads
