Commit 9271d25

Overhauled extension debugging guide (#52504)
1 parent fe415f5 commit 9271d25

1 file changed: +5 -121 lines changed

doc/source/development/debugging_extensions.rst

@@ -6,126 +6,10 @@
66
Debugging C extensions
77
======================
88

9-
Pandas uses select C extensions for high performance IO operations. In case you need to debug segfaults or general issues with those extensions, the following steps may be helpful.
9+
Pandas uses Cython and C/C++ `extension modules <https://docs.python.org/3/extending/extending.html>`_ to optimize performance. Unfortunately, the standard Python debugger does not allow you to step into these extensions. Cython extensions can be debugged with the `Cython debugger <https://docs.cython.org/en/latest/src/userguide/debugging.html>`_ and C/C++ extensions can be debugged using the tools shipped with your platform's compiler.
1010

11-
First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows:
11+
For Python developers with limited or no C/C++ experience this can seem a daunting task. Core developer Will Ayd has written a 3 part blog series to help guide you from the standard Python debugger into these other tools:
1212

13-
.. code-block:: sh
14-
15-
python setup.py build_ext --inplace -j4 --with-debugging-symbols
16-
17-
Using a debugger
18-
================
19-
20-
Assuming you are on a Unix-like operating system, you can use either lldb or gdb to debug. The choice between the two largely depends on your compilation toolchain: typically you would use lldb if using clang and gdb if using gcc. For macOS users, please note that on modern systems ``gcc`` is an alias for ``clang``, so if using Xcode you would usually opt for lldb. Regardless of which debugger you choose, please refer to your operating system's instructions on how to install it.
21-
22-
After installing a debugger you can create a script that hits the extension module you are looking to debug. For demonstration purposes, let's assume you have a script called ``debug_testing.py`` with the following contents:
23-
24-
.. code-block:: python
25-
26-
import pandas as pd
27-
28-
pd.DataFrame([[1, 2]]).to_json()
29-
30-
Place the ``debug_testing.py`` script in the project root and launch a Python process under your debugger. If using lldb:
31-
32-
.. code-block:: sh
33-
34-
lldb python
35-
36-
If using gdb:
37-
38-
.. code-block:: sh
39-
40-
gdb python
41-
42-
Before executing our script, let's set a breakpoint in our JSON serializer in its entry function called ``objToJSON``. The lldb syntax would look as follows:
43-
44-
.. code-block:: sh
45-
46-
breakpoint set --name objToJSON
47-
48-
Similarly for gdb:
49-
50-
.. code-block:: sh
51-
52-
break objToJSON
53-
54-
.. note::
55-
56-
You may get a warning that this breakpoint cannot be resolved in lldb. gdb may give a similar warning and prompt you to make the breakpoint on a future library load, which you should say yes to. This should only happen on the very first invocation as the module you wish to debug has not yet been loaded into memory.
57-
58-
Now go ahead and execute your script:
59-
60-
.. code-block:: sh
61-
62-
run <the_script>.py
63-
64-
Code execution will halt at the breakpoint defined or at the occurrence of any segfault. LLDB's `GDB to LLDB command map <https://lldb.llvm.org/use/map.html>`_ provides a listing of debugger commands that you can execute using either debugger.
65-
66-
Another option to execute the entire test suite under lldb would be to run the following:
67-
68-
.. code-block:: sh
69-
70-
lldb -- python -m pytest
71-
72-
Or for gdb
73-
74-
.. code-block:: sh
75-
76-
gdb --args python -m pytest
77-
78-
Once the process launches, simply type ``run`` and the test suite will begin, stopping at any segmentation fault that may occur.
79-
80-
Improve debugger printing
81-
=========================
82-
83-
By default your debugger will simply print the type and memory address of a PyObject. Assuming we passed a list containing ``["a", "b"]`` as an argument to a Cython-generated function with parameter ``obj``, debugging that object would look as follows:
84-
85-
.. code-block:: sh
86-
87-
(gdb) p __pyx_v_obj
88-
$1 = (PyObject *) 0x5555558b91e0
89-
90-
Dereferencing this will yield the standard PyObject struct members of the object, which provides some more visibility
91-
92-
.. code-block:: sh
93-
94-
(gdb) p *__pyx_v_obj
95-
$2 = {ob_refcnt = 1, ob_type = 0x5555558b91e0 <PyList_Type>}
96-
97-
If you are using gdb, CPython provides an extension that prints out more useful information about the object you are inspecting. The extension can be found in `cpython/Tools/gdb/libpython.py <https://github.com/python/cpython/blob/main/Tools/gdb/libpython.py>`_; for best results be sure to use the gdb extension from the CPython branch that matches the version of your interpreter.
98-
99-
To activate the extension you will need to execute ``source <path_to_cpython_source>/Tools/gdb/libpython.py`` from an actively-running gdb session. After loading you will get more detailed information about the Python object you are inspecting.
100-
101-
.. code-block:: sh
102-
103-
(gdb) p __pyx_v_obj
104-
$3 = ['a', 'b']
105-
106-
If you do not wish to explicitly source this file on every gdb run, you can alternately add it as a start up command to your `gdbinit <https://sourceware.org/gdb/onlinedocs/gdb/gdbinit-man.html>`_ file.
107-
108-
Checking memory leaks with valgrind
109-
===================================
110-
111-
You can use `Valgrind <https://valgrind.org/>`_ to check for and log memory leaks in extensions. For instance, to check for a memory leak in a test from the suite you can run:
112-
113-
.. code-block:: sh
114-
115-
PYTHONMALLOC=malloc valgrind --leak-check=yes --track-origins=yes --log-file=valgrind-log.txt python -m pytest <path_to_a_test>
116-
117-
Note that code execution under valgrind will take much longer than usual. While you can run valgrind against extensions compiled with any optimization level, it is suggested to have optimizations turned off from compiled extensions to reduce the amount of false positives. The ``--with-debugging-symbols`` flag passed during package setup will do this for you automatically.
118-
119-
.. note::
120-
121-
For best results, you should use a Python installation configured with Valgrind support (``--with-valgrind``).
122-
123-
124-
Easier code navigation
125-
======================
126-
127-
Generating a ``compile_commands.json`` file may make it easier to navigate the C extensions, as this allows your code editor to list references, jump to definitions, etc... To make this work with setuptools you can use `Bear <https://github.com/rizsotto/Bear>`_.
128-
129-
.. code-block:: sh
130-
131-
bear -- python setup.py build_ext --inplace -j4 --with-debugging-symbols
13+
1. `Fundamental Python Debugging Part 1 - Python <https://willayd.com/fundamental-python-debugging-part-1-python.html>`_
14+
2. `Fundamental Python Debugging Part 2 - Python Extensions <https://willayd.com/fundamental-python-debugging-part-2-python-extensions.html>`_
15+
3. `Fundamental Python Debugging Part 3 - Cython Extensions <https://willayd.com/fundamental-python-debugging-part-3-cython-extensions.html>`_
