Document Tips for Debugging C Extensions #35100
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
.. _debugging_c_extensions: | ||
|
||
{{ header }} | ||
|
||
====================== | ||
Debugging C extensions | ||
====================== | ||
|
||
Pandas uses select C extensions for high-performance IO operations. If you need to debug segfaults or other issues in those extensions, the following steps may be helpful. They are geared towards lldb, though the steps for gdb are similar. | ||
|
||
First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows: | ||
|
||
.. code-block:: sh | ||
|
||
python setup.py build_ext --inplace -j4 --with-debugging-symbols | ||
|
||
Using a debugger | ||
================ | ||
|
||
You can create a script that exercises the extension module you want to debug and place it in the project root. Then launch a Python process under lldb: | ||
|
||
.. code-block:: sh | ||
|
||
lldb python | ||
|
||
If desired, set breakpoints at various file locations using the below syntax: | ||
|
||
.. code-block:: sh | ||
|
||
breakpoint set --file pandas/_libs/src/ujson/python/objToJSON.c --line 1547 | ||
|
||
At this point you may get *WARNING: Unable to resolve breakpoint to any actual locations.* If you have not yet executed anything, the module may not have been loaded into memory, which is why the location cannot be resolved. You can ignore the warning for now; the breakpoint will bind once the code actually executes. | ||
|
||
Finally, execute your script: | ||
|
||
.. code-block:: sh | ||
|
||
run <the_script>.py | ||
|
||
|
||
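As a minimal sketch of the kind of script described above: the file name `debug_to_json.py` and the frame contents are hypothetical, and it assumes that `DataFrame.to_json` is served by the bundled JSON serializer (the `objToJSON.c` file named in the breakpoint example). Any script that reaches the extension you care about works equally well.

```python
# debug_to_json.py (hypothetical name), placed in the project root
import pandas as pd


def main() -> str:
    # A small frame mixing ints, floats, and strings so that several
    # serialization code paths are exercised.
    df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": ["x", "y"]})
    # Assumption: to_json dispatches to the C serializer being debugged.
    return df.to_json(orient="records")


if __name__ == "__main__":
    print(main())
```

With this file in place, `run debug_to_json.py` inside the debugger session would take the role of `run <the_script>.py`.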
Code execution will halt at the defined breakpoint or at the occurrence of any segfault. LLDB's `GDB to LLDB command map <https://lldb.llvm.org/use/map.html>`_ provides a listing of debugger commands that you can execute using either debugger. | ||
|
||
Alternatively, to execute the entire test suite under the debugger, run the following: | ||
|
||
.. code-block:: sh | ||
|
||
lldb -- python -m pytest | ||
|
||
Or, for gdb: | ||
|
||
.. code-block:: sh | ||
|
||
gdb --args python -m pytest | ||
Can you try
Yea looks like
Are you using pyenv for development or Conda?
you've already gone above and beyond helping me debug this; ill spend some more time on this and ping you if i find anything new
All good - I think this is generally helpful to hash out together so thanks for the input. It looks like this might be specific to pyenv and how it manages the python executable: https://stackoverflow.com/questions/48141135/cannot-start-dbg-on-my-python-c-extension
is the command from my previous comment specific to my case, or relevant to the document?
I'm trying to avoid adding too much detail here since this issue is more of a pyenv thing than a debugger issue |
||
|
||
Once the process launches, simply type ``run`` and the test suite will begin, stopping at any segmentation fault that may occur. | ||
|
||
Checking memory leaks with valgrind | ||
=================================== | ||
|
||
You can use `Valgrind <https://www.valgrind.org>`_ to check for and log memory leaks in extensions. For instance, to check for a memory leak in a test from the suite you can run: | ||
|
||
.. code-block:: sh | ||
|
||
PYTHONMALLOC=malloc valgrind --leak-check=yes --track-origins=yes --log-file=valgrind-log.txt python -m pytest <path_to_a_test> | ||
|
||
Note that code execution under valgrind will take much longer than usual. While you can run valgrind against extensions compiled with any optimization level, it is suggested to turn optimizations off when compiling the extensions to reduce the number of false positives. The ``--with-debugging-symbols`` flag passed during package setup will do this for you automatically. | ||
|
||
.. note:: | ||
|
||
For best results, you should use a Python installation configured with Valgrind support (``--with-valgrind``). |
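If you run the valgrind check often, it may help to assemble the invocation in a small helper. The sketch below only builds the command and environment from the flags shown above; the helper names and the `path/to/test_file.py` placeholder are illustrative and not part of pandas.

```python
import os
import shlex


def valgrind_command(test_path, log_file="valgrind-log.txt"):
    """Build the argument list for the valgrind run shown in the doc."""
    return [
        "valgrind",
        "--leak-check=yes",
        "--track-origins=yes",
        f"--log-file={log_file}",
        "python", "-m", "pytest", test_path,
    ]


def valgrind_env(base_env=None):
    """Copy the environment and force Python to use the system malloc."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONMALLOC"] = "malloc"
    return env


if __name__ == "__main__":
    # Placeholder path; substitute a real test file or directory.
    cmd = valgrind_command("path/to/test_file.py")
    print("PYTHONMALLOC=malloc", shlex.join(cmd))
```

The resulting list and environment could then be passed to `subprocess.run(cmd, env=valgrind_env())`, assuming valgrind is installed and on the `PATH`.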
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,7 @@ Development | |
code_style | ||
maintaining | ||
internals | ||
debugging_extensions | ||
extending | ||
developer | ||
policies | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -414,18 +414,16 @@ def run(self): | |
|
||
# ---------------------------------------------------------------------- | ||
# Preparation of compiler arguments | ||
|
||
debugging_symbols_requested = "--with-debugging-symbols" in sys.argv | ||
if debugging_symbols_requested: | ||
sys.argv.remove("--with-debugging-symbols") | ||
|
||
|
||
if sys.byteorder == "big": | ||
endian_macro = [("__BIG_ENDIAN__", "1")] | ||
else: | ||
endian_macro = [("__LITTLE_ENDIAN__", "1")] | ||
|
||
|
||
debugging_symbols_requested = "--with-debugging-symbols" in sys.argv | ||
if debugging_symbols_requested: | ||
sys.argv.remove("--with-debugging-symbols") | ||
|
||
if is_platform_windows(): | ||
extra_compile_args = [] | ||
extra_link_args = [] | ||
|
@@ -435,8 +433,15 @@ def run(self): | |
else: | ||
extra_compile_args = ["-Werror"] | ||
extra_link_args = [] | ||
if debugging_symbols_requested: | ||
extra_compile_args.append("-g") | ||
if not debugging_symbols_requested: | ||
I guess Python by default (at least locally and looking at some of the CI builds) includes the According to SO we can override that by appending here, which might help reduce file size by removing those symbols: https://stackoverflow.com/a/37952343/621736 I can also remove this from this PR if deemed too orthogonal. IIRC @xhochy or @TomAugspurger may have experience with stripping debug symbols from built distributions
Multibuild may do this by default now? I don't recall.
Yes, multibuild includes this nowadays. |
||
# Strip debugging symbols (included by default) | ||
extra_compile_args.append("-g0") | ||
else: | ||
# TODO: these should override the defaults provided by Python | ||
distutils adds NDEBUG and -O3 by default it seems without a feasible way to remove those compilation flags. Appending these at the end should override those according to the SO link shared above
I don't recommend building with
gdb suggests turning off optimizations: https://sourceware.org/gdb/onlinedocs/gdb/Optimized-Code.html There are certainly exceptions but I think as a general rule (especially for people that aren't super well versed in debugging the extensions yet) that no optimizations will be easier to follow
Yes, the debug information with
Sounds good. I think this is off by default with -O0 per the docs but doesn't hurt to add again https://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Optimize-Options.html |
||
# by being appended to end, but would ideally replace altogether | ||
extra_compile_args.append("-UNDEBUG") | ||
extra_compile_args.append("-O0") | ||
extra_compile_args.append("-fno-omit-frame-pointer") | ||
|
||
# Build for at least macOS 10.9 when compiling on a 10.9 system or above, | ||
# overriding CPython distuitls behaviour which is to target the version that | ||
|
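The flag handling in this hunk can be read in isolation as the sketch below. It is a standalone approximation, not the code in `setup.py`: the function names are hypothetical, only the non-Windows branch is modelled, and it assumes the added side of the diff is the `-g0` / `-UNDEBUG -O0 -fno-omit-frame-pointer` block (the rendered diff has lost its plus and minus markers). It relies on the compiler honouring the last of several conflicting options, which is why the flags are appended rather than replacing the defaults.

```python
def split_custom_flag(argv, flag="--with-debugging-symbols"):
    """Return (requested, remaining_argv) without mutating argv."""
    requested = flag in argv
    return requested, [arg for arg in argv if arg != flag]


def unix_compile_args(debugging_symbols_requested):
    """Approximate the non-Windows extra_compile_args from the diff."""
    args = ["-Werror"]
    if not debugging_symbols_requested:
        # Strip debugging symbols (included by default)
        args.append("-g0")
    else:
        # Appended last so they take precedence over the default
        # NDEBUG and optimization flags supplied by the build system.
        args.extend(["-UNDEBUG", "-O0", "-fno-omit-frame-pointer"])
    return args


if __name__ == "__main__":
    argv = ["setup.py", "build_ext", "--inplace", "--with-debugging-symbols"]
    requested, remaining = split_custom_flag(argv)
    print(remaining)
    print(unix_compile_args(requested))
```

Returning a new list instead of calling `sys.argv.remove` is a design choice made here only to keep the sketch free of side effects.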
is there a reference on lldb (link / how to install?)
I have only used on macOS where it comes bundled with the Xcode tools. Not sure about other systems
On my ubuntu machine it's not available (while gdb is). So I would at least link to some web page about it if we want to recommend it over gdb, I suppose most readers won't know what lldb is (I also didn't). It seems you can install it from conda-forge.
To be clear I don't recommend one over the other - should just use whatever comes with your build system.
I'll try to reword
Did you settle on a wording here?