pandas uses select C extensions for high-performance IO operations. If you need to debug segfaults or other issues in those extensions, the following steps may help. They are geared towards lldb as the debugger, though the steps for gdb are similar.
First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows:
python setup.py build_ext --inplace -j4 --with-debugging-symbols
Next, create a script that exercises the extension module you want to debug and place it in the project root. Then launch a Python process under lldb:
lldb python
If desired, set breakpoints at specific file locations using the following syntax:
breakpoint set --file pandas/_libs/src/ujson/python/objToJSON.c --line 1547
At this point you may see the warning "WARNING: Unable to resolve breakpoint to any actual locations." If you have not executed anything yet, the module has probably not been loaded into memory, so the location cannot be resolved. You can safely ignore this warning for now; the breakpoint will bind once the module is loaded when you execute code.
Finally, execute your script:
run <the_script>.py
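As a minimal sketch, a script for the breakpoint above could exercise the JSON encoder, since DataFrame.to_json serializes through the ujson-based C code in objToJSON.c. The file name debug_json.py is hypothetical; any script that reaches the code path you are debugging will do.

```python
# debug_json.py: hypothetical file name; substitute your own script.
# DataFrame.to_json goes through pandas' ujson-based C encoder,
# so running this under lldb exercises the code in objToJSON.c.
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
result = df.to_json()
print(result)  # prints {"a":{"0":1,"1":2},"b":{"0":"x","1":"y"}}
```

Keeping the script this small makes it easier to tell whether a crash comes from the extension itself or from surrounding Python code.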
Code execution will halt at any breakpoint you defined or when a segfault occurs. LLDB's GDB to LLDB command map lists the equivalent commands for both debuggers.
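The interactive steps above can also be passed to lldb on its command line using the -o option, which runs a command after the target has been loaded. The sketch below uses a hypothetical helper, lldb_argv, that only assembles and prints the command; you can paste the result into a shell or pass the list to subprocess.run yourself. Without --batch, lldb stays at its interactive prompt when execution stops, just as in the manual workflow.

```python
import shlex


def lldb_argv(script, breakpoints=(), python="python"):
    """Build an lldb command line that sets breakpoints and runs `script`.

    `lldb_argv` is a hypothetical helper; it only assembles the arguments.
    """
    argv = ["lldb"]
    for path, line in breakpoints:
        # Each -o command runs after lldb has loaded the target executable.
        argv += ["-o", f"breakpoint set --file {path} --line {line}"]
    # Launch the interpreter with the script as its argument.
    argv += ["-o", f"run {script}"]
    # Everything after "--" names the program to debug.
    argv += ["--", python]
    return argv


argv = lldb_argv(
    "debug_json.py",  # hypothetical script name
    [("pandas/_libs/src/ujson/python/objToJSON.c", 1547)],
)
print(shlex.join(argv))
```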
Alternatively, to run the entire test suite under the debugger, use:
lldb -- python -m pytest
Or, for gdb:
gdb --args python -m pytest
Once the process launches, type run at the debugger prompt and the test suite will begin, stopping at any segmentation fault that occurs.
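To skip typing run, both debuggers can start the process immediately: gdb via -ex run (placed before --args, since everything after --args is passed to the program) and lldb via -o run. The hypothetical helper below builds either command line and can narrow pytest to a subset of tests; the pandas/tests/io/json path is only an example selection.

```python
import shlex


def debug_pytest_argv(debugger, pytest_args=(), python="python"):
    """Build a command that runs pytest under gdb or lldb and starts it at once.

    `debug_pytest_argv` is a hypothetical helper; it only assembles arguments.
    """
    target = [python, "-m", "pytest", *pytest_args]
    if debugger == "gdb":
        # -ex must precede --args: everything after --args goes to the program.
        return ["gdb", "-ex", "run", "--args", *target]
    if debugger == "lldb":
        # -o runs "run" after lldb loads the target named after "--".
        return ["lldb", "-o", "run", "--", *target]
    raise ValueError(f"unsupported debugger: {debugger!r}")


# Example: only the JSON IO tests (an example selection; adjust as needed).
print(shlex.join(debug_pytest_argv("gdb", ["pandas/tests/io/json"])))
print(shlex.join(debug_pytest_argv("lldb")))
```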