Skip to content

ENH: Setup ASAN in CI #52990

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks
WillAyd opened this issue Apr 28, 2023 · 5 comments · Fixed by #55102
Closed
3 tasks

ENH: Setup ASAN in CI #52990

WillAyd opened this issue Apr 28, 2023 · 5 comments · Fixed by #55102
Labels
CI Continuous Integration Enhancement

Comments

@WillAyd
Copy link
Member

WillAyd commented Apr 28, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

We have no way of automatically detecting memory leaks in our library

Feature Description

ASAN can be used to detect memory leaks with a relatively low cost. The following code can show how to build pandas with ASAN and find leaks. Note that fast_unwind_on_malloc=0 severely slows down execution at the cost of a more informative traceback. This also reports leaks that likely cannot be controlled by pandas.

CC=clang CXX=clang++ CFLAGS="-fsanitize=address -fno-omit-frame-pointer -shared-libasan" LDSHARED="clang -shared" python setup.py build_ext --inplace -j8 --with-debugging-symbols
ASAN_OPTIONS="fast_unwind_on_malloc=0" LD_PRELOAD=$(clang -print-file-name=libclang_rt.asan-x86_64.so) python -m pytest pandas/tests/io/json/ 2> leaks.txt

This produces the following output, which contains items like:

...

Indirect leak of 24 byte(s) in 6 object(s) allocated from:
    #0 0x7fa3374d2c7e in __interceptor_malloc build-llvm/tools/clang/stage2-bins/runtimes/runtimes-bins/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x7fa2c96590e4 in traced_malloc /home/willayd/clones/pandas/pandas/_libs/src/klib/khash_python.h:27:18
    #2 0x7fa2c96c793f in kh_resize_uint64 /home/willayd/clones/pandas/pandas/_libs/src/klib/khash.h:712:1
    #3 0x7fa2c96e80d3 in __pyx_pf_6pandas_5_libs_9hashtable_15UInt64HashTable___cinit__ /home/willayd/clones/pandas/pandas/_libs/hashtable.c:33343:3
    #4 0x7fa2c96e7861 in __pyx_pw_6pandas_5_libs_9hashtable_15UInt64HashTable_1__cinit__ /home/willayd/clones/pandas/pandas/_libs/hashtable.c:33275:13
    #5 0x7fa2c96c92f0 in __pyx_tp_new_6pandas_5_libs_9hashtable_UInt64HashTable /home/willayd/clones/pandas/pandas/_libs/hashtable.c:158300:7
    #6 0x55f9503681f5 in type_call /usr/local/src/conda/python-3.11.3/Objects/typeobject.c:1100:11
    #7 0x7fa2c5703998 in __Pyx_PyObject_Call /home/willayd/clones/pandas/pandas/_libs/index.c:82754:14
    #8 0x7fa2c5739d3b in __Pyx__PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/index.c:83070:14
    #9 0x7fa2c57370e9 in __Pyx_PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/index.c:83089:12
    #10 0x7fa2c571f6ab in __pyx_f_6pandas_5_libs_5index_12UInt64Engine__make_hash_table /home/willayd/clones/pandas/pandas/_libs/index.c:26319:89
    #11 0x7fa2c570fa8f in __pyx_f_6pandas_5_libs_5index_11IndexEngine__ensure_mapping_populated /home/willayd/clones/pandas/pandas/_libs/index.c:9532:17
    #12 0x7fa2c570cd7d in __pyx_f_6pandas_5_libs_5index_11IndexEngine__do_unique_check /home/willayd/clones/pandas/pandas/_libs/index.c:8653:15
    #13 0x7fa2c575e77a in __pyx_pf_6pandas_5_libs_5index_11IndexEngine_9is_unique___get__ /home/willayd/clones/pandas/pandas/_libs/index.c:8583:17
    #14 0x7fa2c575e67c in __pyx_pw_6pandas_5_libs_5index_11IndexEngine_9is_unique_1__get__ /home/willayd/clones/pandas/pandas/_libs/index.c:8549:13
    #15 0x7fa2c575e578 in __pyx_getprop_6pandas_5_libs_5index_11IndexEngine_is_unique /home/willayd/clones/pandas/pandas/_libs/index.c:75082:10
    #16 0x55f95038854d in _PyObject_GenericGetAttrWithDict /usr/local/src/conda/python-3.11.3/Objects/object.c:1278:19
    #17 0x55f95036adcf in PyObject_GenericGetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:1368:12
    #18 0x55f95036adcf in PyObject_GetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:916:19
    #19 0x55f950374b61 in _PyEval_EvalFrameDefault /usr/local/src/conda/python-3.11.3/Python/ceval.c:3465:29
    #20 0x55f95039acd2 in _PyEval_EvalFrame /usr/local/src/conda/python-3.11.3/Include/internal/pycore_ceval.h:73:16
    #21 0x55f95039acd2 in _PyEval_Vector /usr/local/src/conda/python-3.11.3/Python/ceval.c:6438:24
    #22 0x55f95039acd2 in _PyFunction_Vectorcall /usr/local/src/conda/python-3.11.3/Objects/call.c:393:16
    #23 0x7fa2c8db64d8 in __Pyx_PyObject_Call /home/willayd/clones/pandas/pandas/_libs/properties.c:5508:14
    #24 0x7fa2c8db684b in __Pyx__PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/properties.c:5576:14
    #25 0x7fa2c8db6309 in __Pyx_PyObject_CallOneArg /home/willayd/clones/pandas/pandas/_libs/properties.c:5595:12
    #26 0x7fa2c8db7f67 in __pyx_pf_6pandas_5_libs_10properties_14CachedProperty_2__get__ /home/willayd/clones/pandas/pandas/_libs/properties.c:1972:93
    #27 0x7fa2c8db6d2c in __pyx_pw_6pandas_5_libs_10properties_14CachedProperty_3__get__ /home/willayd/clones/pandas/pandas/_libs/properties.c:1723:13
    #28 0x7fa2c8db3b28 in __pyx_tp_descr_get_6pandas_5_libs_10properties_CachedProperty /home/willayd/clones/pandas/pandas/_libs/properties.c:4149:7
    #29 0x55f95038854d in _PyObject_GenericGetAttrWithDict /usr/local/src/conda/python-3.11.3/Objects/object.c:1278:19
    #30 0x55f95036adcf in PyObject_GenericGetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:1368:12
    #31 0x55f95036adcf in PyObject_GetAttr /usr/local/src/conda/python-3.11.3/Objects/object.c:916:19

SUMMARY: AddressSanitizer: 9878368 byte(s) leaked in 8924 allocation(s).

So from running the JSON tests ASAN thinks we leak 9878368 bytes with 24 being leaked by an interaction between properties.pyx -> index.pyx -> hashtable.pyx -> khash

Many of the arguments pieced together to make the above work are taken from:

google/sanitizers/issues/918
https://stackoverflow.com/questions/48833176/get-location-of-libasan-from-gcc-clang
https://clang.llvm.org/docs/AddressSanitizer.html

leaks.txt

Alternative Solutions

Valgrind is another great tool for detecting memory leaks (amongst other things), but has larger overhead compared to the sanitizers, which may make it unsuitable for CI

https://github.com/google/sanitizers/wiki/AddressSanitizerComparisonOfMemoryTools

Additional Context

cc @lithomas1

@WillAyd WillAyd added Enhancement CI Continuous Integration labels Apr 28, 2023
@WillAyd
Copy link
Member Author

WillAyd commented Apr 28, 2023

For anyone that looks at the logs forgot to mention I intentionally introduce a memory leak in the JSON code by commenting out the below implementation:

void NpyArr_iterEnd(JSOBJ obj, JSONTypeContext *tc) {
    NpyArrContext *npyarr = GET_TC(tc)->npyarr;

    /*
    if (npyarr) {
        NpyArr_freeItemValue(obj, tc);
        PyObject_Free(npyarr);
    }
    */
}

So our memory leaks aren't as bad as the attached log would indicate

@WillAyd
Copy link
Member Author

WillAyd commented Apr 28, 2023

The biggest leak ASAN found is below (truncated for output):

Direct leak of 1610180 byte(s) in 31 object(s) allocated from:
    #0 0x7fc0846d2c7e in __interceptor_malloc build-llvm/tools/clang/stage2-bins/runtimes/runtimes-bins/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x56105bf4cabf in _PyMem_RawMalloc /usr/local/src/conda/python-3.11.3/Objects/obmalloc.c:101:12
    #2 0x56105bf4cabf in PyMem_RawMalloc /usr/local/src/conda/python-3.11.3/Objects/obmalloc.c:586:12
    #3 0x56105bf4cabf in _PyObject_Malloc /usr/local/src/conda/python-3.11.3/Objects/obmalloc.c:2003:11
    #4 0x56105bf4cabf in _PyObject_Malloc /usr/local/src/conda/python-3.11.3/Objects/obmalloc.c:1996:1
    #5 0x56105bf4cabf in PyObject_Malloc /usr/local/src/conda/python-3.11.3/Objects/obmalloc.c:712:12
    #6 0x56105bf4cabf in PyUnicode_New /usr/local/src/conda/python-3.11.3/Objects/unicodeobject.c:1425:24
    #7 0x56105bf4cabf in unicode_decode_utf8 /usr/local/src/conda/python-3.11.3/Objects/unicodeobject.c:5117:19
    #8 0x7fc010c5ccba in objToJSON /home/willayd/clones/pandas/pandas/_libs/src/ujson/python/objToJSON.c:2130:14
    #9 0x56105bf873c6 in cfunction_call /usr/local/src/conda/python-3.11.3/Objects/methodobject.c:542:18

That line in objToJSON points to:

PyObject *objToJSON(PyObject *Py_UNUSED(self), PyObject *args,
                    PyObject *kwargs) {
    ...
    newobj = PyUnicode_FromString(ret);

    if (ret != buffer) {
        encoder->free(ret);
    }

    return newobj;
}

I think PyUnicode_FromString copies the buffer provided as the construction, so we might be missing a free on the ret argument. Could be an easy bug fix if someone wants to dig deeper

@lithomas1
Copy link
Member

I'll take a look. Last time I tried Asan on macOS, I ended up having some issues (it wasn't catching leaks properly).

@jbrockmendel
Copy link
Member

Is the idea to enable this during pytest runs or a separate job? My recollection of valgrind is its more of a "leave running over the weekend" thing.

@WillAyd
Copy link
Member Author

WillAyd commented Apr 29, 2023

I think during pytest. Yea Valgrind can definitely slow your program down, but ASAN doesn't add nearly as much overhead.

Arrow might have ASAN set up in their CI; I know they do in arrow-adbc and its pretty cool, though that library has a very small test suite at the moment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants