Commit 8f32ea5

WillAyd and lithomas1 authored
enable ASAN/UBSAN in pandas CI (#55102)
* enable ASAN/UBSAN in pandas CI
* try input
* try removing sanitize
* try no CFLAGS
* try GH string substitution
* change flags in build script
* quotes
* update script run
* single_cpu updates
* asan checks for datetime funcs
* try smaller config
* checkpoint
* bool fixup
* reverts
* known UB marker
* Finished marking tests with known UB
* dedicated CI job
* identifier fix
* fixes
* more test skip
* try quotes
* simplify ci
* try CFLAGS
* preload args
* skip single_cpu tests
* wording
* removed unneeded marker
* float set implementations
* Revert "float set implementations"
  This reverts commit 6266422.
* change marker name
* dedicated actions file
* consolidated into matrix
* fixup
* typos
* fixups
* add qt?
* intentional UB with verbose
* disable pytest-xdist
* original issue
* remove UB
* Revert "remove UB"
  This reverts commit 677da0e.
* merge fixup
* remove UB

---------

Co-authored-by: Thomas Li <[email protected]>
1 parent 8ce6740 commit 8f32ea5


15 files changed (+88 -5 lines changed)


.github/actions/build_pandas/action.yml

+9 -2
@@ -4,6 +4,12 @@ inputs:
   editable:
     description: Whether to build pandas in editable mode (default true)
     default: true
+  meson_args:
+    description: Extra flags to pass to meson
+    required: false
+  cflags_adds:
+    description: Items to append to the CFLAGS variable
+    required: false
 runs:
   using: composite
   steps:
@@ -24,11 +30,12 @@ runs:
 
     - name: Build Pandas
       run: |
+        export CFLAGS="$CFLAGS ${{ inputs.cflags_adds }}"
         if [[ ${{ inputs.editable }} == "true" ]]; then
-          pip install -e . --no-build-isolation -v --no-deps \
+          pip install -e . --no-build-isolation -v --no-deps ${{ inputs.meson_args }} \
             --config-settings=setup-args="--werror"
         else
-          pip install . --no-build-isolation -v --no-deps \
+          pip install . --no-build-isolation -v --no-deps ${{ inputs.meson_args }} \
             --config-settings=setup-args="--werror"
         fi
       shell: bash -el {0}
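
GitHub substitutes `${{ inputs.meson_args }}` and `${{ inputs.cflags_adds }}` into the script text before bash runs, so bash then word-splits the matrix values and strips their quotes. The sketch below approximates that step with Python's `shlex.split` to show which argv pip would receive. `build_pip_command` and `append_cflags` are hypothetical helpers, and the sketch ignores `$`-expansion and globbing.

```python
import shlex


def build_pip_command(meson_args: str = "", editable: bool = True) -> list[str]:
    """Approximate the argv bash builds for the pip call in the Build Pandas step.

    GitHub has already pasted `meson_args` into the script text, so bash
    word-splits it and removes the quotes; shlex.split mimics that part only.
    """
    cmd = ["pip", "install"]
    if editable:
        cmd.append("-e")
    cmd += [".", "--no-build-isolation", "-v", "--no-deps"]
    cmd += shlex.split(meson_args)  # an empty string adds no arguments
    cmd += shlex.split('--config-settings=setup-args="--werror"')
    return cmd


def append_cflags(current: str, adds: str) -> str:
    """Mirror `export CFLAGS="$CFLAGS <adds>"`, including the separating space."""
    return f"{current} {adds}"


sanitizer_args = '--config-settings=setup-args="-Db_sanitize=address,undefined"'
print(build_pip_command(sanitizer_args))
print(repr(append_cflags("", "-fno-sanitize-recover=all")))
```

An unset `meson_args` contributes no arguments at all, so matrix entries without sanitizers can simply omit it. The `-fno-sanitize-recover=all` flag that reaches CFLAGS this way makes the sanitizers abort on the first report instead of printing and continuing, so a finding fails the job.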

.github/actions/run-tests/action.yml

+8 -1
@@ -1,9 +1,16 @@
 name: Run tests and report results
+inputs:
+  preload:
+    description: Preload arguments for sanitizer
+    required: false
+  asan_options:
+    description: Arguments for Address Sanitizer (ASAN)
+    required: false
 runs:
   using: composite
   steps:
     - name: Test
-      run: ci/run_tests.sh
+      run: ${{ inputs.asan_options }} ${{ inputs.preload }} ci/run_tests.sh
       shell: bash -el {0}
 
     - name: Publish test results
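
For the ASAN/UBSAN matrix entry, the new `run:` line expands to roughly `ASAN_OPTIONS=detect_leaks=0 LD_PRELOAD=$(gcc -print-file-name=libasan.so) ci/run_tests.sh`. Bash treats leading `NAME=value` words as environment variables for that single command. The preload is typically needed because the Python interpreter itself is not ASAN-instrumented, so the ASAN runtime has to be loaded before any instrumented extension module. The sketch below builds the equivalent environment mapping in Python. `split_assignments` and `sanitizer_env` are hypothetical helpers, and the libasan path is a placeholder, not real `gcc` output.

```python
import os


def split_assignments(words: str) -> dict[str, str]:
    """Parse bash-style NAME=value prefix words into an environment mapping.

    Only the first '=' separates name from value, so 'ASAN_OPTIONS=detect_leaks=0'
    yields the value 'detect_leaks=0'. Command substitution such as $(gcc ...)
    is not evaluated; pass an already-resolved path instead.
    """
    env: dict[str, str] = {}
    for word in words.split():
        name, sep, value = word.partition("=")
        if not sep or not name:
            raise ValueError(f"not an assignment: {word!r}")
        env[name] = value
    return env


def sanitizer_env(asan_options: str, preload: str) -> dict[str, str]:
    """Overlay the sanitizer variables on a copy of the current environment."""
    return {**os.environ, **split_assignments(f"{asan_options} {preload}")}


# Placeholder path: on CI the value comes from `gcc -print-file-name=libasan.so`.
env = sanitizer_env("ASAN_OPTIONS=detect_leaks=0", "LD_PRELOAD=/path/to/libasan.so")
print(env["ASAN_OPTIONS"], env["LD_PRELOAD"])
# subprocess.run(["ci/run_tests.sh"], env=env, check=True) would then run the suite.
```

Matrix entries without sanitizers pass empty strings for both inputs; `split_assignments` then returns an empty mapping, matching how the empty expansions vanish from the bash command line.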

.github/workflows/unit-tests.yml

+18 -1
@@ -96,6 +96,14 @@ jobs:
           - name: "Pyarrow Nightly"
             env_file: actions-311-pyarrownightly.yaml
             pattern: "not slow and not network and not single_cpu"
+          - name: "ASAN / UBSAN"
+            env_file: actions-311-sanitizers.yaml
+            pattern: "not slow and not network and not single_cpu and not skip_ubsan"
+            asan_options: "ASAN_OPTIONS=detect_leaks=0"
+            preload: LD_PRELOAD=$(gcc -print-file-name=libasan.so)
+            meson_args: --config-settings=setup-args="-Db_sanitize=address,undefined"
+            cflags_adds: -fno-sanitize-recover=all
+            pytest_workers: -1 # disable pytest-xdist as it swallows stderr from ASAN
       fail-fast: false
     name: ${{ matrix.name || format('ubuntu-latest {0}', matrix.env_file) }}
     env:
@@ -105,7 +113,7 @@ jobs:
       PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
       PANDAS_CI: ${{ matrix.pandas_ci || '1' }}
       TEST_ARGS: ${{ matrix.test_args || '' }}
-      PYTEST_WORKERS: 'auto'
+      PYTEST_WORKERS: ${{ matrix.pytest_workers || 'auto' }}
       PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
       # Clipboard tests
       QT_QPA_PLATFORM: offscreen
@@ -174,16 +182,25 @@ jobs:
       - name: Build Pandas
         id: build
         uses: ./.github/actions/build_pandas
+        with:
+          meson_args: ${{ matrix.meson_args }}
+          cflags_adds: ${{ matrix.cflags_adds }}
 
       - name: Test (not single_cpu)
         uses: ./.github/actions/run-tests
         if: ${{ matrix.name != 'Pypy' }}
+        with:
+          preload: ${{ matrix.preload }}
+          asan_options: ${{ matrix.asan_options }}
         env:
           # Set pattern to not single_cpu if not already set
           PATTERN: ${{ env.PATTERN == '' && 'not single_cpu' || matrix.pattern }}
 
       - name: Test (single_cpu)
         uses: ./.github/actions/run-tests
+        with:
+          preload: ${{ matrix.preload }}
+          asan_options: ${{ matrix.asan_options }}
         env:
           PATTERN: 'single_cpu'
           PYTEST_WORKERS: 0
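
GitHub Actions expressions treat `false`, `0`, `-0`, `''` and `null` as falsy, and `a || b` yields `a` only when it is truthy. That is a plausible reason the ASAN entry uses `pytest_workers: -1` rather than `0`: `${{ matrix.pytest_workers || 'auto' }}` would turn `0` back into `'auto'`. (The single_cpu step sets `PYTEST_WORKERS: 0` directly in its `env:` block, so no fallback applies there.) The Python sketch below models `||`, `&&` and the `cond && a || b` idiom used for `PATTERN`. `gh_truthy`, `gh_or`, `gh_and` and `pattern` are hypothetical names, and the truthiness rules are simplified (NaN and object/array values are ignored).

```python
from typing import Any


def gh_truthy(value: Any) -> bool:
    """Simplified GitHub Actions truthiness: false, 0, -0, '' and null are falsy."""
    if value is None or value is False:
        return False
    if isinstance(value, str):
        return value != ""
    if isinstance(value, (int, float)):
        return value != 0
    return True


def gh_or(left: Any, right: Any) -> Any:
    """Model of `left || right`: left if it is truthy, otherwise right."""
    return left if gh_truthy(left) else right


def gh_and(left: Any, right: Any) -> Any:
    """Model of `left && right`: right if left is truthy, otherwise left."""
    return right if gh_truthy(left) else left


def pattern(env_pattern: str, matrix_pattern: Any) -> Any:
    """Model of ${{ env.PATTERN == '' && 'not single_cpu' || matrix.pattern }}."""
    return gh_or(gh_and(env_pattern == "", "not single_cpu"), matrix_pattern)


print(gh_or(-1, "auto"))    # -1 passes through unchanged
print(gh_or(0, "auto"))     # auto: 0 is falsy and gets replaced
print(gh_or(None, "auto"))  # auto: matrix entries without pytest_workers
```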

ci/deps/actions-311-sanitizers.yaml

+32
@@ -0,0 +1,32 @@
+name: pandas-dev
+channels:
+  - conda-forge
+dependencies:
+  - python=3.11
+
+  # build dependencies
+  - versioneer[toml]
+  - cython>=0.29.33
+  - meson[ninja]=1.2.1
+  - meson-python=0.13.1
+
+  # test dependencies
+  - pytest>=7.3.2
+  - pytest-cov
+  - pytest-xdist>=2.2.0
+  - pytest-localserver>=0.7.1
+  - pytest-qt>=4.2.0
+  - boto3
+  - hypothesis>=6.46.1
+  - pyqt>=5.15.9
+
+  # required dependencies
+  - python-dateutil
+  - numpy<2
+  - pytz
+
+  # pandas dependencies
+  - pip
+
+  - pip:
+    - "tzdata>=2022.7"

pandas/tests/frame/test_constructors.py

+2
@@ -3206,6 +3206,7 @@ def test_from_out_of_bounds_ns_datetime(
         assert item.asm8.dtype == exp_dtype
         assert dtype == exp_dtype
 
+    @pytest.mark.skip_ubsan
     def test_out_of_s_bounds_datetime64(self, constructor):
         scalar = np.datetime64(np.iinfo(np.int64).max, "D")
         result = constructor(scalar)
@@ -3241,6 +3242,7 @@ def test_from_out_of_bounds_ns_timedelta(
         assert item.asm8.dtype == exp_dtype
         assert dtype == exp_dtype
 
+    @pytest.mark.skip_ubsan
     @pytest.mark.parametrize("cls", [np.datetime64, np.timedelta64])
     def test_out_of_s_bounds_timedelta64(self, constructor, cls):
         scalar = cls(np.iinfo(np.int64).max, "D")
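
Both tests build a scalar of `np.iinfo(np.int64).max` days. Converting such a count to a finer unit multiplies it by a unit factor, and in C a signed 64-bit multiplication that overflows is undefined behavior, the kind of issue UBSAN reports (which exact code path these tests hit is not shown in this diff). The sketch below uses Python's unbounded integers to show how small the safe day range is and what an overflow-checked conversion looks like; `checked_mul_int64` is a hypothetical helper.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
SECONDS_PER_DAY = 86_400
NS_PER_SECOND = 10**9


def checked_mul_int64(a: int, b: int) -> int:
    """Multiply as int64 would, but raise OverflowError instead of overflowing.

    Python ints never overflow, so checking the exact product is fine here.
    In C the overflow must be detected without performing the overflowing
    signed multiply (e.g. with GCC/Clang's __builtin_mul_overflow).
    """
    result = a * b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{a} * {b} does not fit in int64")
    return result


max_days = INT64_MAX  # np.iinfo(np.int64).max, as in the two tests
# Largest day counts that still fit once converted to seconds / nanoseconds:
print(INT64_MAX // SECONDS_PER_DAY)                    # 106751991167300
print(INT64_MAX // (SECONDS_PER_DAY * NS_PER_SECOND))  # 106751
try:
    checked_mul_int64(max_days, SECONDS_PER_DAY)  # days -> seconds
except OverflowError as exc:
    print("overflow:", exc)
```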

pandas/tests/groupby/test_cumulative.py

+1
@@ -60,6 +60,7 @@ def test_groupby_cumprod():
     tm.assert_series_equal(actual, expected)
 
 
+@pytest.mark.skip_ubsan
 def test_groupby_cumprod_overflow():
     # GH#37493 if we overflow we return garbage consistent with numpy
     df = DataFrame({"key": ["b"] * 4, "value": 100_000})
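
The test comment says an overflowing cumprod returns "garbage consistent with numpy". Every value is 100_000, so the fourth cumulative product is 10**20, above the int64 maximum of roughly 9.22 * 10**18. Signed overflow is undefined behavior in C, which is presumably why UBSAN flags this test; on common hardware the multiply simply wraps modulo 2**64, and the sketch below reproduces that wraparound with Python integers. `wrap_int64` and `cumprod_int64` are hypothetical helpers, not pandas or numpy APIs.

```python
def wrap_int64(x: int) -> int:
    """Reduce an exact integer to two's-complement int64, i.e. modulo 2**64."""
    return (x + 2**63) % 2**64 - 2**63


def cumprod_int64(values: list[int]) -> list[int]:
    """Cumulative product that wraps after every step, like unchecked int64 math."""
    out: list[int] = []
    acc = 1
    for v in values:
        acc = wrap_int64(acc * v)
        out.append(acc)
    return out


result = cumprod_int64([100_000] * 4)
print(result[:3])           # [100000, 10000000000, 1000000000000000]
print(result[3] == 10**20)  # False: 10**20 does not fit, so it wrapped
```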

pandas/tests/io/parser/common/test_float.py

+9 -1
@@ -40,7 +40,14 @@ def test_scientific_no_exponent(all_parsers_all_precisions):
     tm.assert_frame_equal(df_roundtrip, df)
 
 
-@pytest.mark.parametrize("neg_exp", [-617, -100000, -99999999999999999])
+@pytest.mark.parametrize(
+    "neg_exp",
+    [
+        -617,
+        -100000,
+        pytest.param(-99999999999999999, marks=pytest.mark.skip_ubsan),
+    ],
+)
 def test_very_negative_exponent(all_parsers_all_precisions, neg_exp):
     # GH#38753
     parser, precision = all_parsers_all_precisions
@@ -51,6 +58,7 @@ def test_very_negative_exponent(all_parsers_all_precisions, neg_exp):
     tm.assert_frame_equal(result, expected)
 
 
+@pytest.mark.skip_ubsan
 @xfail_pyarrow # AssertionError: Attributes of DataFrame.iloc[:, 0] are different
 @pytest.mark.parametrize("exp", [999999999999999999, -999999999999999999])
 def test_too_many_exponent_digits(all_parsers_all_precisions, exp, request):
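
Only the longest exponents are marked. If a C parser accumulates exponent digits with `n = n * 10 + digit` in a fixed-width signed integer, an exponent with 17 or 18 digits overflows it, which is undefined behavior; that is a likely, though not confirmed by this diff, reason for the skips. A common remedy is to saturate: once the exponent is far outside the range of a double, further digits cannot change the result. The sketch below shows that idea; `parse_exponent`, `to_float` and the `EXP_CAP` value are hypothetical, and `float()` is used only to evaluate the saturated result.

```python
DIGITS = "0123456789"
EXP_CAP = 10_000  # hypothetical cap, far beyond a double's ~1e-324 .. ~1.8e308


def parse_exponent(digits: str, negative: bool = False) -> int:
    """Accumulate decimal exponent digits, saturating instead of overflowing.

    In C, `n = n * 10 + d` on a fixed-width int is UB once n passes the type's
    maximum; stopping growth at EXP_CAP keeps n small. This assumes the
    mantissa has far fewer than EXP_CAP digits, so a saturated exponent still
    means 0.0 or inf.
    """
    if not digits:
        raise ValueError("empty exponent")
    n = 0
    for ch in digits:
        if ch not in DIGITS:
            raise ValueError(f"bad exponent digit: {ch!r}")
        if n < EXP_CAP:
            n = n * 10 + DIGITS.index(ch)
    n = min(n, EXP_CAP)
    return -n if negative else n


def to_float(mantissa: str, exponent: int) -> float:
    """Evaluate '<mantissa>e<exponent>' with Python's own float parser."""
    return float(f"{mantissa}e{exponent}")


exp = parse_exponent("99999999999999999", negative=True)
print(exp, to_float("1", exp))                              # -10000 0.0
print(to_float("1", parse_exponent("999999999999999999")))  # inf
```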

pandas/tests/scalar/timedelta/methods/test_round.py

+2
@@ -61,6 +61,7 @@ def test_round_invalid(self):
             with pytest.raises(ValueError, match=msg):
                 t1.round(freq)
 
+    @pytest.mark.skip_ubsan
     def test_round_implementation_bounds(self):
         # See also: analogous test for Timestamp
         # GH#38964
@@ -86,6 +87,7 @@ def test_round_implementation_bounds(self):
         with pytest.raises(OutOfBoundsTimedelta, match=msg):
             Timedelta.max.round("s")
 
+    @pytest.mark.skip_ubsan
     @given(val=st.integers(min_value=iNaT + 1, max_value=lib.i8max))
     @pytest.mark.parametrize(
         "method", [Timedelta.round, Timedelta.floor, Timedelta.ceil]

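`Timedelta.max` is 2**63 - 1 ns, i.e. 106751 days 23:47:16.854775807. Its sub-second part exceeds half a second, so rounding to "s" has to go up, past the largest int64 value, and the test expects `OutOfBoundsTimedelta`. The sketch below rounds a nanosecond count with round-half-to-even and checks the range using Python integers. `round_ns` is hypothetical, raises the builtin `OverflowError` instead of the pandas exception, and is not claimed to match pandas' tie-breaking exactly.

```python
INT64_MAX = 2**63 - 1
NAT = -(2**63)  # pandas reserves INT64_MIN as NaT (iNaT), so it is not a valid value
NS_PER_SECOND = 10**9


def round_ns(value: int, unit: int) -> int:
    """Round a nanosecond count to a multiple of `unit`, ties to even.

    The exact result is formed with Python's unbounded ints and range-checked
    afterwards. A C version must detect the overflow without performing the
    overflowing signed arithmetic, because that arithmetic is itself UB.
    """
    q, r = divmod(value, unit)  # floor division, so 0 <= r < unit
    if 2 * r > unit or (2 * r == unit and q % 2 == 1):
        q += 1
    result = q * unit
    if not NAT < result <= INT64_MAX:
        raise OverflowError(f"rounding {value} to a multiple of {unit} leaves the int64 range")
    return result


print(round_ns(1_500_000_000, NS_PER_SECOND))  # 2000000000 (tie rounds to even)
print(round_ns(2_500_000_000, NS_PER_SECOND))  # 2000000000 (tie rounds to even)
try:
    round_ns(INT64_MAX, NS_PER_SECOND)  # analogue of Timedelta.max.round("s")
except OverflowError as exc:
    print("overflow:", exc)
```
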
pandas/tests/scalar/timedelta/test_arithmetic.py

+1
@@ -966,6 +966,7 @@ def test_td_op_timedelta_timedeltalike_array(self, op, arr):
 
 
 class TestTimedeltaComparison:
+    @pytest.mark.skip_ubsan
     def test_compare_pytimedelta_bounds(self):
         # GH#49021 don't overflow on comparison with very large pytimedeltas
 
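A `datetime.timedelta` can span up to 999,999,999 days, while an int64 nanosecond count tops out at about 106,751 days, so converting a large pytimedelta to nanoseconds before comparing is where fixed-width code can overflow (the concern behind GH#49021). The sketch below compares using exact Python integers instead; `pytimedelta_to_ns` and `compare_ns_to_pytimedelta` are hypothetical helpers, not pandas' implementation.

```python
from datetime import timedelta

INT64_MAX = 2**63 - 1


def pytimedelta_to_ns(td: timedelta) -> int:
    """Exact nanoseconds in `td` as an unbounded Python int (no int64 narrowing)."""
    return ((td.days * 86_400 + td.seconds) * 10**6 + td.microseconds) * 1_000


def compare_ns_to_pytimedelta(ns: int, td: timedelta) -> int:
    """Return -1, 0 or 1 comparing an int64 ns count with `td`, without overflow."""
    other = pytimedelta_to_ns(td)
    return (ns > other) - (ns < other)


print(pytimedelta_to_ns(timedelta.max) > INT64_MAX)         # True
print(compare_ns_to_pytimedelta(INT64_MAX, timedelta.max))  # -1
print(compare_ns_to_pytimedelta(INT64_MAX, timedelta.min))  # 1
```

In fixed-width C code, one option is to bail out early when the pytimedelta's day count alone exceeds the int64 nanosecond range, before any multiplication.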
pandas/tests/scalar/timedelta/test_timedelta.py

+1
@@ -551,6 +551,7 @@ def test_timedelta_hash_equality(self):
         ns_td = Timedelta(1, "ns")
         assert hash(ns_td) != hash(ns_td.to_pytimedelta())
 
+    @pytest.mark.skip_ubsan
     @pytest.mark.xfail(
         reason="pd.Timedelta violates the Python hash invariant (GH#44504).",
     )

pandas/tests/scalar/timestamp/methods/test_tz_localize.py

+1
@@ -25,6 +25,7 @@
 
 
 class TestTimestampTZLocalize:
+    @pytest.mark.skip_ubsan
     def test_tz_localize_pushes_out_of_bounds(self):
         # GH#12677
         # tz_localize that pushes away from the boundary is OK

pandas/tests/scalar/timestamp/test_constructors.py

+1
@@ -822,6 +822,7 @@ def test_barely_out_of_bounds(self):
         with pytest.raises(OutOfBoundsDatetime, match=msg):
             Timestamp("2262-04-11 23:47:16.854775808")
 
+    @pytest.mark.skip_ubsan
     def test_bounds_with_different_units(self):
         out_of_bounds_dates = ("1677-09-21", "2262-04-12")
 
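The dates in `test_bounds_with_different_units` ("1677-09-21", "2262-04-12") sit just outside the range of an int64 nanosecond count measured from the Unix epoch. The stdlib sketch below derives those limits at microsecond precision (the `datetime` module cannot hold nanoseconds) and shows that both dates, expressed in seconds, leave the int64 range once scaled to nanoseconds, the multiplication that would overflow in fixed-width C code. `ns_to_us_toward_zero` and `seconds_since_epoch` are hypothetical helpers.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1)


def ns_to_us_toward_zero(ns: int) -> int:
    """Truncate nanoseconds to microseconds toward zero (stays inside the range)."""
    us = abs(ns) // 1_000
    return us if ns >= 0 else -us


# INT64_MIN itself is NaT in pandas, so the smallest valid count is INT64_MIN + 1.
lower = EPOCH + timedelta(microseconds=ns_to_us_toward_zero(INT64_MIN + 1))
upper = EPOCH + timedelta(microseconds=ns_to_us_toward_zero(INT64_MAX))
print(lower)  # 1677-09-21 00:12:43.145225
print(upper)  # 2262-04-11 23:47:16.854775


def seconds_since_epoch(day: datetime) -> int:
    """Whole seconds between the Unix epoch and `day` (exact integer)."""
    return (day - EPOCH) // timedelta(seconds=1)


# The dates used by test_bounds_with_different_units, scaled from s to ns:
for text in ("1677-09-21", "2262-04-12"):
    ns = seconds_since_epoch(datetime.fromisoformat(text)) * 10**9
    print(text, INT64_MIN <= ns <= INT64_MAX)  # False: outside the int64 ns range
```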
pandas/tests/tools/test_to_datetime.py

+1
@@ -1140,6 +1140,7 @@ def test_to_datetime_dt64s_out_of_ns_bounds(self, cache, dt, errors):
         assert ts.unit == "s"
         assert ts.asm8 == dt
 
+    @pytest.mark.skip_ubsan
     def test_to_datetime_dt64d_out_of_bounds(self, cache):
         dt64 = np.datetime64(np.iinfo(np.int64).max, "D")
 
pyproject.toml

+1
@@ -523,6 +523,7 @@ markers = [
   "db: tests requiring a database (mysql or postgres)",
   "clipboard: mark a pd.read_clipboard test",
   "arm_slow: mark a test as slow for arm64 architecture",
+  "skip_ubsan: Tests known to fail UBSAN check",
 ]
 
 [tool.mypy]
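
Registering `skip_ubsan` under `markers` matters because pytest warns about unregistered marks (and errors under `--strict-markers`). The ASAN/UBSAN job then excludes the marked tests through its pattern, `"not slow and not network and not single_cpu and not skip_ubsan"`, which the CI script presumably hands to pytest's `-m` option. Because `pytest.param(..., marks=pytest.mark.skip_ubsan)` marks a single parametrized case, only that case is deselected. The sketch below evaluates just the conjunction-of-`not` form used in this matrix; pytest's real `-m` grammar also supports `or` and parentheses, and `is_selected` is a hypothetical helper.

```python
def is_selected(markers: set[str], pattern: str) -> bool:
    """Evaluate a 'not a and not b ...' marker expression (a subset of pytest -m)."""
    for clause in pattern.split(" and "):
        words = clause.split()
        if len(words) != 2 or words[0] != "not":
            raise ValueError(f"unsupported clause: {clause!r}")
        if words[1] in markers:
            return False
    return True


ASAN_PATTERN = "not slow and not network and not single_cpu and not skip_ubsan"
print(is_selected({"skip_ubsan"}, ASAN_PATTERN))  # False: deselected
print(is_selected({"arm_slow"}, ASAN_PATTERN))    # True: still runs
```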

scripts/tests/data/deps_minimum.toml

+1
@@ -382,6 +382,7 @@ markers = [
   "db: tests requiring a database (mysql or postgres)",
   "clipboard: mark a pd.read_clipboard test",
   "arm_slow: mark a test as slow for arm64 architecture",
+  "skip_ubsan: tests known to invoke undefined behavior",
 ]
 
 [tool.mypy]
