BUG: SIGBUS in test_float_byteswap test on arm #54391

Closed
3 tasks done
thesamesam opened this issue Aug 3, 2023 · 3 comments · Fixed by #54407
Labels
ARM aarch64 architecture Bug IO SAS SAS: read_sas Segfault Non-Recoverable Error

Comments

@thesamesam
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Run the pandas-2.0.3 testsuite on armv7.

Issue Description

The pandas/tests/io/sas/test_byteswap.py::test_float_byteswap test crashes with SIGBUS for me on arm (arm64 host, armv7 chroot).

When building with UBSAN (-fsanitize=undefined), I get the following:

pandas/tests/io/sas/test_byteswap.py::test_float_byteswap[False-float32] pandas/io/sas/byteswap.c:2293:15: runtime error: load of misaligned address 0xcc9025db for type 'float', which requires 4 byte alignment
0xcc9025db: note: pointer points here
 26  45 c1 18 00 00 00 00 70  a3 aa 09 40 00 00 00 00  38 21 90 cc 80 d5 90 cc  01 00 00 00 4c 5a c8
              ^

[gw8] PASSED pandas/tests/io/sas/test_byteswap.py::test_float_byteswap[False-float32]
pandas/tests/io/sas/test_byteswap.py::test_float_byteswap[False-float64] pandas/io/sas/byteswap.c:2524:15: runtime error: load of misaligned address 0xbecce23c for type 'double', which requires 8 byte alignment
0xbecce23c: note: pointer points here
  1a e5 23 62 00 00 00 00  00 00 00 00 ec b4 43 2a  c4 10 09 93 00 00 00 00  00 e2 cc be 00 00 00 00
              ^
Fatal Python error: Bus error

Thread 0xf6f5b420 (most recent call first):
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 474 in read
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 507 in from_io
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 1049 in _thread_receiver
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn

Current thread 0xf7d60020 (most recent call first):
  File "/var/tmp/portage/dev-python/pandas-2.0.3/work/pandas-2.0.3-python3_11/install/usr/lib/python3.11/site-packages/pandas/tests/io/sas/test_byteswap.py", line 51 in _test
  File "/var/tmp/portage/dev-python/pandas-2.0.3/work/pandas-2.0.3-python3_11/install/usr/lib/python3.11/site-packages/pandas/tests/io/sas/test_byteswap.py", line 37 in test_float_byteswap
  File "/usr/lib/python3.11/site-packages/hypothesis/core.py", line 785 in run
  File "/usr/lib/python3.11/site-packages/hypothesis/executors.py", line 47 in default_new_style_executor
  File "/usr/lib/python3.11/site-packages/hypothesis/core.py", line 789 in execute_once
  File "/usr/lib/python3.11/site-packages/hypothesis/core.py", line 850 in _execute_once_for_engine
  File "/usr/lib/python3.11/site-packages/hypothesis/internal/conjecture/engine.py", line 173 in __stoppable_test_function
  File "/usr/lib/python3.11/site-packages/hypothesis/internal/conjecture/engine.py", line 195 in test_function
  File "/usr/lib/python3.11/site-packages/hypothesis/internal/conjecture/engine.py", line 1051 in cached_test_function
  File "/usr/lib/python3.11/site-packages/hypothesis/internal/conjecture/engine.py", line 670 in generate_new_examples
  File "/usr/lib/python3.11/site-packages/hypothesis/internal/conjecture/engine.py", line 866 in _run
  File "/usr/lib/python3.11/site-packages/hypothesis/internal/conjecture/engine.py", line 460 in run
  File "/usr/lib/python3.11/site-packages/hypothesis/core.py", line 927 in run_engine
  File "/usr/lib/python3.11/site-packages/hypothesis/core.py", line 1341 in wrapped_test
  File "/var/tmp/portage/dev-python/pandas-2.0.3/work/pandas-2.0.3-python3_11/install/usr/lib/python3.11/site-packages/pandas/tests/io/sas/test_byteswap.py", line 33 in test_float_byteswap
  File "/usr/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/usr/lib/python3.11/site-packages/_pytest/python.py", line 1788 in runtest
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/usr/lib/python3.11/site-packages/xdist/remote.py", line 174 in run_one_test
  File "/usr/lib/python3.11/site-packages/xdist/remote.py", line 157 in pytest_runtestloop
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/usr/lib/python3.11/site-packages/xdist/remote.py", line 355 in <module>
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 1157 in executetask
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 343 in integrate_as_primary_thread
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 1142 in serve
  File "/usr/lib/python3.11/site-packages/execnet/gateway_base.py", line 1640 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, numexpr.interpreter, bottleneck.move, bottleneck.nonreduce, bottleneck.nonreduce_axis, bottleneck.reduce, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, scipy._lib._ccallback_c, numpy.linalg.lapack_lite, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, 
scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, 
scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.signal._sigtools, scipy.signal._max_len_seq_inner, scipy.signal._upfirdn_apply, scipy.signal._spline, scipy.signal._sosfilt, scipy.signal._spectral, scipy.signal._peak_finding_utils, markupsafe._speedups, lxml._elementpath, lxml.etree, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, matplotlib._image, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension, pandas.io.sas._byteswap, psycopg2._psycopg (total: 191)

Both these tests and this functionality were introduced in c855be8 (cc @jonashaag).

Expected Behavior

All tests pass.

Installed Versions

# python3 -c 'import pandas as pd; pd.show_versions()'
/usr/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : 0f43794
python : 3.11.4.final.0
python-bits : 32
OS : Linux
OS-release : 5.15.117-gentoo-dist
Version : #1 SMP Wed Jun 14 13:14:49 -00 2023
machine : armv8l
processor : ARMv8 Processor rev 1 (v8l)
byteorder : little
LC_ALL : None
LANG : C.UTF8
LOCALE : en_US.UTF-8

pandas : 2.0.3
numpy : 1.25.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : None
Cython : 3.0.0
pytest : 7.4.0
hypothesis : 6.82.0
sphinx : None
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.4
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.7
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.2
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.1
snappy : None
sqlalchemy : 2.0.19
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.7.0
xlrd : 2.0.1
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None

@thesamesam thesamesam added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 3, 2023
@thesamesam
Contributor Author

thesamesam commented Aug 3, 2023

There's definitely unaligned access there (using the offsets), but the byteswap functions look like possible aliasing violations too, e.g. c855be8#diff-e64873a81f0bfbd0c2b519bed04f66272560b25e1d78f65a4f685230364a2b0dR84.

You can't cast between e.g. uint32_t* and float* and then dereference the result like that without violating strict aliasing. Please use memcpy or the portable (and efficient) routines described at https://github.com/projg2/portable-endianness. The compiler recognises these patterns and optimises appropriately.
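For illustration, here is a minimal sketch of the memcpy approach suggested above. The helper names `bswap32` and `read_float_swapped` are hypothetical and are not taken from the pandas source; the sketch only assumes that `float` is 32 bits wide. The bytes are copied into a `uint32_t`, swapped, and copied into a `float`, so no `float*` or `uint32_t*` is ever dereferenced at a possibly misaligned or differently typed address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value using shifts and masks only. */
static uint32_t bswap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

/* Hypothetical helper: read a float stored in the opposite byte order
 * at an arbitrary (possibly unaligned) address. */
static float read_float_swapped(const unsigned char *src) {
    uint32_t bits;
    float out;
    assert(src != NULL);
    memcpy(&bits, src, sizeof bits);  /* byte copy: no alignment requirement */
    bits = bswap32(bits);
    memcpy(&out, &bits, sizeof out);  /* reinterpret the bits without aliasing */
    return out;
}
```

Calling `read_float_swapped(buf + 1)` on a byte buffer is well defined even though `buf + 1` is not 4-byte aligned, which is the situation UBSAN flagged in the report.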

@jonashaag
Contributor

Thanks for pointing this out! I didn't know about this aliasing rule. I will fix this.

For the unaligned accesses, is there any way to avoid them? The SAS7BDAT file format effectively requires unaligned reads, e.g. when reading columns whose length is not a power of 2.
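One way to read from unaligned offsets without an unaligned load in the C sense is to assemble the value from individual bytes, in the style of the shift-based routines mentioned earlier in the thread. The sketch below is an assumption-laden illustration, not the pandas implementation: `load_be64` and `load_be_double` are hypothetical names, and it assumes `double` is a 64-bit IEEE 754 value whose byte order matches the host's integer byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble a big-endian 64-bit integer one byte at a time.
 * Works for any address, since only single bytes are loaded. */
static uint64_t load_be64(const unsigned char *p) {
    uint64_t v = 0;
    assert(p != NULL);
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Hypothetical helper: read a big-endian double stored at any offset. */
static double load_be_double(const unsigned char *p) {
    uint64_t bits = load_be64(p);
    double out;
    memcpy(&out, &bits, sizeof out);  /* copy the bits, do not cast pointers */
    return out;
}
```

The file layout can stay unaligned; only the way the bytes are loaded changes. Whether a given compiler turns the loop into a single load plus a byte swap depends on the compiler and target.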

@jonashaag
Contributor

I looked into portable-endianness but it doesn't seem to compile to efficient code with MSVC.
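If the shift-based patterns are not optimised well by a particular compiler, one possible alternative is to keep memcpy for the loads and stores and use a compiler byte-swap intrinsic for the swap itself. The following is only a sketch of that idea and is not necessarily what the linked fix does: `BSWAP32` and `swap32_inplace` are hypothetical names, while `_byteswap_ulong` (MSVC, `<stdlib.h>`) and `__builtin_bswap32` (GCC and Clang) are existing intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER)
#include <stdlib.h>  /* declares _byteswap_ulong */
#define BSWAP32(x) ((uint32_t)_byteswap_ulong((unsigned long)(x)))
#elif defined(__GNUC__) || defined(__clang__)
#define BSWAP32(x) __builtin_bswap32(x)
#else
/* Fallback for other compilers: plain shifts and masks. */
#define BSWAP32(x)                                             \
    ((((x) & 0x000000FFu) << 24) | (((x) & 0x0000FF00u) << 8) | \
     (((x) & 0x00FF0000u) >> 8)  | (((x) & 0xFF000000u) >> 24))
#endif

/* Hypothetical helper: byte-swap 4 bytes in place at any address. */
static void swap32_inplace(unsigned char *p) {
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);  /* unaligned-safe load */
    v = BSWAP32(v);
    memcpy(p, &v, sizeof v);  /* unaligned-safe store */
}
```

The memcpy calls remove both the alignment and the aliasing problem; the intrinsic only replaces the swap step.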

jonashaag added a commit to jonashaag/pandas that referenced this issue Aug 4, 2023
@lithomas1 lithomas1 added IO SAS SAS: read_sas Segfault Non-Recoverable Error ARM aarch64 architecture and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 4, 2023
mroeschke pushed a commit to mroeschke/pandas that referenced this issue Aug 18, 2023
thesamesam added a commit to thesamesam/c-blosc2 that referenced this issue Aug 21, 2023
Unaligned access is UB, even on x86.

On arm, I hit SIGBUSes in pandas's test suite via pandas->pytables->c-blosc2
because of unaligned loads and stores.

Modern compilers are capable of optimising the "slow" path bitshifts and memcpy
into faster alternatives where it is legal.

Bug: https://bugs.gentoo.org/911660
Bug: pandas-dev/pandas#54391
Bug: pandas-dev/pandas#54396
Signed-off-by: Sam James <[email protected]>
thesamesam added a commit to thesamesam/c-blosc2 that referenced this issue Aug 21, 2023
Unaligned access is UB, even on x86. UBsan will also detect this (-fsanitize=undefined
or -fsanitize=alignment).

On arm, I hit SIGBUSes in pandas's test suite via pandas->pytables->c-blosc2
because of unaligned loads and stores.

Modern compilers are capable of optimising the "slow" path bitshifts and memcpy
into faster alternatives where it is legal.

Bug: https://bugs.gentoo.org/911660
Bug: pandas-dev/pandas#54391
Bug: pandas-dev/pandas#54396
Signed-off-by: Sam James <[email protected]>
3 participants