BUG: segfault in HDFStore get_storer under coverage on python 3.11 #50105

Closed
2 of 3 tasks
graingert opened this issue Dec 7, 2022 · 4 comments
Labels
Bug IO HDF5 read_hdf, HDFStore Segfault Non-Recoverable Error Upstream issue Issue related to pandas dependency

Comments

@graingert
Contributor

graingert commented Dec 7, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import sys
import os
import tempfile

import pandas as pd


def main():
    items2 = [
        ({"x": {1.0: "a"}, "y": {1.0: 1}}, "/data00"),
        ({"x": {2.0: "b"}, "y": {2.0: 2}}, "/data01"),
        ({"x": {4.0: "d"}, "y": {4.0: 4}}, "/data03"),
        ({"x": {6.0: "f"}, "y": {6.0: 6}}, "/data05"),
        ({"x": {5.0: "e"}, "y": {5.0: 5}}, "/data04"),
        ({"x": {11.0: "k"}, "y": {11.0: 11}}, "/data10"),
        ({"x": {7.0: "g"}, "y": {7.0: 7}}, "/data06"),
        ({"x": {8.0: "h"}, "y": {8.0: 8}}, "/data07"),
        ({"x": {9.0: "i"}, "y": {9.0: 9}}, "/data08"),
    ]

    with tempfile.TemporaryDirectory() as tmp_path:
        fn = os.path.join(tmp_path, "demo.h5")
        for df, key in items2:
            pd.DataFrame(df).to_hdf(fn, key, format="table", mode="a", append=False)
        with pd.HDFStore(fn, mode="r") as hdf:
            for k in hdf.keys():
                hdf.get_storer(k)


if __name__ == "__main__":
    sys.exit(main())

Issue Description

Running the script with PYTHONFAULTHANDLER=True coverage run demo.py results in:

Fatal Python error: Segmentation fault

Current thread 0x00007f3bbcc37740 (most recent call first):
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 387 in cache_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/node.py", line 372 in _g_set_location
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/node.py", line 241 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/leaf.py", line 259 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/carray.py", line 200 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 1158 in _g_load_child
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 417 in get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 1556 in _get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 685 in _f_get_child
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 798 in __getattr__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/index.py", line 416 in _g_post_init_hook
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/node.py", line 258 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 221 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/index.py", line 381 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 1151 in _g_load_child
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 417 in get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 1556 in _get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 1607 in get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 2001 in __contains__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/table.py", line 844 in _g_post_init_hook
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/node.py", line 258 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/leaf.py", line 259 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/table.py", line 807 in __init__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 1158 in _g_load_child
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 417 in get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/file.py", line 1556 in _get_node
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 685 in _f_get_child
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/tables/group.py", line 798 in __getattr__
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/pandas/io/pytables.py", line 3420 in storable
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/pandas/io/pytables.py", line 2740 in infer_axes
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/pandas/io/pytables.py", line 1521 in get_storer
  File "/home/graingert/projects/dask/demo.py", line 65 in main
  File "/home/graingert/projects/dask/demo.py", line 69 in <module>
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/coverage/execfile.py", line 199 in run
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/coverage/cmdline.py", line 830 in do_run
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/coverage/cmdline.py", line 659 in command_line
  File "/home/graingert/anaconda3/envs/pytables-segfault/lib/python3.11/site-packages/coverage/cmdline.py", line 943 in main
  File "/home/graingert/anaconda3/envs/pytables-segfault/bin/coverage", line 11 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, numexpr.interpreter, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension (total: 63)
[1]    807845 segmentation fault (core dumped)  PYTHONFAULTHANDLER=True coverage run demo.py
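
The traceback above is produced by Python's faulthandler module, enabled here through the PYTHONFAULTHANDLER environment variable. As a minimal sketch that is not part of the original report, the same handler can be enabled from inside a script when setting the environment variable is inconvenient. The helper name enable_fault_tracebacks is hypothetical and used only for illustration.

```python
import faulthandler
import sys


def enable_fault_tracebacks(stream=None):
    """Dump Python tracebacks to `stream` on fatal signals such as SIGSEGV.

    Hypothetical helper for illustration. `stream` defaults to sys.stderr
    and must expose a real file descriptor; it has to stay open for as
    long as the handler is enabled.
    """
    if stream is None:
        stream = sys.stderr
    # all_threads=True dumps every running thread, not only the current one.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_fault_tracebacks()
```

Calling the helper at the top of demo.py should have the same effect as the environment variable for code that runs after the call; a crash during interpreter startup would still need PYTHONFAULTHANDLER.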

Expected Behavior

no segfault

Installed Versions

INSTALLED VERSIONS

commit : 8dab54d
python : 3.11.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-53-generic
Version : #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@graingert graingert added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 7, 2022
@graingert graingert changed the title BUG: BUG: segfault in HDFStore get_storer Dec 7, 2022
@graingert
Contributor Author

graingert commented Dec 7, 2022

This looks like a problem with tables.lrucacheextension.NodeCache in https://github.com/PyTables/PyTables/blob/cb61accee3189d710d508c8debd800521320233d/tables/file.py#L376-L387

@graingert graingert changed the title BUG: segfault in HDFStore get_storer BUG: segfault in HDFStore get_storer under coverage Dec 7, 2022
@graingert graingert changed the title BUG: segfault in HDFStore get_storer under coverage BUG: segfault in HDFStore get_storer under coverage on python 3.11 Dec 7, 2022
graingert added a commit to graingert/pandas that referenced this issue Dec 7, 2022
@lithomas1 lithomas1 added IO HDF5 read_hdf, HDFStore Upstream issue Issue related to pandas dependency and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 7, 2022
@lithomas1
Member

I can reproduce this, and can confirm this segfaults in the pytables C extensions. This also looks to be a bad interaction with coveragepy. It does not segfault when you don't use coverage.
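
One way to check the coverage dependence described above is to run the reproducer in a child process with and without coverage and compare how each run ends. The following is a sketch under stated assumptions: a POSIX system (where a negative return code means the child was killed by a signal), the reproducer saved as demo.py, and coverage installed in the same environment. The helper name describe_exit is hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(cmd):
    """Run `cmd` and report how it ended (hypothetical helper).

    Returns the signal name (for example "SIGSEGV") if the child was
    killed by a signal, otherwise "exit <code>".
    """
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if proc.returncode < 0:
        # On POSIX, return code -N means termination by signal N.
        return signal.Signals(-proc.returncode).name
    return f"exit {proc.returncode}"


if __name__ == "__main__":
    # Assumes demo.py (the reproducer from this issue) is in the
    # current directory and that coverage is installed.
    print("plain python:  ", describe_exit([sys.executable, "demo.py"]))
    print(
        "under coverage:",
        describe_exit([sys.executable, "-m", "coverage", "run", "demo.py"]),
    )
```

If the behaviour matches the report, the first line would show a normal exit and the second would show SIGSEGV.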

@lithomas1 lithomas1 added the Segfault Non-Recoverable Error label Dec 7, 2022
@graingert
Contributor Author

reproducer with just pytables: PyTables/PyTables#977

@phofl
Member

phofl commented Dec 9, 2022

Closing here since this is not on our side. Please ping to reopen if I misunderstood.

@phofl phofl closed this as completed Dec 9, 2022