bad arguments to dataframe constructor causes segfault #26429

kurtbrose · 2019-05-16T18:43:08Z

Code Sample, a copy-pastable example if possible

pandas.DataFrame(data=(range(10), range(10, 20)), columns=('ones', 'twos'))

# python
Python 2.7.14 (default, Dec 12 2017, 16:55:09)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.DataFrame(data=(range(10), range(10, 20)), columns=('ones', 'twos'))
Segmentation fault

Problem description

Segmentation fault on bad inputs to DataFrame constructor.

Expected Output

Raise an exception:

AssertionError: 2 columns passed, passed data had 10 columns
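For context, each element of `data` in the call above is read as one row, so the arguments describe 2 rows of 10 values while naming only 2 columns. Below is a minimal plain-Python sketch of that shape arithmetic, plus one way to pair each name with its values. The names `rows`, `n_rows`, `n_cols`, `mismatch`, and `by_column` are illustrative and do not come from pandas.

```python
# The same arguments as in the report.
rows = (range(10), range(10, 20))
columns = ("ones", "twos")

# Each element of `rows` is one row, so the data is 2 rows by 10 columns.
n_rows = len(rows)
n_cols = len(rows[0])

# Two names were supplied for ten columns: this is the mismatch.
mismatch = len(columns) != n_cols

# Pairing each name with its values gives a column-oriented mapping instead.
by_column = {name: list(values) for name, values in zip(columns, rows)}
```

A mapping like `by_column` can be passed as `data` to `pandas.DataFrame`, which takes the keys as column names.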

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.125-linuxkit
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: None.None

pandas: 0.24.2
pytest: 3.8.1
pip: 9.0.1
setuptools: 38.2.4
Cython: None
numpy: 1.16.3
scipy: None
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.2.6
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.1.11
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

TomAugspurger · 2019-05-16T18:44:39Z

Can you try on master? I think this was fixed recently.

kurtbrose · 2019-05-16T22:50:20Z

oh, great! this isn't really a blocker for me; is there a doc or anything on how to install from source? (does pip install work?)

TomAugspurger · 2019-05-16T23:20:11Z

The contributing docs have all that info. Pip may work, but you need a compiler.

simonjayhawkins · 2019-05-16T23:46:24Z

> I think this was fixed recently.

xref #25691

> Can you try on master?

Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.__version__
'0.25.0.dev0+567.g1263e1a9c'
>>> pandas.DataFrame(data=(range(10), range(10, 20)), columns=('ones', 'twos'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\frame.py", line 434, in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\internals\construction.py", line 418, in to_arrays
    dtype=dtype)
  File "C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\internals\construction.py", line 428, in _list_to_arrays
    coerce_float=coerce_float)
  File "C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\internals\construction.py", line 484, in _convert_object_array
    con=len(content)))
AssertionError: 2 columns passed, passed data had 10 columns
>>>

doesn't segfault, but should AssertionError be raised?

jreback · 2019-05-16T23:54:58Z

hmm we do check construction errors; iirc they should be ValueError

so yes this should change
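A minimal sketch of the behavior being asked for, in plain Python: a shape check that raises `ValueError` rather than `AssertionError` when the number of column names does not match the row width. `check_columns` is a hypothetical helper written for illustration and is not pandas' internal code. Only the message text is taken from the traceback earlier in this thread.

```python
def check_columns(rows, columns):
    """Raise ValueError if the number of names differs from the row width.

    Hypothetical helper for illustration; not part of pandas.
    """
    width = len(rows[0]) if rows else 0
    if len(columns) != width:
        raise ValueError(
            "{col} columns passed, passed data had {con} columns".format(
                col=len(columns), con=width
            )
        )


# Same shape as the report: 2 rows of 10 values, but only 2 names.
rows = [tuple(range(10)), tuple(range(10, 20))]
try:
    check_columns(rows, ("ones", "twos"))
    message = None
except ValueError as err:
    message = str(err)
```

Raising `ValueError` here lets callers catch bad constructor input with the same exception type used for other invalid arguments, instead of an assertion that reads like an internal failure.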

simonjayhawkins mentioned this issue May 17, 2019

ERR: User-facing AssertionError in DataFrame Constructor #26440

Merged

4 tasks

gfyoung added the DataFrame, Segfault, and Testing labels and removed the Segfault label May 18, 2019

jreback added this to the 0.25.0 milestone May 18, 2019

jreback closed this as completed in #26440 May 18, 2019
