
Python crashes when executing memory_usage(deep=True) on a sparse series #19368


Closed
quale1 opened this issue Jan 24, 2018 · 2 comments
Labels: Bug, Sparse (Sparse Data Type)
Milestone: 0.23.0

Comments

quale1 commented Jan 24, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

s = pd.Series([None])
s.to_sparse().memory_usage(deep=True)

# crashes - Kernel died, restarting

Problem description

Executing the memory_usage(deep=True) method on a sparse series crashes Python. (With deep=False the method works as expected.)

Expected Output
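
memory_usage(deep=True) on the sparse series should return an integer byte count instead of killing the interpreter, just as the dense path and the deep=False sparse path already do. A minimal illustration of the calls that behave correctly (exact byte counts are platform-dependent):

import pandas as pd

s = pd.Series([None])
s.memory_usage(deep=True)      # dense series: returns an int, no crash
s.to_sparse().memory_usage()   # sparse with deep=False: also returns an int
# s.to_sparse().memory_usage(deep=True) should likewise return an int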

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback (Contributor) commented Jan 24, 2018

Sparse is not fully covered by tests. A pull request to fix this is welcome!

jreback added this to the Next Major Release milestone on Jan 24, 2018
hexgnu (Contributor) commented Jan 29, 2018

I hooked up gdb and tracked the issue down to lib.pyx, which assumes series are of length > 0. I made a PR that should fix it, though we'll see how the tests chooch.
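
For context, here is a plain-Python sketch of the kind of helper being described; the real code is Cython in pandas/_libs/lib.pyx, and this body is an illustrative assumption, not the actual patch. Note that to_sparse() on an all-NaN series leaves sp_values empty, which is how the length-0 path gets hit: an empty loop is harmless in pure Python, but unguarded Cython code indexing into an empty buffer can segfault, matching the crash above.

import sys
import numpy as np

def memory_usage_of_objects(arr: np.ndarray) -> int:
    # Sum the sizes of the boxed objects in an object-dtype array.
    size = 0
    if len(arr) == 0:  # don't assume the series has at least one element
        return size
    for obj in arr:
        size += sys.getsizeof(obj)
    return size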

jreback modified the milestones: Next Major Release → 0.23.0 on Feb 6, 2018
jreback closed this as completed in a01f74c on Feb 6, 2018
harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
closes pandas-dev#19368

Author: Matthew Kirk <[email protected]>

Closes pandas-dev#19438 from hexgnu/segfault_memory_usage and squashes the following commits:

f9433d8 [Matthew Kirk] Use shared docstring and get rid of if condition
4ead141 [Matthew Kirk] Move whatsnew doc to Sparse
ae9f74d [Matthew Kirk] Revert base.py
cdd4141 [Matthew Kirk] Fix linting error
93a0c3d [Matthew Kirk] Merge remote-tracking branch 'upstream/master' into segfault_memory_usage
207bc74 [Matthew Kirk] Define memory_usage on SparseArray
21ae147 [Matthew Kirk] FIX: revert change to lib.pyx
3f52a44 [Matthew Kirk] Ah ha I think I got it
5e59e9c [Matthew Kirk] Use range over 0 <= for loops
e251587 [Matthew Kirk] Fix failing test with indexing
27df317 [Matthew Kirk] Merge remote-tracking branch 'upstream/master' into segfault_memory_usage
7fdd03e [Matthew Kirk] Take out comment and use product
6bd6ddd [Matthew Kirk] BUG: don't assume series is length > 0
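
Per the commit messages above, the lib.pyx change was reverted and the fix instead defines memory_usage on SparseArray, so only the values a sparse array actually stores are measured. A rough sketch of that shape, written here as a free function (an assumption reconstructed from the commit log, not the merged diff):

import pandas as pd
from pandas._libs import lib
from pandas.core.dtypes.common import is_object_dtype

def sparse_memory_usage(arr, deep=False):
    # Only the non-fill values are physically stored in a SparseArray.
    values = arr.sp_values
    nbytes = values.nbytes
    if deep and is_object_dtype(values):
        # With deep=True, also count the boxed Python objects referenced.
        nbytes += lib.memory_usage_of_objects(values)
    return nbytes

# e.g. sparse_memory_usage(pd.Series([None]).to_sparse().values, deep=True)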