
Python crashes when executing memory_usage(deep=True) on a sparse series #19368


Closed
quale1 opened this issue Jan 24, 2018 · 2 comments
Labels: Bug, Sparse (Sparse Data Type)
Milestone: 0.23.0

Comments

quale1 commented Jan 24, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

s = pd.Series([None])
s.to_sparse().memory_usage(deep=True)

# crashes - Kernel died, restarting

Problem description

Executing the memory_usage(deep=True) method on a sparse series crashes Python. (With deep=False the method works as expected.)

Expected Output
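
memory_usage(deep=True) on the sparse series should return an integer byte count instead of killing the interpreter, just as the dense path and the deep=False sparse path already do. A minimal illustration of the calls that behave correctly (exact byte counts are platform-dependent):

import pandas as pd

s = pd.Series([None])
s.memory_usage(deep=True)      # dense series: returns an int, no crash
s.to_sparse().memory_usage()   # sparse with deep=False: also returns an int
# s.to_sparse().memory_usage(deep=True) should likewise return an int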

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback (Contributor) commented Jan 24, 2018

Sparse is not fully covered by tests. A pull request to fix this is welcome!

jreback added this to the Next Major Release milestone on Jan 24, 2018
hexgnu (Contributor) commented Jan 29, 2018

I hooked up gdb and tracked the issue down to lib.pyx, which assumes series are of length > 0. I made a PR that should fix it, though we'll see how the tests chooch.
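
For context, here is a plain-Python sketch of the kind of helper being described; the real code is Cython in pandas/_libs/lib.pyx, and this body is an illustrative assumption, not the actual patch. Note that to_sparse() on an all-NaN series leaves sp_values empty, which is how the length-0 path gets hit: an empty loop is harmless in pure Python, but unguarded Cython code indexing into an empty buffer can segfault, matching the crash above.

import sys
import numpy as np

def memory_usage_of_objects(arr: np.ndarray) -> int:
    # Sum the sizes of the boxed objects in an object-dtype array.
    size = 0
    if len(arr) == 0:  # don't assume the series has at least one element
        return size
    for obj in arr:
        size += sys.getsizeof(obj)
    return size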

jreback modified the milestones: Next Major Release → 0.23.0 on Feb 6, 2018
jreback closed this as completed in a01f74c on Feb 6, 2018
harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
closes pandas-dev#19368

Author: Matthew Kirk <[email protected]>

Closes pandas-dev#19438 from hexgnu/segfault_memory_usage and squashes the following commits:

f9433d8 [Matthew Kirk] Use shared docstring and get rid of if condition
4ead141 [Matthew Kirk] Move whatsnew doc to Sparse
ae9f74d [Matthew Kirk] Revert base.py
cdd4141 [Matthew Kirk] Fix linting error
93a0c3d [Matthew Kirk] Merge remote-tracking branch 'upstream/master' into segfault_memory_usage
207bc74 [Matthew Kirk] Define memory_usage on SparseArray
21ae147 [Matthew Kirk] FIX: revert change to lib.pyx
3f52a44 [Matthew Kirk] Ah ha I think I got it
5e59e9c [Matthew Kirk] Use range over 0 <= for loops
e251587 [Matthew Kirk] Fix failing test with indexing
27df317 [Matthew Kirk] Merge remote-tracking branch 'upstream/master' into segfault_memory_usage
7fdd03e [Matthew Kirk] Take out comment and use product
6bd6ddd [Matthew Kirk] BUG: don't assume series is length > 0
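
Per the commit messages above, the lib.pyx change was reverted and the fix instead defines memory_usage on SparseArray, so only the values a sparse array actually stores are measured. A rough sketch of that shape, written here as a free function (an assumption reconstructed from the commit log, not the merged diff):

import pandas as pd
from pandas._libs import lib
from pandas.core.dtypes.common import is_object_dtype

def sparse_memory_usage(arr, deep=False):
    # Only the non-fill values are physically stored in a SparseArray.
    values = arr.sp_values
    nbytes = values.nbytes
    if deep and is_object_dtype(values):
        # With deep=True, also count the boxed Python objects referenced.
        nbytes += lib.memory_usage_of_objects(values)
    return nbytes

# e.g. sparse_memory_usage(pd.Series([None]).to_sparse().values, deep=True)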