read_csv in multiple threads causes segmentation fault #11786

Closed
mrocklin opened this issue Dec 7, 2015 · 4 comments
Labels
Bug IO CSV read_csv, to_csv

Comments

@mrocklin
Contributor

mrocklin commented Dec 7, 2015

The following script causes a segfault on my machine

from io import BytesIO
from multiprocessing.pool import ThreadPool
import pandas as pd

# Make many fake CSV files in memory
bytes = ['\n'.join(['%d,%d,%d' % (i,i,i) for i in range(10000)]).encode()
         for j in range(100)]
files = [BytesIO(b) for b in bytes]

# Read all files in many threads
pool = ThreadPool(8)
pool.map(pd.read_csv, files)

$ python script.py
Segmentation fault (core dumped)

Python 3.4, Pandas 0.17.1, Ubuntu 14.04

@mrocklin changed the title from "Read_csv is no longer thread safe" to "read_csv in multiple threads causes segmentation fault" on Dec 7, 2015
@TomAugspurger
Contributor

FWIW on OSX that script just hangs at 99% CPU use. pandas 0.17.1, python 3.5.
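
Since the same script is reported to segfault on Ubuntu and to hang on OSX, a small harness that runs it in a child interpreter can tell the two outcomes apart without taking down the calling process. The following is only a sketch: `run_repro` is a hypothetical helper that is not part of pandas, the 30 second default timeout is an arbitrary assumption, and the signal check relies on POSIX return-code conventions.

```python
import signal
import subprocess
import sys


def run_repro(source, timeout=30.0):
    """Run `source` in a child interpreter and classify the outcome.

    Hypothetical helper: returns "ok", "segfault", "hang" or "error: <code>".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return "hang"
    # On POSIX, a return code of -N means the child was killed by signal N.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    if proc.returncode == 0:
        return "ok"
    return "error: %d" % proc.returncode
```

With the script from the first comment saved as `script.py`, calling `run_repro(open("script.py").read())` would be expected to return "segfault" or "hang" on an affected install, matching the two reports above, and "ok" once the bug is fixed.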

@jreback
Contributor

jreback commented Dec 7, 2015

cc @jdeschenes can you have a look

@jreback
Contributor

jreback commented Dec 7, 2015

@mrocklin thanks for the repro!

@jdeschenes
Contributor

I think I found the issue; see my pull request. It was caused by a misplaced PyGILState_Ensure() call: it was made after Py_XDECREF had already been called, so a reference count was being decremented without the GIL held.

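Note that using read_csv from multiple threads on a Python file-like object, such as the BytesIO buffers in the repro, might have a big impact on performance, since each read from the object needs the GIL.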
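
One way to check the performance concern on a given machine is to time the same reads serially and through a thread pool. This is a rough sketch under stated assumptions: `parse_csv` is a standard-library stand-in so the block runs without pandas, `make_files` and `time_reads` are hypothetical helpers, and the file counts and sizes are arbitrary. Passing `pd.read_csv` as `reader` would exercise pandas itself, which is only meaningful on a build that includes the fix.

```python
import csv
import io
import time
from multiprocessing.pool import ThreadPool


def parse_csv(buf):
    """Stand-in reader (assumption): swap in pd.read_csv to test pandas."""
    lines = buf.getvalue().decode("ascii").splitlines()
    return list(csv.reader(lines))


def make_files(n_files=20, n_rows=1000):
    """Build in-memory CSV files shaped like the ones in the repro script."""
    data = "\n".join("%d,%d,%d" % (i, i, i) for i in range(n_rows)).encode()
    return [io.BytesIO(data) for _ in range(n_files)]


def time_reads(reader, n_threads=8):
    """Return (serial_seconds, threaded_seconds) for reading fresh buffers."""
    files = make_files()
    start = time.perf_counter()
    for f in files:
        reader(f)
    serial_s = time.perf_counter() - start

    files = make_files()
    with ThreadPool(n_threads) as pool:
        start = time.perf_counter()
        pool.map(reader, files)
        threaded_s = time.perf_counter() - start
    return serial_s, threaded_s
```

If the threaded time is not clearly lower than the serial one, the reads are probably contending for the GIL and the thread pool is adding overhead rather than parallelism.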

jdeschenes pushed a commit to jdeschenes/pandas that referenced this issue Jan 19, 2016
…tringIO object., pandas-dev#11786

The issue was caused by a misplaced PyGILState_Ensure()