Segfault when passing a mock.Mock instance to pandas.read_csv (it was an accident!) #15337
Comments
Couldn't help exploring it a bit because the test-case is so easy. I installed the develop branch of pandas from source just now and traced the bug back to where it enters the parser defined in cython at this point in the Seems like a sound strategy would be to maybe rule out that whatever is getting passed to that parser inside
sure it must be a string, bytes, or a file-like (which mainly involves having
Yeah, but I am suspicious that pandas is testing for For instance, I isolated the error to this chunk of code: If you comment this
However, I think the test inside Anyway, I believe that the test for whether it really is a Edit: I may have misread your comment, a bit. Apologies for any confusion.
@pellagic-puffbomb sure, you can test earlier. We have a fairly complicated opening process (because of the many, many possibilities of what it could be: encoding, from a URL, etc.). So it may not be possible to do it earlier (though it certainly might be). I wouldn't worry about malicious things; that is the responsibility of the user. We necessarily have a string to a file (which may or may not exist), or a file-like object from which we read bytes. If someone wants to give a duck-like, that is just fine. If you can check earlier and have everything passing, that would be great. We DO have a fairly extensive test suite for things like this.
Alright, well, I have an idea that may help a bit.
Hmm...seems like this thread has gone a bit stale. Reading through this, I do agree that we can do a better job if we encounter invalid objects for parsing. For example, specifying the Python engine leads to this:
>>> read_csv(mock.Mock(), engine="python")
...
TypeError: argument 1 must be an iterator
That comes from Python's native The problem with attribute checking is that
>>> mock.Mock().__iter__
...
AttributeError: __iter__
If you take a look at the C-code for Python's
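To make the attribute-checking problem above concrete, here is a small sketch using only the standard library. It assumes that the "must be an iterator" error comes from the standard `csv` reader (the thread does not say so explicitly in the surviving text), and the variable names are purely illustrative.

```python
import csv
from unittest import mock

m = mock.Mock()

# Ordinary attributes are created on demand, so a naive duck-typing
# check such as hasattr(obj, "read") passes for a Mock.
has_read = hasattr(m, "read")

# What comes back from read() is another Mock, not str or bytes,
# which is the kind of value a low-level parser is not prepared for.
chunk = m.read()
is_text_or_bytes = isinstance(chunk, (str, bytes))

# Magic methods are NOT auto-created on a plain Mock, so this lookup fails.
has_iter = hasattr(m, "__iter__")

# Assumption: the Python engine hands the object to csv.reader,
# which rejects anything that cannot be iterated.
try:
    csv.reader(m)
    rejected_by_csv = False
except TypeError:
    rejected_by_csv = True

print(has_read, is_text_or_bytes, has_iter, rejected_by_csv)
```

So a check for `read` alone lets the Mock through, while a check for `__iter__` catches it.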
it might be enough to check if a file-like (IOW not a string)
from above, we might need a more sophisticated test (which could use
How about adding that as a check in
yes Then calling in parser (and other methods) would be great.
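One possible shape for such a check, as a rough sketch only: the helper names `looks_file_like` and `validate_source` are hypothetical, and the error message is made up for illustration. The idea is to require a `read` or `write` attribute plus `__iter__` before anything reaches the parser.

```python
import io
from unittest import mock


def looks_file_like(obj):
    """Hypothetical helper: True if obj has read/write and can be iterated."""
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    return hasattr(obj, "__iter__")


def validate_source(obj):
    """Hypothetical early check to run before handing obj to a parser."""
    if isinstance(obj, (str, bytes)) or looks_file_like(obj):
        return obj
    raise ValueError(
        "Invalid file path or buffer object type: {}".format(type(obj))
    )


print(looks_file_like(io.StringIO("a,b\n1,2\n")))  # real buffer
print(looks_file_like(mock.Mock()))                 # plain Mock
print(looks_file_like("data.csv"))                  # a path string is not file-like
```

One caveat with this approach: `mock.MagicMock` does define `__iter__`, so it would still pass a check like this. That seems acceptable given the comment above that duck-likes are the user's responsibility.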
Code Sample
Problem description
I wrote a test where I accidentally ended up passing a mock.Mock object into pandas.read_csv and it segfaults.
Expected Output
Well, it would be nice if it threw an error telling me that I screwed up, instead of segfaulting.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.0
nose: None
pip: 9.0.1
setuptools: 24.0.2
Cython: 0.24.1
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: None