read_msgpack returns garbage for non-existing files in python2 #15296

Closed
languitar opened this issue Feb 3, 2017 · 4 comments · Fixed by #16523

Comments

@languitar

languitar commented Feb 3, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: u'0.19.2'

In [3]: pd.read_msgpack('/tmp/bla.txt')
Out[3]: [47, 116, 109, 112, 47, 98, 108, 97, 46, 116, 120, 116]

That file does not exist.

The same code with the same version on python 3 correctly raises an exception.

Problem description

Reading a non-existent file returns a list of integers instead of raising an exception when pandas is used on Python 2. This makes the error quite hard to detect.

Expected Output

An exception.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.6-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.1.0
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
@jreback
Contributor

jreback commented Feb 3, 2017

I recall this being fixed a little while back. Are you sure you are running with 0.19.2?

In [1]: pd.read_msgpack('/tmp/bla.txt')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-7a3c5705a157> in <module>()
----> 1 pd.read_msgpack('/tmp/bla.txt')

/Users/jreback/miniconda3/envs/pandas/lib/python3.5/site-packages/pandas/io/packers.py in read_msgpack(path_or_buf, encoding, iterator, **kwargs)
    216         return read(path_or_buf)
    217 
--> 218     raise ValueError('path_or_buf needs to be a string file path or file-like')
    219 
    220 dtype_dict = {21: np.dtype('M8[ns]'),

ValueError: path_or_buf needs to be a string file path or file-like

In [2]: pd.__version__
Out[2]: '0.19.2'

@languitar
Author

Yes, definitely 0.19.2. It seems to be fixed only in python 3.

@jreback
Contributor

jreback commented Feb 3, 2017

ahh, that is possible (must not be well tested)

want to add a test and fix?

I think we are doing a better job of this nowadays in things like read_json IIRC. IOW if it's an actual string, you ask the file system whether it's a valid path or not, then raise a better error message (e.g. IOError: file not found).

Yeah, I see this doesn't use pandas.io.common to actually open the file; instead it passes the string straight to the msgpack routines. pandas.io.common is where this kind of checking happens, e.g. for read_csv.
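
The check described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in pandas.io.common and not the eventual fix; `resolve_path_or_buf` is a hypothetical helper name, and raising `FileNotFoundError` assumes Python 3 (on Python 2 the closest equivalent would be `IOError`).

```python
import os


def resolve_path_or_buf(path_or_buf):
    """Hypothetical helper: fail early when a string path does not exist.

    Text strings are treated as file paths. Anything else (bytes,
    file-like objects) is passed through untouched for the reader
    to handle.
    """
    if isinstance(path_or_buf, str):
        # Ask the file system before handing the string to a parser.
        if not os.path.exists(path_or_buf):
            raise FileNotFoundError(
                "File {!r} does not exist".format(path_or_buf)
            )
    return path_or_buf
```

With a guard like this in front of the reader, a missing path fails with a clear error instead of being handed to the msgpack routines as if it were data.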

@jreback jreback added this to the 0.20.0 milestone Feb 3, 2017
@chris-b1
Contributor

chris-b1 commented Feb 3, 2017

This is a dupe of #12225 (closed that one) - issue is that in python 2 the string is interpreted as bytes.
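
To make the "interpreted as bytes" point concrete, the short Python 3 snippet below takes the list reported in the original post and shows that it is just the byte values of the path string. The explanation in the comments relies on the msgpack format, where a single byte in the range 0x00-0x7f encodes a small integer; that this is exactly what happened inside pandas on Python 2 is an inference from the output, not something verified against the pandas source.

```python
# Output reported in the issue for pd.read_msgpack('/tmp/bla.txt') on Python 2.
reported = [47, 116, 109, 112, 47, 98, 108, 97, 46, 116, 120, 116]

# Interpreting each integer as a byte value recovers the path itself.
decoded = bytes(reported).decode("ascii")
print(decoded)  # /tmp/bla.txt

# In the msgpack format, one byte in the range 0x00-0x7f is a complete
# value (a "positive fixint"), so every ASCII byte of the path can be
# unpacked as a separate small integer.
assert all(0x00 <= b <= 0x7F for b in reported)
```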

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
chrisburr added a commit to chrisburr/pandas that referenced this issue May 27, 2017
chrisburr added a commit to chrisburr/pandas that referenced this issue May 28, 2017
chrisburr added a commit to chrisburr/pandas that referenced this issue Aug 20, 2017
chrisburr added a commit to chrisburr/pandas that referenced this issue Oct 28, 2017
@jreback jreback modified the milestones: Next Major Release, 0.22.0 Oct 28, 2017
TomAugspurger pushed a commit that referenced this issue Oct 30, 2017
…Python 2 (#16523)

* TST: Add tests for trying to read non-existent files #15296

* BUG: Fix passing non-existant file to read_msgpack #15296

* TST: Fix io.test_common.test_read_non_existant for external modules

* CLN: Import FileNotFoundError in tests/io/test_common.py
peterpanmj pushed a commit to peterpanmj/pandas that referenced this issue Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017