BUG: read_msgpack raises an error when passed a non-existent path in Python 2 #16523
@@ -192,7 +192,6 @@ def read(fh):
 
     # see if we have an actual file
     if isinstance(path_or_buf, compat.string_types):
-
         try:
             exists = os.path.exists(path_or_buf)
         except (TypeError, ValueError):
@@ -202,18 +201,21 @@ def read(fh):
                 with open(path_or_buf, 'rb') as fh:
                     return read(fh)
 
-    # treat as a binary-like
     if isinstance(path_or_buf, compat.binary_type):
+        # treat as a binary-like
         fh = None
         try:
-            fh = compat.BytesIO(path_or_buf)
-            return read(fh)
+            # We can't distinguish between a path and a buffer of bytes in
+            # Python 2 so instead assume the first byte of a valid path is
+            # less than 0x80.
+            if compat.PY3 or ord(path_or_buf[0]) >= 0x80:

Review comments on this line:

This would fail for a 0-length buffer. What does the [...]

This seems fragile. IIUC, it's possible to have filenames that, according to Python, start with characters above 0x80, even if the filesystem does some encoding on the filename before reading or writing.

Unfortunately I don't see a way to avoid edge cases like this for Python 2 with the current API. I think this approach minimises the number of affected users, and if it does affect someone the check can be bypassed using [...]

According to the msgpack spec, "Applications can assign 0 to 127 to store application-specific type information." I believe pandas doesn't currently use this, so this assumes that if the first byte is below [...] I'm not sure what the correct behaviour should be for passing [...]

+                fh = compat.BytesIO(path_or_buf)
+                return read(fh)
         finally:
             if fh is not None:
                 fh.close()
-
-    # a buffer like
-    if hasattr(path_or_buf, 'read') and compat.callable(path_or_buf.read):
+    elif hasattr(path_or_buf, 'read') and compat.callable(path_or_buf.read):
+        # treat as a buffer like
         return read(path_or_buf)
 
     raise ValueError('path_or_buf needs to be a string file path or file-like')

You can simply check os.path.exists(path_or_buf) (see how this is done in pandas.io.json.json.read_json).

I saw your explanation. But if os.path.exists(path_or_buf) fails, then you know for sure it's NOT a path. Then you can try to read. I agree you can't then distinguish between an invalid path and an invalid byte stream, but so what.

The benefit is when an incorrect path is accidentally used. As it stands the code can continue running with invalid data until it crashes mysteriously elsewhere or silently produces an incorrect result (the latter motivated me to make this PR).
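
To make the trade-off in this thread concrete, here is a minimal sketch of the first-byte heuristic, written for Python 3 so it can be run today. The function name classify_py2_str and its return labels are hypothetical and are not part of pandas. It assumes, as the diff comment does, that a valid path starts with a byte below 0x80, and it adds the empty-buffer guard raised in review.

```python
import os


def classify_py2_str(data):
    """Guess what a Python 2 style byte string represents.

    Hypothetical helper for illustration only; not pandas API.
    """
    # Step 1: an existing file always wins, as in the reviewer's suggestion.
    try:
        exists = os.path.exists(data)
    except (TypeError, ValueError):
        exists = False
    if exists:
        return "path"

    # Step 2: guard the zero-length case before indexing the first byte.
    if len(data) == 0:
        return "empty"

    # Step 3: the heuristic from the diff. Indexing bytes gives an int on
    # Python 3; on Python 2 it would give a 1-character str needing ord().
    first = data[0] if isinstance(data[0], int) else ord(data[0])
    if first >= 0x80:
        return "msgpack-bytes"

    # Assumed to be a path that does not exist, so the caller can raise
    # instead of silently decoding the path string as data.
    return "missing-path"


# 0x93 is the msgpack fixarray header for three elements.
print(classify_py2_str(b"\x93\x01\x02\x03"))
print(classify_py2_str(b"no/such/file.msg"))
print(classify_py2_str(b""))
```

The "missing-path" branch is where the PR wants an error instead of silent decoding. The fragility the reviewers point out remains: a non-existent path whose first byte is 0x80 or above would still be classified as data.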