-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add file buffer validation to I/O ops #15894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/types/inference.py
Outdated
|
||
|
||
def is_file_like(obj): | ||
file_attrs = ('read', 'write', 'seek', 'tell') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string describing this. (and of course any others you want to document; prob should open an issue for that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't too hard to fix here in this PR. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add a versionadded (for this one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will make sure.
if not hasattr(obj, attr): | ||
return False | ||
|
||
if not is_iterator(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so a file_like must be an iterator, but an iterator is not necessarily file_like right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. The reason is the following:
>>> from unittest import mock
>>> m = mock.Mock()
>>> hasattr(m, "read")
True
>>> hasattr(m, "__iter__")
False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k, maybe add that statement (or describe). Remember these are actually user-level accessible :>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true. In the doc-string, I presume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep exactly. after this is merged I am going to create an issue to improve these doc-strings (novice level); the ones you don't do :>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't be necessary. A commit is being added here for that.
EDIT: Ah, I see you expanded to include more than just this file. Fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah should document them all ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
examples too!
lgtm. |
pandas/io/excel.py
Outdated
if _is_url(io): | ||
url_passed = _is_url(io) | ||
|
||
if url_passed: | ||
io = _urlopen(io) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be simpler to have this in an if/elif
if _is_url(io):
io = _urlopen(io)
elif not isinstance(io, .....)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Fixed.
Codecov Report
@@ Coverage Diff @@
## master #15894 +/- ##
==========================================
- Coverage 90.97% 90.97% -0.01%
==========================================
Files 145 145
Lines 49496 49518 +22
==========================================
+ Hits 45027 45047 +20
- Misses 4469 4471 +2
Continue to review full report at Codecov.
|
cc3a9b6
to
3c477be
Compare
@jreback : Everything is green and ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice docstrings!
pandas/types/inference.py
Outdated
# python 3 generators have __next__ instead of next | ||
return hasattr(obj, 'next') or hasattr(obj, '__next__') | ||
""" | ||
Check if the object is an iterable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good to notice here that a string does not give True (although it is iterable, but not an iterator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nor is a datetime, yes would be good to mention these counter-examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
not isinstance(arg, string_and_binary_types)) | ||
def is_list_like(obj): | ||
""" | ||
Check if the object is list-like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe give an inexhaustive list of objects that are regarded as list-like? (lists, sets, arrays, Series, ...)
I would also add example with np.array(..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention that strings & datetimes are NOT is_list_like .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
3c477be
to
ba08e6d
Compare
pandas/types/inference.py
Outdated
3) seek | ||
4) tell | ||
|
||
In addition, the object must be iterable, as mock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is assuming a little bit of knowledge here. I would just eliminate this paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
except TypeError: | ||
return False | ||
else: | ||
return True | ||
|
||
|
||
def is_sequence(x): | ||
def is_sequence(obj): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would note that this is less strict than is_list_like
. Further impl wise we almost never use this one. I don't really recall what the actual diffferences are though. you can have a look thru the code base and see if you can come up with a good reason for this (IIRC its in the printing routines somewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think it's the other way around:
>>> i = iter([1, 2, 3])
>>> is_list_like(i)
True
>>> is_sequence(i)
False
is_sequence
checks for len
and iter
properties, whereas is_list_like
just checks for the __iter__
property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think leaving both functions are fine for now, especially in the context of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, just ideally wanted to document this.
Partially addresses pandas-devgh-15895.
ba08e6d
to
5a8f8da
Compare
@jreback , @jorisvandenbossche : Everything is green and ready to go. |
thanks! |
read_*
functions.inference.py
Closes #15337.
Partially addresses #15895.