read_html(): rewinding [wip] #18017
tests!
Thankfully, Python has a built-in solution for this,
83c8dfc to 2d03b15
pandas/io/html.py
Outdated
# and try to rewind it before trying the next parser
if hasattr(io, 'seekable') and io.seekable():
    io.seek(0)
Make this an if/else.
Hello @LiamIm! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on October 29, 2017 at 01:08 Hours UTC
How should I test rewinding on file objects? Just add a malformed HTML file to
There are several examples in the issue.
pandas/io/html.py
Outdated
    io.seek(0)
elif hasattr(io, 'seekable') and not io.seekable():
    # if we couldn't rewind it, let the user know
    raise ValueError('The favor {} failed to parse your input. '
favor -> flavor
9e3b9cb to eb9d525
If lxml reads to the end of a file object and then raises an error, bs4/html5lib do not rewind the object before parsing again, so they fail with `ValueError: No text parsed from document`. This patch fixes the issue by rewinding the file object whenever a parser fails. If the object is file-like but not seekable, we raise an error that tells the user to try a different flavor.
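The rewinding behaviour described above can be sketched in isolation. This is a minimal illustration, not the code in `pandas/io/html.py`: `parse_with_fallback`, `strict_parser`, and `lenient_parser` are hypothetical names standing in for the parser loop and the lxml and bs4/html5lib flavors, and the full error message text is an assumption.

```python
import io


def strict_parser(stream):
    """Hypothetical stand-in for a parser that consumes the stream, then fails."""
    stream.read()  # the read position is now at the end of the stream
    raise ValueError("strict parser could not handle the input")


def lenient_parser(stream):
    """Hypothetical stand-in for a fallback parser that needs the full text."""
    text = stream.read()
    if not text:
        raise ValueError("No text parsed from document")
    return text


def parse_with_fallback(source, parsers):
    """Try each (flavor, parser) pair in turn, rewinding `source` after a failure."""
    last_error = ValueError("no parsers were given")
    for flavor, parser in parsers:
        try:
            return parser(source)
        except ValueError as err:
            last_error = err
            if hasattr(source, "seekable") and source.seekable():
                # rewind so the next parser starts from the beginning
                source.seek(0)
            elif hasattr(source, "seekable") and not source.seekable():
                # we cannot rewind, so tell the user (message text is assumed)
                raise ValueError(
                    "The flavor {} failed to parse your input and the file "
                    "object cannot be rewound. Try a different "
                    "flavor.".format(flavor)
                )
    raise last_error


# Usage: the strict parser exhausts the stream, the rewind lets the lenient one succeed.
flavors = [("strict", strict_parser), ("lenient", lenient_parser)]
result = parse_with_fallback(io.StringIO("<table></table>"), flavors)
```

Without the `seek(0)` call, `lenient_parser` would read an empty string from the exhausted stream and raise the "No text parsed from document" error that the description mentions.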
eb9d525 to a85cd42
Codecov Report
@@            Coverage Diff             @@
##           master   #18017      +/-   ##
==========================================
+ Coverage   91.23%   91.25%   +0.01%
==========================================
  Files         163      163
  Lines       50091    50095       +4
==========================================
+ Hits        45703    45712       +9
+ Misses       4388     4383       -5
Codecov Report
@@            Coverage Diff             @@
##           master   #18017      +/-   ##
==========================================
+ Coverage   91.24%   91.25%   +0.01%
==========================================
  Files         163      163
  Lines       50091    50095       +4
==========================================
+ Hits        45704    45713       +9
+ Misses       4387     4382       -5
Thanks @LiamIm, nice PR!
git diff upstream/master -u -- "*.py" | flake8 --diff