-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
UNI/HTML/WIP: add encoding argument to read_html #7323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -165,10 +168,11 @@ class _HtmlFrameParser(object): | |||
See each method's respective documentation for details on their | |||
functionality. | |||
""" | |||
def __init__(self, io, match, attrs): | |||
def __init__(self, io, match, attrs, encoding): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this default to None
, then set to utf-8
? (or just not set and leave as None
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it defaults to None
(from the read_html
entry point) because I didn't want to enforce an encoding if bs4
or lxml
can parse it from HTML meta information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k...sounds good
@klonuo would you like to try out this branch on your data and see if it's to your liking? |
don't you have to: cc @klonuo ? |
i did |
@@ -1,5 +1,8 @@ | |||
# encoding: utf8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: remove this either after passes on travis or before merge
@cpcloud I just tried it and it works fine Thanks for your patience |
great! thanks for the report. keep the issues coming . promise i won't bite anymore :) |
@jreback going to merge this after whatsnew |
yep |
UNI/HTML/WIP: add encoding argument to read_html
closes #7220