UNI/HTML/WIP: add encoding argument to read_html #7323

Merged: 1 commit merged on Jun 4, 2014

cpcloud
Member

@cpcloud cpcloud commented Jun 3, 2014

closes #7220

@@ -165,10 +168,11 @@ class _HtmlFrameParser(object):
See each method's respective documentation for details on their
functionality.
"""
-    def __init__(self, io, match, attrs):
+    def __init__(self, io, match, attrs, encoding):
Contributor

should this default to None, then set to utf-8? (or just not set and leave as None)

Member Author

it defaults to None (from the read_html entry point) because I didn't want to enforce an encoding if bs4 or lxml can parse it from HTML meta information.

Contributor

k...sounds good
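
The default discussed in this thread can be illustrated with a small standard-library sketch. It is not the pandas implementation, and `decode_html` and `_META_CHARSET` are hypothetical names introduced only for illustration. It assumes that "leave `encoding` as `None`" means "let the document's own meta information decide, and only force an encoding when the caller passes one".

```python
import re

# Hypothetical helper, not part of pandas: it only illustrates the
# "encoding=None means do not enforce an encoding" default discussed above.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def decode_html(raw, encoding=None):
    """Decode raw HTML bytes, honoring an explicit encoding when given."""
    if encoding is None:
        # No encoding forced by the caller: look for a charset declared
        # in a <meta> tag, roughly as an HTML parser would.
        match = _META_CHARSET.search(raw)
        # Assumption of this sketch: fall back to UTF-8 when nothing is declared.
        encoding = match.group(1).decode("ascii") if match else "utf-8"
    return raw.decode(encoding)


declared = '<meta charset="latin-1"><td>café</td>'.encode("latin-1")
undeclared = "<td>café</td>".encode("latin-1")

# Charset is read from the meta tag; no argument needed.
text_from_meta = decode_html(declared)
# The document declares nothing, so the caller supplies the encoding.
text_explicit = decode_html(undeclared, "latin-1")
```

With the change in this PR, the analogous user-facing call would pass the keyword to the entry point, for example `read_html(path, encoding="latin-1")`, and omit it when the page declares its own charset.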

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

@klonuo would you like to try out this branch on your data and see if it's to your liking?

@jreback
Contributor

jreback commented Jun 3, 2014

don't you have to: cc @klonuo ?

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

i did

@@ -1,5 +1,8 @@
# encoding: utf8
Member Author

note to self: remove this either after it passes on Travis or before merge

@klonuo
Contributor

klonuo commented Jun 3, 2014

@cpcloud I just tried it and it works fine

Thanks for your patience

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

great! thanks for the report. keep the issues coming. promise i won't bite anymore :)

@jreback jreback added this to the 0.14.1 milestone Jun 3, 2014
@cpcloud cpcloud self-assigned this Jun 3, 2014
@cpcloud
Member Author

cpcloud commented Jun 4, 2014

@jreback going to merge this after whatsnew

@jreback
Contributor

jreback commented Jun 4, 2014

yep

cpcloud added a commit that referenced this pull request Jun 4, 2014
UNI/HTML/WIP: add encoding argument to read_html
@cpcloud cpcloud merged commit 89983c3 into pandas-dev:master Jun 4, 2014
@cpcloud cpcloud deleted the html-encoding branch June 4, 2014 14:03
Labels
Enhancement; IO HTML (read_html, to_html, Styler.apply, Styler.applymap); Unicode (Unicode strings)
Development

Successfully merging this pull request may close these issues.

Suggestions for html table parsing
3 participants