Python Pandas read_html fails when reading tables from Wikipedia #21499
Comments
Hmm, interesting. Looks like this is still an issue on master, even when specifying the encoding to be used:
>>> pd.read_html('https://en.wikipedia.org/wiki/2013–14_Premier_League', encoding='utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 14: ordinal not in range(128)
Investigation and PRs are always welcome.
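For anyone who wants to see the failure without hitting the network, here is a minimal sketch. It assumes the HTTP client encodes the request line as ASCII before sending it; the `request_line` string is an illustrative guess at what gets built for this URL, not code taken from pandas or urllib.

```python
# Hypothetical request line for the URL in the report (an assumption for
# illustration; the real one is built inside the HTTP client).
request_line = "GET /wiki/2013\u201314_Premier_League HTTP/1.1"

error = None
try:
    # Encoding to ASCII fails on the en dash (U+2013) in the page title.
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
    # The offending character and its index within the string.
    print(repr(exc.object[exc.start]), exc.start)
```

The en dash sits at index 14 of that string, which lines up with "position 14" in the traceback, so the `encoding=` argument of `read_html` probably never comes into play: the error happens before any response is received.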
https://stackoverflow.com/questions/39229439/encoding-error-when-reading-url-with-urllib As seen in this similar issue, urllib only works with URLs that contain ASCII characters only. As a workaround, I used the Requests library (http://docs.python-requests.org/en/master/) to download the page.
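A rough sketch of that kind of workaround, in case it helps. The helper name `fetch_html` and its `get` parameter are made up for this example; `get` only exists so the function can be exercised without network access, and it falls back to `requests.get` when omitted.

```python
def fetch_html(url, get=None):
    """Download `url` and return the decoded page text.

    `get` is a hypothetical injection point for testing; by default the
    third-party Requests library is used.
    """
    if get is None:
        import requests  # third-party: pip install requests
        get = requests.get
    response = get(url, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.text


# Intended usage (needs network access, pandas and an HTML parser such as lxml):
#
#   from io import StringIO
#   import pandas as pd
#
#   html = fetch_html("https://en.wikipedia.org/wiki/2013–14_Premier_League")
#   tables = pd.read_html(StringIO(html))
```

Since `read_html` only sees the downloaded text here, the non-ASCII URL never reaches urllib.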
FWIW the sample call works fine under Python 2.7.15 but not Python 3.6.5. Choice of engine doesn't matter, however.
I used the following solution:
@StepanSushko's solution works for me.
I investigated this error, and I personally think it should be fixed. Let me take it and send a PR.
Reproduce
Solution
So, we can just use urllib instead of requests.
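One way a urllib-only approach could look is to percent-encode the non-ASCII parts of the URL before handing it to `read_html`. This is a sketch, not necessarily what the comment above had in mind: `to_ascii_url` and `_SAFE` are names invented for the example, and it assumes the host name is already ASCII (internationalized domains would need IDNA handling, which is left out).

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched; "%" is included so that URLs which are already
# percent-encoded are not encoded a second time.
_SAFE = "/?:@!$&'()*+,;=%"


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of `url`."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=_SAFE)    # encoded as UTF-8 by default
    query = quote(parts.query, safe=_SAFE)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url = "https://en.wikipedia.org/wiki/2013–14_Premier_League"
ascii_url = to_ascii_url(url)
print(ascii_url)

# Intended usage (needs network access and pandas):
#   import pandas as pd
#   tables = pd.read_html(ascii_url)
```

The en dash becomes `%E2%80%93`, its UTF-8 bytes, so the resulting URL can be encoded as ASCII.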
I'm going to try to implement the Unicode handling feature, but because Python follows the RFC by design, this behavior is treated as the spec rather than a bug. Details are written here.
I am trying to read the tables from a Wikipedia page using the following code:
Doing that generates the following error:
I have tried
But still get the same error. The following works:
What I want to know is how to get pd.read_html() to work directly on the URL without requests. Is there something I don't understand about encoding, or is this a problem with Pandas?
I am running an Anaconda distribution of Pandas 0.21.1 and Python 3.5.4. Thanks for any help.