read_html doesn't work for wikipedia tables #7762
Comments
I see what you mean. Specifically, I see three different problems here:
So, to get your work done, add the kwargs Can any other pandas folks comment on the plans for
Thanks for looking into this. After reporting this I dug into it a bit and also noticed those odd sort columns and so on. So, well, it's not really pandas' doing; that content is indeed part of the page. On the other hand, if this "table pattern" is indeed very common on Wikipedia, it could be worthwhile to add a "wikipedia" mode to the parser? (Which is activated automatically for
Hm
@danielballan thanks for the explanation! I put up a PR to fix this ... #7851, check it out at your leisure
I assumed reading wikipedia html tables should work for read_html, but it returned a lot of garbage 😦 Example:
I've seen several related issues, maybe this is a useful test case.
Versions: