io.html.read_html returning table "twice" #5384

Closed
phaebz opened this issue Oct 30, 2013 · 2 comments

@phaebz
Contributor

phaebz commented Oct 30, 2013

Forgive the markup indentation.

from pandas.io.html import read_html
read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2005-10-25/DE',
          attrs={'class': 'list hours responsive'},
          skiprows=1,
          parse_dates=False)

This returns a list of two data frames. The first is equivalent to what read_html() returns without the parse_dates argument; the second is the one I expected from calling read_html() with the parse_dates argument.

Am I missing something?

@cpcloud
Member

cpcloud commented Oct 30, 2013

If you're using Google Chrome then you can right click anywhere on the page and click "View page source". This will show you that the page contains two tables with the class attribute value that you've shown, so two tables will be returned. They are different tables, which you can see by looking at their respective cells.
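
The same check can be done without a browser. The following is a minimal sketch using only the standard library; `SAMPLE_HTML` is a made-up stand-in for the real page source, and `TableClassCounter` and `count_tables` are hypothetical helper names. It compares the class attribute as an exact string, which is a simplification and may not match how every read_html parser backend filters on `attrs`.

```python
from html.parser import HTMLParser


class TableClassCounter(HTMLParser):
    """Count <table> start tags whose class attribute equals a given string."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "table" and dict(attrs).get("class") == self.wanted_class:
            self.count += 1


# Hypothetical stand-in for the page source (not the real markup)
SAMPLE_HTML = """
<table class="list hours responsive"><tr><td>a</td></tr></table>
<table class="list hours responsive"><tr><td>b</td></tr></table>
<table class="other"><tr><td>c</td></tr></table>
"""


def count_tables(html_text, wanted_class):
    """Return how many tables in html_text carry exactly wanted_class."""
    parser = TableClassCounter(wanted_class)
    parser.feed(html_text)
    return parser.count


print(count_tables(SAMPLE_HTML, "list hours responsive"))  # prints 2
```

If the count is greater than one, a list with more than one data frame from read_html() with the same `attrs` is the expected result.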

@phaebz
Contributor Author

phaebz commented Oct 30, 2013

Thanks for the clout behind the ears. My assumptions were wrong here, although they did hold for other dates and other HTML markup on the site. For now I am using a manual XPath query via lxml.
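
A rough sketch of what such a manual XPath selection could look like, assuming lxml is installed. The markup in `SAMPLE_HTML` and the helper name `table_rows` are invented for illustration and are not the code actually used here.

```python
from lxml import html

# Hypothetical markup with two tables sharing one class value
SAMPLE_HTML = """
<html><body>
<table class="list hours responsive"><tr><td>first</td></tr></table>
<table class="list hours responsive"><tr><td>second</td><td>x</td></tr></table>
</body></html>
"""


def table_rows(html_text, index):
    """Return the cell texts of the index-th (0-based) matching table."""
    tree = html.fromstring(html_text)
    # Select every table whose class attribute is exactly this string
    tables = tree.xpath('//table[@class="list hours responsive"]')
    table = tables[index]
    return [
        [cell.text_content().strip() for cell in row.xpath("./td|./th")]
        for row in table.xpath(".//tr")
    ]


print(table_rows(SAMPLE_HTML, 1))  # rows of the second matching table
```

Picking the table by position is fragile if the page layout changes between dates, so a more specific XPath expression may be the safer choice.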

Opened #5389 for FR.

@phaebz phaebz closed this as completed Oct 30, 2013