io.html.read_html support XPath table identification #5389
I thought about this when I first wrote it and decided against it, mostly for time reasons and because exposing a user to XPath should be avoided, but I think this is probably a good idea for "power" users. @phaebz Want to take a crack at it? Happy to help you navigate the code, if you want.
You could add this as a separate kwarg (`xpath=None`) that, if not None, would…
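A minimal sketch of how the suggested `xpath=None` keyword could behave, emulated outside pandas. `read_html_xpath` is a hypothetical helper, not a pandas API: it uses lxml to select `<table>` elements by XPath and hands only those fragments to the existing `pd.read_html`. It assumes lxml is installed.

```python
from __future__ import annotations

from io import StringIO

import lxml.html
import pandas as pd


def read_html_xpath(html: str, xpath: str, **kwargs) -> list[pd.DataFrame]:
    """Hypothetical wrapper: parse only the <table> elements selected by `xpath`.

    Not part of pandas; it emulates the proposed xpath kwarg by
    pre-selecting tables with lxml and delegating parsing to pd.read_html.
    """
    root = lxml.html.fromstring(html)
    result = root.xpath(xpath)
    # An XPath can evaluate to a string/number/boolean; treat that as "no tables".
    if not isinstance(result, list):
        result = []
    # Keep only <table> elements; text nodes and other tags are ignored.
    tables = [el for el in result if getattr(el, "tag", None) == "table"]
    if not tables:
        raise ValueError(f"no <table> elements matched {xpath!r}")
    frames: list[pd.DataFrame] = []
    for table in tables:
        # Serialize just the selected table (without trailing text) and parse it.
        fragment = lxml.html.tostring(table, encoding="unicode", with_tail=False)
        frames.extend(pd.read_html(StringIO(fragment), **kwargs))
    return frames
```

Any remaining keyword arguments (e.g. `match`, `header`) are passed through to `pd.read_html` unchanged, so the XPath only narrows which tables get parsed.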
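@phaebz BTW, you can just call

```python
import pandas as pd

# url: the page to parse (placeholder; read_html needs it as its first argument)
dfs = pd.read_html(url, match='Van Halen/Panama', attrs={'class': 'whatever'})
```

No need to type the fully qualified name, since we import all the IO functions into the top-level `pandas` namespace.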
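A self-contained version of the call above, using a made-up inline HTML string wrapped in `StringIO` instead of a URL so it runs offline. It assumes a parser backend (lxml, or bs4 with html5lib) is installed. `match` keeps only tables whose text matches the regex, and `attrs` filters on the `<table>` element's HTML attributes; both narrow the returned list.

```python
from io import StringIO

import pandas as pd

# Illustrative markup only; the class names and contents are invented.
html = """
<table class="whatever">
  <tr><th>song</th><th>year</th></tr>
  <tr><td>Panama</td><td>1984</td></tr>
</table>
<table class="other">
  <tr><th>song</th><th>year</th></tr>
  <tr><td>Jump</td><td>1984</td></tr>
</table>
"""

# Only the first table has class="whatever" and contains "Panama".
dfs = pd.read_html(StringIO(html), match="Panama", attrs={"class": "whatever"})
print(len(dfs))  # 1
print(dfs[0])
```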
@cpcloud Good to know, and I liked the Eddie nod there :) I had a look into the code and I will begin with adapting the…
First attempt at an implementation. I added the functionality to…
Fine to just add it to lxml. Maybe the selenium / phantomjs parser would work…
Pushing to 0.14.
I agree with @cpcloud's original reasoning. I think XPath support is going too far. @cpcloud @jtratner, I vote to close. Do you feel strongly about this?
No strong feelings here. |
Currently, no context is given along with the returned list of tables, so XPath support would be helpful.
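To illustrate the "context" point: a sketch, assuming lxml is installed and using made-up markup, that lists each `<table>` alongside the absolute XPath lxml computes for it via `ElementTree.getpath()`. A path like this is the kind of location information the plain list of DataFrames does not carry.

```python
import lxml.html

# Invented page with two tables in different sections.
html = """
<html><body>
  <div id="stats"><table><tr><td>1</td></tr></table></div>
  <div id="archive"><table><tr><td>2</td></tr></table></div>
</body></html>
"""

root = lxml.html.fromstring(html)
tree = root.getroottree()

# getpath() returns an absolute XPath that identifies each element uniquely,
# so it can tell the tables apart or be fed back into root.xpath().
paths = [tree.getpath(table) for table in root.iter("table")]
for path in paths:
    print(path)
```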
I'm not interested in working on it and am in favor of simplicity, so I vote…
@cancan101 That's true, in that making any feature more powerful is potentially helpful. I think the current API that @cpcloud put together is just expressive enough. @phaebz, thanks for putting in the time doing this. I'm sorry we're going to…
@cancan101 or @phaebz, you could also create a separate package that…
Absolutely, and no one can get in your way. :)
Feature request, as in the subject, that came up in #5384. This would be for cases where table `attrs` alone don't cut it, so that manual lxml / bs4 parsing could be avoided. What do you think?
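A small sketch of the case described above, with invented markup: two tables share `class="data"`, so `attrs` alone returns both, and the manual lxml workaround (select by XPath, then parse only that fragment) is needed to get just one. It assumes lxml is installed; the ids and class names are illustrative only.

```python
from io import StringIO

import lxml.html
import pandas as pd

# Both tables carry the same attributes; only their enclosing <div> differs.
html = """
<html><body>
  <div id="current"><table class="data"><tr><th>v</th></tr><tr><td>1</td></tr></table></div>
  <div id="archive"><table class="data"><tr><th>v</th></tr><tr><td>2</td></tr></table></div>
</body></html>
"""

by_attrs = pd.read_html(StringIO(html), attrs={"class": "data"})
print(len(by_attrs))  # 2: attrs cannot tell the tables apart

# Manual workaround: pick exactly one table with XPath, then parse that fragment.
root = lxml.html.fromstring(html)
(table,) = root.xpath("//div[@id='archive']/table[@class='data']")
fragment = lxml.html.tostring(table, encoding="unicode", with_tail=False)
by_xpath = pd.read_html(StringIO(fragment))
print(len(by_xpath))  # 1
```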