Skip to content

Allow flavor argument to read_html to be list/instance of _HtmlFrameParser #4594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cancan101 opened this issue Aug 17, 2013 · 12 comments
Closed
Labels
API Design IO HTML read_html, to_html, Styler.apply, Styler.applymap
Milestone

Comments

@cancan101
Copy link
Contributor

Currently the flavor argument to read_html must either be None or a string or a container of strings. This make is difficult for the user to write his own instance of _HtmlFrameParser and to use that when parsing. I suggest also allowing the user to pass in a subclass of _HtmlFrameParser in place of a string.

Implementation wise, this should be easy. _parser_dispatch would be modified to for this case. Likewise _validate_parser_flavor would need to allow this.

@cpcloud
Copy link
Member

cpcloud commented Aug 17, 2013

@cancan101 Would you like to submit a PR?

@cancan101
Copy link
Contributor Author

@cpcloud Sure.

@jreback
Copy link
Contributor

jreback commented Sep 27, 2013

pushing to 0.14

@cpcloud
Copy link
Member

cpcloud commented Sep 27, 2013

@cancan101 I think it's confusing to allow an instance of some internal class in a public API function. However, I am definitely open to discussing some kind of extension API ... since, well, there really isn't one.

@jtratner
Copy link
Contributor

agree with @cpcloud. Internal class shouldn't be an option for public function.

@jreback
Copy link
Contributor

jreback commented Sep 27, 2013

Why don't you have a publicly extendable 'Style' class or something that can be passed in (or inherited) and passed in? (I think that we are tlaking about this for the output formatters as well)

@cpcloud
Copy link
Member

cpcloud commented Sep 27, 2013

bikeshedding alert: i think a Flavor class might be easier for a drive by contributor to grok ... Style sounds too much like it means output formatting

in any case .. good idea

@cancan101
Copy link
Contributor Author

I agree with the statement that internal classes should not be used in a public API. That being said, what is the difference between the proposed Flavor cass and the existing _HtmlFrameParser? In other words, is it just a matter of moving _HtmlFrameParser into the public API (and renaming it accordingly)?

@cpcloud
Copy link
Member

cpcloud commented Sep 27, 2013

It could potentially be that simple, but there would need to be a way register flavors ... that's not hard, it's just not there yet

@cancan101
Copy link
Contributor Author

@cpcloud I am looking to tackle this. My initial thought it to allow passing in "user defined parsers". The simplest solution would be to allow the flavor argument to be a "_HtmlFrameParser".

A couple of questions/ thoughts:

  1. Should it be a subclass of _HtmlFrameParser (literally a type) or an instance of a subclass? Or allow either?
  2. Should _HtmlFrameParser be renamed to Flavor? Or maybe I should create an empty class called Flavor or HTMLParserFlavor that user implemented subclasses are expected to subclass.

In addition, I COULD add the ability to register the flavor so that the user can than use a string rather than passing in the class/ object itself. I am indifferent to this and am okay just passing in the class/object and no bothering about the string indirection.

@ghost
Copy link

ghost commented Jan 24, 2014

I read this as the original topic being voted down and any work on a "flavor" (?) needs it's own
issues, there may be one already. There have been signs of a more conservative approach
to growing read_html.

You are of course free to open a new issue and discuss any problems that need addressing,
we've elsewhere encouraged @cancan101 to start a new project focused on giving the user
more control over HTML-related data munging based on pandas.

@ghost ghost closed this as completed Jan 24, 2014
@cpcloud
Copy link
Member

cpcloud commented Jan 24, 2014

I think it should be called pandatree

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO HTML read_html, to_html, Styler.apply, Styler.applymap
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants