Skip to content

TYP/DOC: flavor parameter with incorrect type hint in read_html #55059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matheusfelipeog opened this issue Sep 8, 2023 · 1 comment · Fixed by #55076
Closed

TYP/DOC: flavor parameter with incorrect type hint in read_html #55059

matheusfelipeog opened this issue Sep 8, 2023 · 1 comment · Fixed by #55076
Labels
Docs Typing type annotations, mypy/pyright type checking

Comments

@matheusfelipeog
Copy link
Contributor

Problem

The flavor parameter has the incorret type hint in the read_html(...) function.

Currently, the type hint is an optional str. However, according to the documentation in io.html it is possible to pass a list-like:

The lxml backend will raise an error on a failed parse if that is the only parser you provide. 
If you only have a single parser you can provide just a string, but it is considered good practice
to pass a list with one string if, for example, the function expects a sequence of strings. You may use:

>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor=["lxml"])

Or you could pass flavor='lxml' without a list:

>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor="lxml")

However, if you have bs4 and html5lib installed and pass None or ['lxml', 'bs4'] then the parse will
most likely succeed. Note that as soon as a parse succeeds, the function will return.

>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor=["lxml", "bs4"])

Internally read_html(...) converts the passed value to a tuple in both cases.

With this incorrect type hint, type checkers throw an error when passed a list-like:

image

Solution

We can set the type hint to a str or a Sequence[str], both of which are optional:

flavor: str | Sequence[str] | None = None

I can resolve this and create a PR by updating the type hints, docstrings and documentation. Just let me know if you agree with my solution.

@twoertwein
Copy link
Member

I can resolve this and create a PR by updating the type hints, docstrings and documentation. Just let me know if you agree with my solution.

Sounds good to me! As you mentioned, the doc-string of read_html needs to also be updated as it only mentions str | None.

@twoertwein twoertwein added Docs Typing type annotations, mypy/pyright type checking labels Sep 8, 2023
Dr-Irv pushed a commit to pandas-dev/pandas-stubs that referenced this issue Nov 2, 2023
* Fix flavor param with incorrect type hint in read_html

refs:
- pandas-dev/pandas#55059
- pandas-dev/pandas#55076

* Add HTMLFlavors type to read_html

ref: pandas-dev/pandas#55529

* Add tests and new dev dependencies

Added:
- tests to check HTMLFlavors type in read_html flavor arg;
- set beautifulsoup4 and html5lib as dev dependencies. They are used
by the respective flavors in read_html.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants