Skip to content

BUG: read_html - file path cannot be pathlib.Path type #37736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 11, 2020

Conversation

inspurwusixuan
Copy link
Contributor

Fix read_html TypeError when parsing pathlib.Path object.

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @inspurwusixuan for the PR!

Can you add tests? As reviewers that's the first thing we look for

You'll also need a whatsnew entry (probably 1.2)

@inspurwusixuan
Copy link
Contributor Author

Thanks for the reply! @arw2019

Can you add tests?

Do you mean adding test cases in the test_html.py?

You'll also need a whatsnew entry (probably 1.2)

Can you explain a little bit more what is the whatsnew entry and where should I start?

This is actually my first open-source contribution and I'm still trying to get familiar with everything. :)

Copy link
Member

@twoertwein twoertwein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a test in test_html.py. You can for example try to read pandas/tests/io/data/html/spam.html one time with the path as a string and then with pathlib.Path and then compare that both create the same DataFrame.

The whatsnew is here. You probably want to add one line in the Bug fix I/O section.

@arw2019
Copy link
Member

arw2019 commented Nov 10, 2020

Thanks for the reply! @arw2019

Can you add tests?

Do you mean adding test cases in the test_html.py?

Yes that sounds like the place

You'll also need a whatsnew entry (probably 1.2)

Can you explain a little bit more what is the whatsnew entry and where should I start?

We document user-facing changes to the library on the whatsnew docs page. Here is the most recent one:
https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v1.2.0.rst

The entry for this patch should probably go under Bug fixes, I/O. Take a look at other entries, have a go at writing yours and we'll give comments

This is actually my first open-source contribution and I'm still trying to get familiar with everything. :)

Welcome to pandas!

@arw2019 arw2019 added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Nov 10, 2020
@inspurwusixuan
Copy link
Contributor Author

Thanks for the comments! I will follow up with the test case and whatsnew entry. @arw2019

@pep8speaks
Copy link

pep8speaks commented Nov 10, 2020

Hello @inspurwusixuan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-10 23:34:15 UTC

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - mostly minor comments

@@ -507,6 +507,11 @@ I/O
- Bug in :class:`HDFStore` was dropping timezone information when exporting :class:`Series` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
- :func:`read_csv` was closing user-provided binary file handles when ``engine="c"`` and an ``encoding`` was requested (:issue:`36980`)
- Bug in :meth:`DataFrame.to_hdf` was not dropping missing rows with ``dropna=True`` (:issue:`35719`)
<<<<<<< HEAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge issue here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed merge issue

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm outside of merge issue @arw2019 commented on

@WillAyd WillAyd added this to the 1.2 milestone Nov 10, 2020
@jreback jreback merged commit ee1b75c into pandas-dev:master Nov 11, 2020
@jreback
Copy link
Contributor

jreback commented Nov 11, 2020

thanks @inspurwusixuan

@inspurwusixuan inspurwusixuan deleted the pandas-37705 branch November 11, 2020 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants