BUG: read_html - file path cannot be pathlib.Path type #37736

Merged
merged 10 commits into from
Nov 11, 2020
5 changes: 5 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -507,6 +507,11 @@ I/O
- Bug in :class:`HDFStore` was dropping timezone information when exporting :class:`Series` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
- :func:`read_csv` was closing user-provided binary file handles when ``engine="c"`` and an ``encoding`` was requested (:issue:`36980`)
- Bug in :meth:`DataFrame.to_hdf` was not dropping missing rows with ``dropna=True`` (:issue:`35719`)
<<<<<<< HEAD
Member: merge issue here

Contributor Author: Fixed merge issue

- Bug in :func:`read_html` was raising a ``TypeError`` when supplying a ``pathlib.Path`` argument to the ``io`` parameter (:issue:`37705`)
=======
- Bug in :func:`read_html` was raising a ``TypeError`` when supplying a ``pathlib.Path`` argument to the ``io`` parameter (:issue:`37705`)
>>>>>>> 88bc2ee492633ab6a94be78384c5ac7524043322

Plotting
^^^^^^^^
5 changes: 4 additions & 1 deletion pandas/io/html.py
@@ -20,7 +20,7 @@
from pandas.core.construction import create_series_with_explicit_dtype
from pandas.core.frame import DataFrame

-from pandas.io.common import is_url, urlopen, validate_header_arg
+from pandas.io.common import is_url, stringify_path, urlopen, validate_header_arg
from pandas.io.formats.printing import pprint_thing
from pandas.io.parsers import TextParser

@@ -1080,6 +1080,9 @@ def read_html(
            "data (you passed a negative value)"
        )
    validate_header_arg(header)

    io = stringify_path(io)

    return _parse(
        flavor=flavor,
        io=io,
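
The change above normalizes `io` before it reaches `_parse`, so a `pathlib.Path` is handled the same way as a plain string path. A minimal standard-library sketch of that idea follows. The helper name `to_str_path` is hypothetical and only approximates what `stringify_path` does; it is not the pandas implementation.

```python
import os
from io import StringIO
from pathlib import Path


def to_str_path(filepath_or_buffer):
    """Hypothetical stand-in for the normalization step (not pandas code)."""
    # Objects implementing the os.PathLike protocol, such as pathlib.Path,
    # are converted to their string form.
    if isinstance(filepath_or_buffer, os.PathLike):
        return os.fspath(filepath_or_buffer)
    # Strings, URLs, and file-like buffers pass through unchanged.
    return filepath_or_buffer


# A Path and the equivalent string now look identical to downstream code.
as_path = to_str_path(Path("spam.html"))
as_text = to_str_path("spam.html")

# A buffer is returned as the same object, untouched.
buffer = StringIO("<table></table>")
same_buffer = to_str_path(buffer)
```

Converting once at the entry point keeps the downstream parser code unaware of path objects, which appears to be the approach the diff takes.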
9 changes: 9 additions & 0 deletions pandas/tests/io/test_html.py
@@ -2,6 +2,7 @@
from importlib import reload
from io import BytesIO, StringIO
import os
from pathlib import Path
import re
import threading
from urllib.error import URLError
@@ -1233,3 +1234,11 @@ def run(self):
        while helper_thread1.is_alive() or helper_thread2.is_alive():
            pass
        assert None is helper_thread1.err is helper_thread2.err

    def test_parse_path_object(self, datapath):
        # GH 37705
        file_path_string = datapath("io", "data", "html", "spam.html")
        file_path = Path(file_path_string)
        df1 = self.read_html(file_path_string)[0]
        df2 = self.read_html(file_path)[0]
        tm.assert_frame_equal(df1, df2)