BUG: read_html - file path cannot be pathlib.Path type #37736

Merged 10 commits on Nov 11, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -507,6 +507,7 @@ I/O
 - Bug in :class:`HDFStore` was dropping timezone information when exporting :class:`Series` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
 - :func:`read_csv` was closing user-provided binary file handles when ``engine="c"`` and an ``encoding`` was requested (:issue:`36980`)
 - Bug in :meth:`DataFrame.to_hdf` was not dropping missing rows with ``dropna=True`` (:issue:`35719`)
+- Bug in :func:`read_html` was raising a ``TypeError`` when supplying a ``pathlib.Path`` argument to the ``io`` parameter (:issue:`37705`)

 Plotting
 ^^^^^^^^
5 changes: 4 additions & 1 deletion pandas/io/html.py
@@ -20,7 +20,7 @@
 from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.frame import DataFrame

-from pandas.io.common import is_url, urlopen, validate_header_arg
+from pandas.io.common import is_url, stringify_path, urlopen, validate_header_arg
 from pandas.io.formats.printing import pprint_thing
 from pandas.io.parsers import TextParser

@@ -1080,6 +1080,9 @@ def read_html(
             "data (you passed a negative value)"
         )
     validate_header_arg(header)
+
+    io = stringify_path(io)
+
     return _parse(
         flavor=flavor,
         io=io,
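
The hunk above normalizes `io` before it reaches `_parse`. As a rough illustration of that idea, the sketch below converts path-like objects to plain strings and passes everything else through. `normalize_io_arg` is a hypothetical stand-in written for this example; it is not pandas' `stringify_path`, whose exact behavior may differ.

```python
import os
from io import StringIO
from pathlib import Path


def normalize_io_arg(io):
    """Hypothetical stand-in for a path normalization step (not pandas' own code)."""
    # Objects implementing the os.PathLike protocol become plain strings.
    if isinstance(io, os.PathLike):
        return os.fspath(io)
    # Strings (including URLs) and open file handles pass through unchanged.
    return io


# A pathlib.Path is converted to its string form.
path_result = normalize_io_arg(Path("data") / "spam.html")

# A URL string is returned as-is.
string_result = normalize_io_arg("https://example.com/tables.html")

# A file-like buffer is returned as the same object.
buffer = StringIO("<table></table>")
buffer_result = normalize_io_arg(buffer)
```

Doing the conversion once at the public entry point means the downstream parsing code only has to handle strings and file-like objects.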
9 changes: 9 additions & 0 deletions pandas/tests/io/test_html.py
@@ -2,6 +2,7 @@
 from importlib import reload
 from io import BytesIO, StringIO
 import os
+from pathlib import Path
 import re
 import threading
 from urllib.error import URLError
@@ -1233,3 +1234,11 @@ def run(self):
         while helper_thread1.is_alive() or helper_thread2.is_alive():
             pass
         assert None is helper_thread1.err is helper_thread2.err
+
+    def test_parse_path_object(self, datapath):
+        # GH 37705
+        file_path_string = datapath("io", "data", "html", "spam.html")
+        file_path = Path(file_path_string)
+        df1 = self.read_html(file_path_string)[0]
+        df2 = self.read_html(file_path)[0]
+        tm.assert_frame_equal(df1, df2)
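
The test above depends on the `datapath` fixture and the `spam.html` data file from the pandas test suite. A standalone sketch of the same check might look like the following. The `HTML` snippet, the `example.html` file name, and the `tables_match` helper are all assumptions made for this example, and the comparison is skipped (returns `None`) when pandas or an HTML parser such as lxml is not installed.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical sample table, used only for this illustration.
HTML = "<table><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>"


def tables_match(path_str):
    """Return True if str and Path inputs parse to equal frames.

    Returns None when pandas or a required HTML parser is unavailable.
    """
    try:
        import pandas as pd

        from_str = pd.read_html(path_str)[0]
        from_path = pd.read_html(Path(path_str))[0]
    except ImportError:
        return None
    return from_str.equals(from_path)


with tempfile.TemporaryDirectory() as tmp:
    # Write the sample table to a temporary file, then read it both ways.
    path_str = os.path.join(tmp, "example.html")
    Path(path_str).write_text(HTML, encoding="utf-8")
    result = tables_match(path_str)
```

On a pandas version that includes this fix, `result` should be `True` when a parser is available; on earlier versions the `Path` call is expected to raise the `TypeError` described in the changelog entry.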