Commit ee1b75c

BUG: read_html - file path cannot be pathlib.Path type (#37736)
* BUG: read_html - file path cannot be pathlib.Path type
* BUG: read_html - file path cannot be pathlib.Path type
* BUG: read_html - file path cannot be pathlib.Path type
* closes #37705
* Add comments closes #37705
* Update doc/source/whatsnew/v1.2.0.rst Co-authored-by: William Ayd <[email protected]>
* Update pandas/tests/io/test_html.py Co-authored-by: William Ayd <[email protected]>
* Fix comments closes #37705
* Fix merge issue closes #37705
Co-authored-by: William Ayd <[email protected]>
1 parent 90dc9ae commit ee1b75c

3 files changed: +14 -1 lines changed


doc/source/whatsnew/v1.2.0.rst

+1
@@ -507,6 +507,7 @@ I/O
 - Bug in :class:`HDFStore` was dropping timezone information when exporting :class:`Series` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
 - :func:`read_csv` was closing user-provided binary file handles when ``engine="c"`` and an ``encoding`` was requested (:issue:`36980`)
 - Bug in :meth:`DataFrame.to_hdf` was not dropping missing rows with ``dropna=True`` (:issue:`35719`)
+- Bug in :func:`read_html` was raising a ``TypeError`` when supplying a ``pathlib.Path`` argument to the ``io`` parameter (:issue:`37705`)
 
 Plotting
 ^^^^^^^^

pandas/io/html.py

+4-1
@@ -20,7 +20,7 @@
 from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.frame import DataFrame
 
-from pandas.io.common import is_url, urlopen, validate_header_arg
+from pandas.io.common import is_url, stringify_path, urlopen, validate_header_arg
 from pandas.io.formats.printing import pprint_thing
 from pandas.io.parsers import TextParser
 
@@ -1080,6 +1080,9 @@ def read_html(
             "data (you passed a negative value)"
         )
     validate_header_arg(header)
+
+    io = stringify_path(io)
+
     return _parse(
         flavor=flavor,
         io=io,
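
The fix in this file is the single `io = stringify_path(io)` call placed before `_parse`. As a rough illustration of the idea only, the sketch below converts path-like objects to plain strings and passes everything else through. `normalize_io` is a hypothetical stand-in written for this note; it is not pandas' `stringify_path` and may differ from it in details.

```python
import os
from io import StringIO
from pathlib import Path


def normalize_io(io):
    """Hypothetical stand-in for the path-normalization step.

    Path-like objects (anything implementing os.PathLike) become plain
    strings; strings, URLs, and file-like objects are returned unchanged.
    """
    if isinstance(io, os.PathLike):
        return os.fspath(io)
    return io


path = Path("data") / "spam.html"
buffer = StringIO("<table></table>")

print(type(normalize_io(path)).__name__)  # str
print(normalize_io("spam.html"))          # spam.html
print(normalize_io(buffer) is buffer)     # True
```

Normalizing once at the entry point means the downstream parsing code only has to handle strings and file-like objects, which is presumably why the change is a single line in `read_html` rather than a change inside each parser flavor.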

pandas/tests/io/test_html.py

+9
@@ -2,6 +2,7 @@
 from importlib import reload
 from io import BytesIO, StringIO
 import os
+from pathlib import Path
 import re
 import threading
 from urllib.error import URLError
@@ -1233,3 +1234,11 @@ def run(self):
         while helper_thread1.is_alive() or helper_thread2.is_alive():
             pass
         assert None is helper_thread1.err is helper_thread2.err
+
+    def test_parse_path_object(self, datapath):
+        # GH 37705
+        file_path_string = datapath("io", "data", "html", "spam.html")
+        file_path = Path(file_path_string)
+        df1 = self.read_html(file_path_string)[0]
+        df2 = self.read_html(file_path)[0]
+        tm.assert_frame_equal(df1, df2)
