
Commit 1c06cb8

rmhowe425 authored and im-vinicius committed
DEPR: Remove literal string input for read_html (pandas-dev#53805)
* Updating documentation and adding deprecation logic for read_html.
* Fixing formatting errors
* Fixing documentation errors
* Updating deprecation logic and documentation per reviewer recommendations.
* Updating implementation per reviewer recommendations.
1 parent 65900fc commit 1c06cb8


5 files changed: +150 -55 lines changed


doc/source/user_guide/io.rst

+1 -1
@@ -2664,7 +2664,7 @@ Links can be extracted from cells along with the text using ``extract_links="all
    """
 
    df = pd.read_html(
-       html_table,
+       StringIO(html_table),
        extract_links="all"
    )[0]
    df
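Here is a minimal, stdlib-only sketch of the migration this hunk applies. The short `html_table` markup below is a hypothetical stand-in for the user guide's string. Wrapping the literal in `io.StringIO` produces a file-like buffer whose contents are the original markup. Reading consumes the buffer, so code that reuses one buffer across several calls would need to `seek(0)` first or build a fresh `StringIO`.

```python
from io import StringIO

# Hypothetical stand-in for the user guide's html_table literal.
html_table = (
    "<table><tr><th>GitHub</th></tr>"
    "<tr><td><a href='https://github.com/pandas-dev/pandas'>pandas</a></td></tr>"
    "</table>"
)

# Old style: pd.read_html(html_table, extract_links="all")
# New style: pd.read_html(StringIO(html_table), extract_links="all")
buf = StringIO(html_table)

# The buffer is file-like: it exposes read() and can be iterated over.
print(hasattr(buf, "read") and hasattr(buf, "__iter__"))  # True

# The first read returns the literal unchanged and leaves the cursor at the end.
print(buf.read() == html_table)  # True
print(buf.read() == "")  # True: the buffer is now exhausted

# Rewind before reusing the same buffer.
buf.seek(0)
print(buf.read() == html_table)  # True
```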

doc/source/whatsnew/v0.24.0.rst

+2 -2
@@ -286,7 +286,7 @@ value. (:issue:`17054`)
 
 .. ipython:: python
 
-   result = pd.read_html("""
+   result = pd.read_html(StringIO("""
    <table>
    <thead>
    <tr>
@@ -298,7 +298,7 @@ value. (:issue:`17054`)
    <td colspan="2">1</td><td>2</td>
    </tr>
    </tbody>
-   </table>""")
+   </table>"""))
 
 *Previous behavior*:
 

doc/source/whatsnew/v2.1.0.rst

+1
@@ -298,6 +298,7 @@ Deprecations
 - Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
 - Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
 - Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
+- Deprecated literal string/bytes input to :func:`read_html`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
 - Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
 - Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
 - Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)
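The whatsnew entry asks callers to put `io.StringIO` around `str` input and `io.BytesIO` around `bytes` input. The sketch below expresses that rule as a small helper. `wrap_literal_html` is a hypothetical name used only for illustration; it is not a pandas function.

```python
from io import BytesIO, StringIO
from typing import Union


def wrap_literal_html(data: Union[str, bytes]) -> Union[StringIO, BytesIO]:
    """Hypothetical helper: wrap literal HTML in the matching in-memory buffer."""
    if isinstance(data, str):
        return StringIO(data)  # text literal -> io.StringIO
    if isinstance(data, bytes):
        return BytesIO(data)  # bytes literal -> io.BytesIO
    raise TypeError(f"expected str or bytes, got {type(data).__name__}")


text_buf = wrap_literal_html("<table><tr><td>1</td></tr></table>")
byte_buf = wrap_literal_html(b"<table><tr><td>1</td></tr></table>")
print(type(text_buf).__name__, type(byte_buf).__name__)  # StringIO BytesIO
```

With pandas and an HTML parser installed (lxml, or BeautifulSoup4 with html5lib), you would pass the returned buffer to `pd.read_html` in place of the literal.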

pandas/io/html.py

+24
@@ -17,13 +17,15 @@
     Sequence,
     cast,
 )
+import warnings
 
 from pandas._libs import lib
 from pandas.compat._optional import import_optional_dependency
 from pandas.errors import (
     AbstractMethodError,
     EmptyDataError,
 )
+from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import check_dtype_backend
 
 from pandas.core.dtypes.common import is_list_like
@@ -36,6 +38,8 @@
 from pandas.io.common import (
     file_exists,
     get_handle,
+    is_file_like,
+    is_fsspec_url,
     is_url,
     stringify_path,
     urlopen,
@@ -1023,6 +1027,10 @@ def read_html(
         lxml only accepts the http, ftp and file url protocols. If you have a
         URL that starts with ``'https'`` you might try removing the ``'s'``.
 
+        .. deprecated:: 2.1.0
+            Passing html literal strings is deprecated.
+            Wrap literal string/bytes input in ``io.StringIO``/``io.BytesIO`` instead.
+
     match : str or compiled regular expression, optional
         The set of tables containing text matching this regex or string will be
         returned. Unless the HTML is extremely simple you will probably need to
@@ -1178,6 +1186,22 @@ def read_html(
 
     io = stringify_path(io)
 
+    if isinstance(io, str) and not any(
+        [
+            is_file_like(io),
+            file_exists(io),
+            is_url(io),
+            is_fsspec_url(io),
+        ]
+    ):
+        warnings.warn(
+            "Passing literal html to 'read_html' is deprecated and "
+            "will be removed in a future version. To read from a "
+            "literal string, wrap it in a 'StringIO' object.",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
+
     return _parse(
         flavor=flavor,
         io=io,
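To see which inputs trip the new check, here is a stdlib-only sketch that mirrors the condition added to `read_html`. The four predicates are simplified, hypothetical stand-ins for the `pandas.io.common` helpers of the same names; the real functions are more involved. `warn_if_literal_html` is likewise an illustrative name. As written in the diff, the condition only tests `str` input, so the sketch does the same. It uses `stacklevel=2` in place of the pandas-internal `find_stack_level()`.

```python
import os
import re
import warnings
from io import StringIO


# Hypothetical, simplified stand-ins for the pandas.io.common helpers named in
# the diff; they only approximate the real functions.
def is_file_like(obj) -> bool:
    # Has a read() or write() method and is iterable.
    return (hasattr(obj, "read") or hasattr(obj, "write")) and hasattr(obj, "__iter__")


def file_exists(path: str) -> bool:
    # os.path.exists returns False rather than raising for strings that are not valid paths.
    return os.path.exists(path)


def is_url(s: str) -> bool:
    return s.startswith(("http://", "https://", "ftp://", "file://"))


def is_fsspec_url(s: str) -> bool:
    # Any "protocol://" prefix other than plain http(s), e.g. "s3://bucket/key".
    has_protocol = re.match(r"^[A-Za-z][A-Za-z0-9+\-.]+://", s) is not None
    return has_protocol and not s.startswith(("http://", "https://"))


def warn_if_literal_html(io) -> bool:
    """Mirror the condition added to read_html; return True if it warned."""
    if isinstance(io, str) and not any(
        [is_file_like(io), file_exists(io), is_url(io), is_fsspec_url(io)]
    ):
        warnings.warn(
            "Passing literal html to 'read_html' is deprecated and "
            "will be removed in a future version. To read from a "
            "literal string, wrap it in a 'StringIO' object.",
            FutureWarning,
            stacklevel=2,  # pandas uses find_stack_level() here instead
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_literal_html("<table><tr><td>1</td></tr></table>")  # warns
    warn_if_literal_html(StringIO("<table></table>"))  # not a str: no warning
    warn_if_literal_html("https://example.com/tables.html")  # URL: no warning
    warn_if_literal_html("s3://bucket/tables.html")  # fsspec-style URL: no warning

print([w.category.__name__ for w in caught])  # ['FutureWarning']
```

Running Python with `-W error::FutureWarning` (or calling `warnings.simplefilter("error", FutureWarning)`) turns such a warning into an exception. That is one way to find call sites that still pass literal HTML.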
