DEPR: Remove literal string input for read_html #53805

Merged
merged 6 commits into from
Jun 28, 2023
2 changes: 1 addition & 1 deletion doc/source/user_guide/io.rst
@@ -2664,7 +2664,7 @@ Links can be extracted from cells along with the text using ``extract_links="all
"""

df = pd.read_html(
- html_table,
+ StringIO(html_table),
extract_links="all"
)[0]
df
4 changes: 2 additions & 2 deletions doc/source/whatsnew/v0.24.0.rst
@@ -286,7 +286,7 @@ value. (:issue:`17054`)

.. ipython:: python

- result = pd.read_html("""
+ result = pd.read_html(StringIO("""
<table>
<thead>
<tr>
@@ -298,7 +298,7 @@ value. (:issue:`17054`)
<td colspan="2">1</td><td>2</td>
</tr>
</tbody>
- </table>""")
+ </table>"""))

*Previous behavior*:

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -298,13 +298,15 @@ Deprecations
- Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
- Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
- Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
- Deprecated literal string/bytes input to :func:`read_html`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
- Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
- Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
- Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)
- Deprecated strings ``T``, ``t``, ``L`` and ``l`` denoting units in :func:`to_timedelta` (:issue:`52536`)
- Deprecated the "method" and "limit" keywords on :meth:`Series.fillna`, :meth:`DataFrame.fillna`, :meth:`SeriesGroupBy.fillna`, :meth:`DataFrameGroupBy.fillna`, and :meth:`Resampler.fillna`, use ``obj.bfill()`` or ``obj.ffill()`` instead (:issue:`53394`)
- Deprecated the ``method`` and ``limit`` keywords in :meth:`DataFrame.replace` and :meth:`Series.replace` (:issue:`33302`)
- Deprecated values "pad", "ffill", "bfill", "backfill" for :meth:`Series.interpolate` and :meth:`DataFrame.interpolate`, use ``obj.ffill()`` or ``obj.bfill()`` instead (:issue:`53581`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_210.performance:
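
For callers affected by the :func:`read_html` deprecation listed above, the migration amounts to wrapping the literal in an in-memory buffer before passing it in. The following is a minimal sketch using only the standard library. ``wrap_html_literal`` is a hypothetical helper name for illustration, not part of pandas, and it assumes the caller already knows that any ``str``/``bytes`` argument is literal HTML rather than a path or URL.

```python
from io import BytesIO, StringIO


def wrap_html_literal(html):
    """Wrap literal HTML in an in-memory buffer; pass anything else through.

    Hypothetical helper, not a pandas API. Assumes str/bytes input is
    literal HTML, never a file path or URL.
    """
    if isinstance(html, str):
        return StringIO(html)
    if isinstance(html, bytes):
        return BytesIO(html)
    # Already a file-like object (or some other type): leave it untouched.
    return html


buf = wrap_html_literal("<table><tr><td>1</td></tr></table>")
print(type(buf).__name__)
```

The returned buffer can then be handed to ``pd.read_html`` in place of the raw string, mirroring the ``StringIO(html_table)`` change made to the docs in this PR.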
24 changes: 24 additions & 0 deletions pandas/io/html.py
@@ -17,13 +17,15 @@
Sequence,
cast,
)
import warnings

from pandas._libs import lib
from pandas.compat._optional import import_optional_dependency
from pandas.errors import (
AbstractMethodError,
EmptyDataError,
)
from pandas.util._exceptions import find_stack_level
from pandas.util._validators import check_dtype_backend

from pandas.core.dtypes.common import is_list_like
@@ -36,6 +38,8 @@
from pandas.io.common import (
file_exists,
get_handle,
is_file_like,
is_fsspec_url,
is_url,
stringify_path,
urlopen,
@@ -1023,6 +1027,10 @@ def read_html(
lxml only accepts the http, ftp and file url protocols. If you have a
URL that starts with ``'https'`` you might try removing the ``'s'``.

.. deprecated:: 2.1.0
Passing html literal strings is deprecated.
Wrap literal string/bytes input in ``io.StringIO``/``io.BytesIO`` instead.

match : str or compiled regular expression, optional
The set of tables containing text matching this regex or string will be
returned. Unless the HTML is extremely simple you will probably need to
@@ -1178,6 +1186,22 @@ def read_html(

io = stringify_path(io)

if isinstance(io, str) and not any(
[
is_file_like(io),
file_exists(io),
is_url(io),
is_fsspec_url(io),
]
):
warnings.warn(
"Passing literal html to 'read_html' is deprecated and "
"will be removed in a future version. To read from a "
"literal string, wrap it in a 'StringIO' object.",
FutureWarning,
stacklevel=find_stack_level(),
)

return _parse(
flavor=flavor,
io=io,
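
The check added in this hunk treats a ``str`` argument as literal HTML when it is neither an existing file nor URL-like. A rough standalone sketch of that idea follows. It is an approximation, not the pandas implementation: ``os.path.exists`` and a scheme regex stand in for pandas' internal ``file_exists``, ``is_url`` and ``is_fsspec_url`` helpers, and ``warn_if_literal`` is a hypothetical name.

```python
import os
import re
import warnings

# Assumption: anything starting with "<scheme>://" counts as URL-like.
# pandas uses its own helpers for this; the regex is only an approximation.
_URL_LIKE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def warn_if_literal(io_arg):
    """Emit a FutureWarning if io_arg looks like literal HTML.

    Returns True when the warning was issued, False otherwise.
    """
    if (
        isinstance(io_arg, str)
        and not os.path.exists(io_arg)
        and not _URL_LIKE.match(io_arg)
    ):
        warnings.warn(
            "Passing literal html to 'read_html' is deprecated and "
            "will be removed in a future version. To read from a "
            "literal string, wrap it in a 'StringIO' object.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False
```

As in the diff, only ``str`` input is inspected here, so ``bytes`` and file-like objects pass through without a warning.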