DEPR: Remove literal string input for read_xml #53809

Merged on Jul 11, 2023
23 commits
f347e8e  Updating documentation and adding deprecation logic for read_xml. (rmhowe425, Jun 22, 2023)
296b45a  Fixing documentation issue and adding unit test (rmhowe425, Jun 23, 2023)
69cdc1a  Updating unit tests and documentation. (rmhowe425, Jun 23, 2023)
83a9177  Merge branch 'main' into dev/depr/literal-str-read_xml (rmhowe425, Jun 23, 2023)
0f0f38b  Fixing unit tests and documentation issues (rmhowe425, Jun 24, 2023)
2c848ac  Fixing unit tests and documentation issues (rmhowe425, Jun 24, 2023)
b8a582c  Fixing unit tests and documentation issues (rmhowe425, Jun 24, 2023)
92bc6fa  Fixing import error in documentation (rmhowe425, Jun 24, 2023)
8bbd7c4  Updated deprecation logic per reviewer recommendations. (rmhowe425, Jun 26, 2023)
5aece78  Updating deprecation logic and documentation per reviewer recommendat… (rmhowe425, Jun 26, 2023)
6f15924  Fixing logic error (rmhowe425, Jun 26, 2023)
00f7b15  Fixing implementation per reviewer recommendations. (rmhowe425, Jun 27, 2023)
20e7ef2  Updating implementation per reviewer recommendations. (rmhowe425, Jun 27, 2023)
526c224  Cleaning up the deprecation logic a bit. (rmhowe425, Jun 27, 2023)
9dfa18d  Merge branch 'main' into dev/depr/literal-str-read_xml (rmhowe425, Jun 27, 2023)
65f88b9  Updating implementation per reviewer recommendations. (rmhowe425, Jun 27, 2023)
ec28efa  Merge branch 'main' into dev/depr/literal-str-read_xml (rmhowe425, Jun 28, 2023)
2c58638  Merge branch 'main' into dev/depr/literal-str-read_xml (rmhowe425, Jun 29, 2023)
e08f4e0  Merge branch 'main' into dev/depr/literal-str-read_xml (rmhowe425, Jun 30, 2023)
ba1edd6  Merge branch 'main' into dev/depr/literal-str-read_xml (rmhowe425, Jul 9, 2023)
b7e1fb6  Updating unit tests (rmhowe425, Jul 9, 2023)
14d2cb1  Fixing discrepancy in doc string. (rmhowe425, Jul 9, 2023)
c215a94  Updating implementation based on reviewer recommendations. (rmhowe425, Jul 11, 2023)
13 changes: 7 additions & 6 deletions doc/source/user_guide/io.rst
@@ -2919,6 +2919,7 @@ Read an XML string:

.. ipython:: python

+from io import StringIO
xml = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
@@ -2941,7 +2942,7 @@ Read an XML string:
</book>
</bookstore>"""

-df = pd.read_xml(xml)
+df = pd.read_xml(StringIO(xml))
df

Read a URL with no options:
@@ -2961,7 +2962,7 @@ as a string:
f.write(xml)

with open(file_path, "r") as f:
-    df = pd.read_xml(f.read())
+    df = pd.read_xml(StringIO(f.read()))
df

Read in the content of the "books.xml" as instance of ``StringIO`` or
@@ -3052,7 +3053,7 @@ For example, below XML contains a namespace with prefix, ``doc``, and URI at
</doc:row>
</doc:data>"""

-df = pd.read_xml(xml,
+df = pd.read_xml(StringIO(xml),
xpath="//doc:row",
namespaces={"doc": "https://example.com"})
df
@@ -3082,7 +3083,7 @@ But assigning *any* temporary name to correct URI allows parsing by nodes.
</row>
</data>"""

-df = pd.read_xml(xml,
+df = pd.read_xml(StringIO(xml),
xpath="//pandas:row",
namespaces={"pandas": "https://example.com"})
df
@@ -3117,7 +3118,7 @@ However, if XPath does not reference node names such as default, ``/*``, then
</row>
</data>"""

-df = pd.read_xml(xml, xpath="./row")
+df = pd.read_xml(StringIO(xml), xpath="./row")
df

shows the attribute ``sides`` on ``shape`` element was not parsed as
@@ -3218,7 +3219,7 @@ output (as shown below for demonstration) for easier parse into ``DataFrame``:
</row>
</response>"""

-df = pd.read_xml(xml, stylesheet=xsl)
+df = pd.read_xml(StringIO(xml), stylesheet=xsl)
df

For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml`
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v1.5.0.rst
@@ -221,6 +221,7 @@ apply converter methods, and parse dates (:issue:`43567`).

.. ipython:: python

+from io import StringIO
xml_dates = """<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
@@ -244,7 +245,7 @@ apply converter methods, and parse dates (:issue:`43567`).
</data>"""

df = pd.read_xml(
-xml_dates,
+StringIO(xml_dates),
dtype={'sides': 'Int64'},
converters={'degrees': str},
parse_dates=['date']
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -311,6 +311,7 @@ Deprecations
- Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
- Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
- Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
+- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
- Deprecated literal string/bytes input to :func:`read_html`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
- Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
- Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
30 changes: 27 additions & 3 deletions pandas/io/xml.py
@@ -12,6 +12,7 @@
Callable,
Sequence,
)
+import warnings

from pandas._libs import lib
from pandas.compat._optional import import_optional_dependency
@@ -20,6 +21,7 @@
ParserError,
)
from pandas.util._decorators import doc
+from pandas.util._exceptions import find_stack_level
from pandas.util._validators import check_dtype_backend

from pandas.core.dtypes.common import is_list_like
@@ -30,6 +32,7 @@
file_exists,
get_handle,
infer_compression,
+is_file_like,
is_fsspec_url,
is_url,
stringify_path,
@@ -802,6 +805,22 @@ def _parse(

p: _EtreeFrameParser | _LxmlFrameParser

+if isinstance(path_or_buffer, str) and not any(
+    [
+        is_file_like(path_or_buffer),
+        file_exists(path_or_buffer),
+        is_url(path_or_buffer),
+        is_fsspec_url(path_or_buffer),
+    ]
+):
+    warnings.warn(
+        "Passing literal xml to 'read_xml' is deprecated and "
+        "will be removed in a future version. To read from a "
+        "literal string, wrap it in a 'StringIO' object.",
+        FutureWarning,
+        stacklevel=find_stack_level(),
+    )
+
if parser == "lxml":
lxml = import_optional_dependency("lxml.etree", errors="ignore")
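
Note on the hunk above: the new check warns only when the input is a `str` that is not recognised as a file-like object, an existing path, or a URL. The sketch below imitates that decision with the standard library only. `looks_like_literal_xml` and `warn_if_literal` are hypothetical names, and the path and URL tests are simplified assumptions, not the pandas helpers (`is_file_like`, `file_exists`, `is_url`, `is_fsspec_url`) used in the diff.

```python
import io
import os
import re
import warnings

# Assumption: anything starting with "scheme://" counts as a URL.
_URL_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def looks_like_literal_xml(path_or_buffer) -> bool:
    """Hypothetical helper: True for a str that is neither a path nor a URL."""
    if not isinstance(path_or_buffer, str):
        # Buffers (StringIO, BytesIO, open files) and bytes are not flagged.
        return False
    if os.path.exists(path_or_buffer):
        # An existing local file path is a legitimate string input.
        return False
    if _URL_PREFIX.match(path_or_buffer):
        # A URL-like string is a legitimate string input.
        return False
    return True


def warn_if_literal(path_or_buffer) -> None:
    """Emit a FutureWarning in the same spirit as the check in the diff."""
    if looks_like_literal_xml(path_or_buffer):
        warnings.warn(
            "Passing literal xml is deprecated; wrap it in a 'StringIO' object.",
            FutureWarning,
            stacklevel=2,  # point at the caller of warn_if_literal
        )
```

With this sketch, `warn_if_literal("<data><row/></data>")` warns, while `warn_if_literal(io.StringIO("<data><row/></data>"))` and `warn_if_literal("https://example.com/books.xml")` stay silent.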

@@ -894,6 +913,10 @@ def read_xml(
string or a path. The string can further be a URL. Valid URL schemes
include http, ftp, s3, and file.

+.. deprecated:: 2.1.0
+    Passing xml literal strings is deprecated.
+    Wrap literal xml input in ``io.StringIO`` or ``io.BytesIO`` instead.
+
xpath : str, optional, default './\*'
The XPath to parse required set of nodes for migration to DataFrame.
XPath should return a collection of elements and not a single
@@ -1049,6 +1072,7 @@ def read_xml(

Examples
--------
+>>> import io
>>> xml = '''<?xml version='1.0' encoding='utf-8'?>
... <data xmlns="http://example.com">
... <row>
@@ -1068,7 +1092,7 @@ def read_xml(
... </row>
... </data>'''

->>> df = pd.read_xml(xml)
+>>> df = pd.read_xml(io.StringIO(xml))
>>> df
shape degrees sides
0 square 360 4.0
@@ -1082,7 +1106,7 @@ def read_xml(
... <row shape="triangle" degrees="180" sides="3.0"/>
... </data>'''

->>> df = pd.read_xml(xml, xpath=".//row")
+>>> df = pd.read_xml(io.StringIO(xml), xpath=".//row")
>>> df
shape degrees sides
0 square 360 4.0
@@ -1108,7 +1132,7 @@ def read_xml(
... </doc:row>
... </doc:data>'''

->>> df = pd.read_xml(xml,
+>>> df = pd.read_xml(io.StringIO(xml),
... xpath="//doc:row",
... namespaces={{"doc": "https://example.com"}})
>>> df
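
Migration note for callers: the docstring changes above all follow one pattern, wrapping the XML text in a buffer before passing it to `read_xml`. The following is a minimal sketch of that pattern for both `str` and `bytes` input. The sample XML and the variable names are made up for illustration, and `parser="etree"` is chosen here only so the example does not depend on lxml being installed.

```python
from io import BytesIO, StringIO

import pandas as pd

# Illustrative data, not taken from the pandas documentation.
xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# str input: wrap in StringIO instead of passing the literal string.
df_from_str = pd.read_xml(StringIO(xml), parser="etree")

# bytes input: wrap in BytesIO instead of passing the literal bytes.
df_from_bytes = pd.read_xml(BytesIO(xml.encode("utf-8")), parser="etree")

print(df_from_str)
```

Paths and URLs are unaffected by the deprecation and can still be passed as plain strings.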