Commit c68449a

DEPR: Remove literal string input for read_xml (#53809)
* Updating documentation and adding deprecation logic for read_xml.
* Fixing documentation issue and adding unit test
* Updating unit tests and documentation.
* Fixing unit tests and documentation issues
* Fixing unit tests and documentation issues
* Fixing unit tests and documentation issues
* Fixing import error in documentation
* Updated deprecation logic per reviewer recommendations.
* Updating deprecation logic and documentation per reviewer recommendations.
* Fixing logic error
* Fixing implementation per reviewer recommendations.
* Updating implementation per reviewer recommendations.
* Cleaning up the deprecation logic a bit.
* Updating implementation per reviewer recommendations.
* Updating unit tests
* Fixing discrepancy in doc string.
* Updating implementation based on reviewer recommendations.
1 parent e758a19 commit c68449a

6 files changed: +133 -72 lines changed

doc/source/user_guide/io.rst

+7 -6
@@ -2919,6 +2919,7 @@ Read an XML string:

 .. ipython:: python

+   from io import StringIO
    xml = """<?xml version="1.0" encoding="UTF-8"?>
    <bookstore>
      <book category="cooking">
@@ -2941,7 +2942,7 @@ Read an XML string:
      </book>
    </bookstore>"""

-   df = pd.read_xml(xml)
+   df = pd.read_xml(StringIO(xml))
    df

 Read a URL with no options:
@@ -2961,7 +2962,7 @@ as a string:
        f.write(xml)

    with open(file_path, "r") as f:
-       df = pd.read_xml(f.read())
+       df = pd.read_xml(StringIO(f.read()))
    df

 Read in the content of the "books.xml" as instance of ``StringIO`` or
@@ -3052,7 +3053,7 @@ For example, below XML contains a namespace with prefix, ``doc``, and URI at
      </doc:row>
    </doc:data>"""

-   df = pd.read_xml(xml,
+   df = pd.read_xml(StringIO(xml),
                     xpath="//doc:row",
                     namespaces={"doc": "https://example.com"})
    df
@@ -3082,7 +3083,7 @@ But assigning *any* temporary name to correct URI allows parsing by nodes.
      </row>
    </data>"""

-   df = pd.read_xml(xml,
+   df = pd.read_xml(StringIO(xml),
                     xpath="//pandas:row",
                     namespaces={"pandas": "https://example.com"})
    df
@@ -3117,7 +3118,7 @@ However, if XPath does not reference node names such as default, ``/*``, then
      </row>
    </data>"""

-   df = pd.read_xml(xml, xpath="./row")
+   df = pd.read_xml(StringIO(xml), xpath="./row")
    df

 shows the attribute ``sides`` on ``shape`` element was not parsed as
@@ -3218,7 +3219,7 @@ output (as shown below for demonstration) for easier parse into ``DataFrame``:
      </row>
    </response>"""

-   df = pd.read_xml(xml, stylesheet=xsl)
+   df = pd.read_xml(StringIO(xml), stylesheet=xsl)
    df

 For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml`

doc/source/whatsnew/v1.5.0.rst

+2 -1
@@ -221,6 +221,7 @@ apply converter methods, and parse dates (:issue:`43567`).

 .. ipython:: python

+    from io import StringIO
     xml_dates = """<?xml version='1.0' encoding='utf-8'?>
     <data>
       <row>
@@ -244,7 +245,7 @@ apply converter methods, and parse dates (:issue:`43567`).
     </data>"""

     df = pd.read_xml(
-        xml_dates,
+        StringIO(xml_dates),
         dtype={'sides': 'Int64'},
         converters={'degrees': str},
         parse_dates=['date']

doc/source/whatsnew/v2.1.0.rst

+1
@@ -313,6 +313,7 @@ Deprecations
 - Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
 - Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
 - Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
+- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
 - Deprecated literal string/bytes input to :func:`read_html`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
 - Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
 - Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
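
The release-note entry for ``read_xml`` asks callers to wrap literal input in ``io.StringIO`` or ``io.BytesIO``. A minimal sketch of that migration might look like the following. The helper name `wrap_literal_xml` is hypothetical (it is not part of pandas), and the standard-library `xml.etree.ElementTree` parser is used only to show that both buffer types can be read as files. The helper should only be given values already known to hold XML content, never paths or URLs.

```python
from io import BytesIO, StringIO
import xml.etree.ElementTree as ET


def wrap_literal_xml(data):
    """Hypothetical helper: wrap literal XML in the matching in-memory buffer."""
    if isinstance(data, bytes):
        return BytesIO(data)
    if isinstance(data, str):
        return StringIO(data)
    return data  # already a buffer or path-like object; pass through unchanged


xml = "<data><row><shape>square</shape><sides>4</sides></row></data>"

# Both buffers expose .read(), which is what file-based XML parsers consume.
root_from_text = ET.parse(wrap_literal_xml(xml)).getroot()
root_from_bytes = ET.parse(wrap_literal_xml(xml.encode("utf-8"))).getroot()

# With pandas installed, the wrapped object is what read_xml expects after
# this change, for example:
#   df = pd.read_xml(wrap_literal_xml(xml))
```

Choosing the buffer type by input type keeps text and bytes separate, which matches the ``StringIO`` / ``BytesIO`` split named in the release note.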

pandas/io/xml.py

+27 -3
@@ -11,6 +11,7 @@
     Any,
     Callable,
 )
+import warnings

 from pandas._libs import lib
 from pandas.compat._optional import import_optional_dependency
@@ -19,6 +20,7 @@
     ParserError,
 )
 from pandas.util._decorators import doc
+from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import check_dtype_backend

 from pandas.core.dtypes.common import is_list_like
@@ -29,6 +31,7 @@
     file_exists,
     get_handle,
     infer_compression,
+    is_file_like,
     is_fsspec_url,
     is_url,
     stringify_path,
@@ -802,6 +805,22 @@ def _parse(

     p: _EtreeFrameParser | _LxmlFrameParser

+    if isinstance(path_or_buffer, str) and not any(
+        [
+            is_file_like(path_or_buffer),
+            file_exists(path_or_buffer),
+            is_url(path_or_buffer),
+            is_fsspec_url(path_or_buffer),
+        ]
+    ):
+        warnings.warn(
+            "Passing literal xml to 'read_xml' is deprecated and "
+            "will be removed in a future version. To read from a "
+            "literal string, wrap it in a 'StringIO' object.",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
+
     if parser == "lxml":
         lxml = import_optional_dependency("lxml.etree", errors="ignore")

@@ -894,6 +913,10 @@ def read_xml(
         string or a path. The string can further be a URL. Valid URL schemes
         include http, ftp, s3, and file.

+        .. deprecated:: 2.1.0
+            Passing xml literal strings is deprecated.
+            Wrap literal xml input in ``io.StringIO`` or ``io.BytesIO`` instead.
+
     xpath : str, optional, default './\*'
         The XPath to parse required set of nodes for migration to DataFrame.
         XPath should return a collection of elements and not a single
@@ -1049,6 +1072,7 @@ def read_xml(

     Examples
     --------
+    >>> import io
     >>> xml = '''<?xml version='1.0' encoding='utf-8'?>
     ... <data xmlns="http://example.com">
     ... <row>
@@ -1068,7 +1092,7 @@ def read_xml(
     ... </row>
     ... </data>'''

-    >>> df = pd.read_xml(xml)
+    >>> df = pd.read_xml(io.StringIO(xml))
     >>> df
           shape  degrees  sides
     0    square      360    4.0
@@ -1082,7 +1106,7 @@ def read_xml(
     ... <row shape="triangle" degrees="180" sides="3.0"/>
     ... </data>'''

-    >>> df = pd.read_xml(xml, xpath=".//row")
+    >>> df = pd.read_xml(io.StringIO(xml), xpath=".//row")
     >>> df
           shape  degrees  sides
     0    square      360    4.0
@@ -1108,7 +1132,7 @@ def read_xml(
     ... </doc:row>
     ... </doc:data>'''

-    >>> df = pd.read_xml(xml,
+    >>> df = pd.read_xml(io.StringIO(xml),
     ...                  xpath="//doc:row",
     ...                  namespaces={{"doc": "https://example.com"}})
     >>> df
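
For reviewers, the check added to `_parse` can be approximated outside pandas roughly as follows. This is a sketch under stated assumptions, not the pandas implementation: `_is_url`, `_is_fsspec_like_url` and `warn_if_literal_xml` are hypothetical stand-ins for the `pandas.io.common` helpers, the scheme list is an assumption, and a fixed `stacklevel=2` replaces `find_stack_level()`.

```python
import os
import re
import warnings
from io import StringIO
from urllib.parse import urlparse

# Assumed scheme list; the real helpers in pandas.io.common may differ.
_URL_SCHEMES = {"http", "https", "ftp", "s3", "file"}
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+\-.]*://")


def _is_url(value: str) -> bool:
    """Stand-in for is_url: True when the string has a known URL scheme."""
    return urlparse(value).scheme in _URL_SCHEMES


def _is_fsspec_like_url(value: str) -> bool:
    """Stand-in for is_fsspec_url: any other 'scheme://' prefix."""
    return bool(_SCHEME_RE.match(value)) and not value.startswith(
        ("http://", "https://")
    )


def warn_if_literal_xml(path_or_buffer) -> bool:
    """Return True (and emit a FutureWarning) if the input looks like literal XML."""
    if not isinstance(path_or_buffer, str):
        return False  # buffers, bytes and path objects are not checked here
    # A str has no .read/.write, so the file-like test from the diff is omitted.
    if (
        os.path.exists(path_or_buffer)
        or _is_url(path_or_buffer)
        or _is_fsspec_like_url(path_or_buffer)
    ):
        return False
    warnings.warn(
        "Passing literal xml to 'read_xml' is deprecated and "
        "will be removed in a future version. To read from a "
        "literal string, wrap it in a 'StringIO' object.",
        FutureWarning,
        stacklevel=2,
    )
    return True
```

Under these assumptions, `warn_if_literal_xml("<data><row/></data>")` warns, while `warn_if_literal_xml(StringIO("<data/>"))` and a URL string such as `"https://example.com/books.xml"` pass silently.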
