-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DEPR: Remove literal string input for read_xml #53809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: Remove literal string input for read_xml #53809
Conversation
@mroeschke PR is ready for review! |
doc/source/whatsnew/v2.1.0.rst
Outdated
@@ -298,13 +298,15 @@ Deprecations | |||
- Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`) | |||
- Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`) | |||
- Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`) | |||
- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO`` instead. (:issue:`53767`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO`` instead. (:issue:`53767`) | |
- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO``/``io.BytesIO`` instead. (:issue:`53767`) |
pandas/io/xml.py
Outdated
@@ -894,6 +896,9 @@ def read_xml( | |||
string or a path. The string can further be a URL. Valid URL schemes | |||
include http, ftp, s3, and file. | |||
|
|||
.. deprecated:: 2.1.0 | |||
Passing html literal strings is deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you mention wrapping in StringIO/BytesIO
as an alternate?
pandas/io/xml.py
Outdated
@@ -1119,6 +1125,15 @@ def read_xml( | |||
""" | |||
check_dtype_backend(dtype_backend) | |||
|
|||
if isinstance(path_or_buffer, str) and "\n" in path_or_buffer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is \n
a reliable way detect if it's literal xml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use is_file_like
(and in your other PRs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke Ah thanks for letting me know about is_file_like
!
Are we okay with updating the detection logic to include both is_file_like
and the check for "\n"? is_file_like
doesn't help differentiate between raw xml input and a url so that's where checking for "\n" could be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use is_url
to detect urls.
@mroeschke Pinging on green! I added multiple checks from common.py rather than just checking for |
pandas/io/xml.py
Outdated
@@ -1119,6 +1127,22 @@ def read_xml( | |||
""" | |||
check_dtype_backend(dtype_backend) | |||
|
|||
if not any( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would want to check if the argument is a string and not any of these things.
For example, pd.read_xml(123)
should not give a deprecation warning
@mroeschke Pinging on green |
pandas/io/xml.py
Outdated
@@ -1127,7 +1127,7 @@ def read_xml( | |||
""" | |||
check_dtype_backend(dtype_backend) | |||
|
|||
if not any( | |||
if isinstance(path_or_buffer, str) and not any( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply for organization and not a must, can we move this warning logic to the next called method _parse
written just before this main read_xml
call? This moves it away from the docs lines. Also, in deprecated warning in docs, consider changing html
to xml
, two different types of documents.
Also, other unit tests other than your new one which reads literal strings are failing with this warning. You may have to check for Pathlike
objects as well as string. Be sure to run pytest
locally before pushing commit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke the documentation discrepancy is fixed here. Also, unit tests are passing and CI tests are green. Pathlike objects are already being checked in my conditional statement in xml.py
Not sure how you feel about moving the warning logic to _parse
. This isn't something that we did in similar PRs. Not sure it matters?
Otherwise, I think this PR is ready for inspection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed this warning would be better suited in _parse
as well.
@mroeschke warning logic moved to |
Thanks again @rmhowe425 |
read_excel
,read_html
, andread_xml
#53767doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.