-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: type read_xml and deprecate passing positional arguments #45133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
phofl
commented
Dec 30, 2021
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
elems_only: bool = False, | ||
attrs_only: bool = False, | ||
names: Sequence[str] | None = None, | ||
# encoding can not be None for lxml and StringIO input | ||
encoding: str | None = "utf-8", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can type this correctly after enforcing the positional deprecation
pandas/io/xml.py
Outdated
attrs_only: bool | None = False, | ||
names: list[str] | None = None, | ||
xpath: str = "./*", | ||
namespaces: dict | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list[dict] does not work, this raises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned below, namespaces are distinct prefix/URI pairs within an XML doc, so should only be dict
types. LMK of an XML use case if otherwise. While namespace prefixes can be reused at different elements per this SO answer, XPath prefix-URI (i.e., key/value) mapping must be unique.
pandas/io/xml.py
Outdated
stylesheet, | ||
path_or_buffer: FilePath | ReadBuffer[bytes] | ReadBuffer[str], | ||
xpath: str, | ||
namespaces: dict | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with how namespaces
is used, is it dict[str, str] | None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familiar enough either, found only cases like this, but not 100% sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespaces is passed to findall
which according to typeshed expects a dict[str, str] | None
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also please fix this existing typo in the doc-string (two "s" at the end):
Line 57 in 3accd12
namespacess : dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespaces map string prefixes to URIs so can only be dict[str, str]
. Neither prefix or URI can be non-string types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
cc @ParfaitG if any comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @phofl, for keeping IO XML to current typing standards! I have a few comments in my walk-through.
@@ -246,6 +246,7 @@ def closed(self) -> bool: | |||
CompressionOptions = Optional[ | |||
Union[Literal["infer", "gzip", "bz2", "zip", "xz", "zstd"], CompressionDict] | |||
] | |||
XMLParsers = Literal["lxml", "etree"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition for this type. Maybe inspired by new IO XML module.
pandas/io/xml.py
Outdated
stylesheet, | ||
path_or_buffer: FilePath | ReadBuffer[bytes] | ReadBuffer[str], | ||
xpath: str, | ||
namespaces: dict | list[dict] | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see how list of dicts is used for this argument. Each XML doc will have a unique set of namespaces (i.e., prefix to URI pairs). Does mypy raise? Consider dict[str, str]
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find them on my newest commit, did you maybe review an older one? I have moved the types from the public api first and checked them for validity afterwards. I realised that list[dict] did not work and removed them again then
pandas/io/xml.py
Outdated
@@ -765,7 +764,7 @@ def read_xml( | |||
expressions. For more complex XPath, use ``lxml`` which requires | |||
installation. | |||
|
|||
namespaces : dict, optional | |||
namespaces : dict, list of dicts, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again namespaces will not be a list of dicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep removed, forgot this
@@ -782,6 +782,19 @@ def test_read_xml_passing_as_positional_deprecated(datapath): | |||
) | |||
|
|||
|
|||
@td.skip_if_no("lxml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good test for None
encoding. Can this new test be moved few lines earlier under previous ENCODING section (lines ~700-730). I wonder if we should also test for etree
with None
expecting success with tm.assert_frame_equal
. etree tends to be more active in API so may change defaults with future versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added and moved
LGTM! Thanks, @phofl! |
cc @jreback |
can you rebase (as merged other one). or is this orthogonal? |
This should be orthogonal, but better to rebase |
thanks @phofl |