TYP: type read_xml and deprecate passing positional arguments #45133

phofl · 2021-12-30T21:07:01Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

phofl · 2021-12-30T21:07:57Z

pandas/io/xml.py

+    elems_only: bool = False,
+    attrs_only: bool = False,
+    names: Sequence[str] | None = None,
+    # encoding can not be None for lxml and StringIO input
    encoding: str | None = "utf-8",


Can type this correctly after enforcing the positional deprecation
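The positional deprecation mentioned above can be illustrated with a minimal, hypothetical decorator. It is not pandas' internal helper. It warns with a `FutureWarning` whenever more than `max_positional` arguments arrive positionally, which is the behaviour that makes a keyword-only signature safe to enforce later. `deprecate_positional` and `read_xml_sketch` are illustrative names only.

```python
from __future__ import annotations

import functools
import inspect
import warnings
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def deprecate_positional(max_positional: int) -> Callable[[F], F]:
    """Hypothetical decorator: warn when more than ``max_positional``
    arguments are passed positionally instead of by keyword."""

    def decorate(func: F) -> F:
        # Parameter names in declaration order, used to name the offenders.
        params = list(inspect.signature(func).parameters)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if len(args) > max_positional:
                extra = ", ".join(params[max_positional : len(args)])
                warnings.warn(
                    f"Passing {extra} positionally is deprecated; "
                    "pass them as keyword arguments instead.",
                    FutureWarning,
                    stacklevel=2,  # point the warning at the caller
                )
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorate


@deprecate_positional(max_positional=1)
def read_xml_sketch(
    path_or_buffer: str,
    xpath: str = "./*",
    namespaces: dict[str, str] | None = None,
) -> str:
    # Hypothetical stand-in for read_xml; only the first argument stays positional.
    return f"{path_or_buffer} {xpath}"
```

Calling `read_xml_sketch("data.xml", xpath="//row")` is silent, while `read_xml_sketch("data.xml", "//row")` emits a `FutureWarning` naming `xpath`.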

phofl · 2021-12-30T21:08:11Z

pandas/io/xml.py

-    attrs_only: bool | None = False,
-    names: list[str] | None = None,
+    xpath: str = "./*",
+    namespaces: dict | None = None,


list[dict] does not work; this raises.

As mentioned below, namespaces are distinct prefix/URI pairs within an XML doc, so they should only be dict types. LMK if there is an XML use case that suggests otherwise. While namespace prefixes can be reused at different elements (per this SO answer), the XPath prefix-to-URI (i.e., key/value) mapping must be unique.
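The point about reused prefixes can be sketched with the standard library's `xml.etree.ElementTree`. In this made-up document the prefix `x` is bound to a different URI inside `<a>` and inside `<b>`. The mapping passed to the XPath query cannot hold both bindings under one key, so each URI gets its own arbitrary prefix; the URIs and the prefixes `one`/`two` are assumptions for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The prefix "x" is declared twice, bound to a different URI each time.
XML = """<root>
  <a xmlns:x="http://example.com/one"><x:item>1</x:item></a>
  <b xmlns:x="http://example.com/two"><x:item>2</x:item></b>
</root>"""

root = ET.fromstring(XML)

# The query-side mapping is prefix -> URI, one entry per key, i.e. dict[str, str].
namespaces: dict[str, str] = {
    "one": "http://example.com/one",
    "two": "http://example.com/two",
}

items_one = [el.text for el in root.findall(".//one:item", namespaces)]
items_two = [el.text for el in root.findall(".//two:item", namespaces)]
print(items_one, items_two)  # ['1'] ['2']
```

The prefixes in the query do not have to match the ones in the document; only the URIs matter, which is why a flat `dict` is sufficient and a list of dicts adds nothing.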

twoertwein · 2021-12-30T21:30:26Z

pandas/io/xml.py

-        stylesheet,
+        path_or_buffer: FilePath | ReadBuffer[bytes] | ReadBuffer[str],
+        xpath: str,
+        namespaces: dict | None,


I'm not familiar with how namespaces is used, is it dict[str, str] | None?

Not familiar enough either; I only found cases like this, but I'm not 100% sure.

namespaces is passed to findall which according to typeshed expects a dict[str, str] | None:

https://github.com/python/typeshed/blob/e434b23741a5e3f2ea899ccfb0ef2a15f168ebf1/stdlib/xml/etree/ElementTree.pyi#L72

Can you also please fix this existing typo in the docstring (two "s" at the end):

pandas/pandas/io/xml.py

Line 57 in 3accd12

namespacess : dict

Namespaces map string prefixes to URIs, so they can only be dict[str, str]. Neither the prefix nor the URI can be a non-string type.
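To make the agreed annotation concrete, here is a small sketch of a hypothetical helper, `select_text`, whose `namespaces: dict[str, str] | None` parameter is forwarded unchanged to `Element.findall`, matching the typeshed signature linked above. The documents and the `ns` prefix are made up. A default (unprefixed) namespace still needs a prefix in the path, mapped to its URI.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def select_text(
    xml: str, xpath: str, namespaces: dict[str, str] | None = None
) -> list[str | None]:
    """Hypothetical helper: run an ElementTree XPath query and return element text.

    ``namespaces`` is passed straight through to ``findall``.
    """
    root = ET.fromstring(xml)
    return [el.text for el in root.findall(xpath, namespaces)]


DEFAULT_NS_DOC = """<doc xmlns="http://example.com/ns">
  <row>a</row>
  <row>b</row>
</doc>"""

PLAIN_DOC = "<doc><row>a</row><row>b</row></doc>"

# Default namespace: map an arbitrary prefix to the URI and use it in the path.
print(select_text(DEFAULT_NS_DOC, "./ns:row", {"ns": "http://example.com/ns"}))  # ['a', 'b']
# Without the mapping, the namespaced tags are not matched.
print(select_text(DEFAULT_NS_DOC, "./row"))  # []
# No namespaces at all: None is fine.
print(select_text(PLAIN_DOC, "./row"))  # ['a', 'b']
```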

jreback · 2021-12-31T15:58:18Z

cc @ParfaitG if any comments

ParfaitG

Thanks, @phofl, for keeping IO XML to current typing standards! I have a few comments in my walk-through.

ParfaitG · 2021-12-31T16:09:31Z

pandas/_typing.py

@@ -246,6 +246,7 @@ def closed(self) -> bool:
 CompressionOptions = Optional[
    Union[Literal["infer", "gzip", "bz2", "zip", "xz", "zstd"], CompressionDict]
 ]
+XMLParsers = Literal["lxml", "etree"]


Nice addition for this type. Maybe inspired by the new IO XML module.
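As a rough sketch of how a `Literal` alias like `XMLParsers` is typically consumed: a type checker rejects any string outside the alias at the call site, and `typing.get_args` recovers the allowed values for a runtime check on untyped callers. `resolve_parser` and its error message are hypothetical, not pandas' actual validation.

```python
from __future__ import annotations

from typing import Literal, get_args

# Same shape as the alias added to pandas/_typing.py in this PR.
XMLParsers = Literal["lxml", "etree"]


def resolve_parser(parser: XMLParsers = "lxml") -> XMLParsers:
    """Hypothetical helper: mypy flags bad literals statically; this
    runtime check covers callers that are not type-checked."""
    allowed = get_args(XMLParsers)  # ('lxml', 'etree')
    if parser not in allowed:
        raise ValueError(f"parser must be one of {allowed!r}, got {parser!r}")
    return parser
```

A call such as `resolve_parser("html5lib")` would be reported by mypy and also raises `ValueError` at runtime.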

ParfaitG · 2021-12-31T16:12:56Z

pandas/io/xml.py

-        stylesheet,
+        path_or_buffer: FilePath | ReadBuffer[bytes] | ReadBuffer[str],
+        xpath: str,
+        namespaces: dict | list[dict] | None,


I cannot see how list of dicts is used for this argument. Each XML doc will have a unique set of namespaces (i.e., prefix to URI pairs). Does mypy raise? Consider dict[str, str] though.

I could not find them in my newest commit; did you maybe review an older one? I first moved the types over from the public API and checked them for validity afterwards. I realised that list[dict] did not work and removed it again.

ParfaitG · 2021-12-31T16:15:10Z

pandas/io/xml.py

@@ -765,7 +764,7 @@ def read_xml(
        expressions. For more complex XPath, use ``lxml`` which requires
        installation.

-    namespaces : dict, optional
+    namespaces : dict, list of dicts, optional


Again, namespaces will not be a list of dicts.

Yep, removed; forgot this one.

ParfaitG · 2021-12-31T16:28:03Z

pandas/tests/io/xml/test_xml.py

@@ -782,6 +782,19 @@ def test_read_xml_passing_as_positional_deprecated(datapath):
        )


+@td.skip_if_no("lxml")


Good test for None encoding. Can this new test be moved a few lines earlier, under the previous ENCODING section (lines ~700-730)? I wonder if we should also test etree with None, expecting success via tm.assert_frame_equal. etree tends to change its API more actively, so its defaults may change in future versions.

Added and moved
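The etree-with-`None` case discussed here, and the diff comment that encoding cannot be None for lxml with StringIO input, can be illustrated with the standard library alone. One plausible reason for the lxml restriction (an assumption, not verified against the pandas source) is that a text buffer has to be encoded to bytes before a bytes-oriented parser sees it, and `str.encode(None)` raises `TypeError`, whereas ElementTree can parse a `StringIO` directly. Both helper names are hypothetical.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET

DATA = """<data>
  <row>
    <a>c</a>
  </row>
</data>"""


def rows_from_text_buffer(buf: io.StringIO) -> list[dict[str, str | None]]:
    """Parse a str buffer with ElementTree; no encoding argument is involved."""
    root = ET.parse(buf).getroot()
    return [{child.tag: child.text for child in row} for row in root.findall("./row")]


def encode_for_bytes_parser(text: str, encoding: str | None) -> bytes:
    """Hypothetical step for a parser that only accepts bytes: the text must be
    encoded first, and str.encode(None) raises TypeError."""
    return text.encode(encoding)  # type: ignore[arg-type]


print(rows_from_text_buffer(io.StringIO(DATA)))  # [{'a': 'c'}]
```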

ParfaitG · 2021-12-31T23:03:40Z

LGTM! Thanks, @phofl!

phofl · 2022-01-01T00:06:17Z

cc @jreback

jreback · 2022-01-01T01:53:53Z

Can you rebase (as I merged the other one)? Or is this orthogonal?

phofl · 2022-01-01T11:26:36Z

This should be orthogonal, but it's better to rebase.

jreback · 2022-01-04T00:26:06Z

thanks @phofl

phofl added 3 commits December 30, 2021 21:05

TYP: Type read_xml and adjust type hints

baa3ab2

Fix typing issues

ab8f1c6

Deprecate positional

9ff4878

phofl commented Dec 30, 2021

Adjust comment

6998251

twoertwein reviewed Dec 30, 2021

twoertwein reviewed Dec 30, 2021

twoertwein added the Typing type annotations, mypy/pyright type checking label Dec 30, 2021

phofl added 2 commits December 30, 2021 22:42

Fix test

0b85c7d

Add test

3accd12

jreback added the IO XML read_xml, to_xml label Dec 31, 2021

ParfaitG reviewed Dec 31, 2021

Add test and adjust type hint

1da9569

twoertwein approved these changes Dec 31, 2021

jreback added this to the 1.4 milestone Jan 1, 2022

phofl added 2 commits January 1, 2022 12:22

Merge remote-tracking branch 'upstream/master' into typ_xml

a2348a2

Adjust dep test

ba355ba

jreback merged commit a2aa477 into pandas-dev:master Jan 4, 2022

phofl deleted the typ_xml branch January 4, 2022 08:51

phofl mentioned this pull request Jan 4, 2022

DEPR: log of deprecations in 1.x (to be removed in 2.0) #30228

Closed

phofl mentioned this pull request Sep 17, 2022

Remove deprecated arguments and functions pandas-dev/pandas-stubs#307

Merged
