Skip to content

TYP: type read_xml and deprecate passing positional arguments #45133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 4, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Dec 30, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

elems_only: bool = False,
attrs_only: bool = False,
names: Sequence[str] | None = None,
# encoding can not be None for lxml and StringIO input
encoding: str | None = "utf-8",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can type this correctly after enforcing the positional deprecation

pandas/io/xml.py Outdated
attrs_only: bool | None = False,
names: list[str] | None = None,
xpath: str = "./*",
namespaces: dict | None = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list[dict] does not work, this raises

Copy link
Contributor

@ParfaitG ParfaitG Dec 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned below, namespaces are distinct prefix/URI pairs within an XML doc, so should only be dict types. LMK of an XML use case if otherwise. While namespace prefixes can be reused at different elements per this SO answer, XPath prefix-URI (i.e., key/value) mapping must be unique.

pandas/io/xml.py Outdated
stylesheet,
path_or_buffer: FilePath | ReadBuffer[bytes] | ReadBuffer[str],
xpath: str,
namespaces: dict | None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with how namespaces is used, is it dict[str, str] | None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar enough either, found only cases like this, but not 100% sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespaces is passed to findall which according to typeshed expects a dict[str, str] | None:

https://github.com/python/typeshed/blob/e434b23741a5e3f2ea899ccfb0ef2a15f168ebf1/stdlib/xml/etree/ElementTree.pyi#L72

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please fix this existing typo in the doc-string (two "s" at the end):

namespacess : dict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespaces map string prefixes to URIs so can only be dict[str, str]. Neither prefix or URI can be non-string types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Dec 30, 2021
@jreback
Copy link
Contributor

jreback commented Dec 31, 2021

cc @ParfaitG if any comments

@jreback jreback added the IO XML read_xml, to_xml label Dec 31, 2021
Copy link
Contributor

@ParfaitG ParfaitG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @phofl, for keeping IO XML to current typing standards! I have a few comments in my walk-through.

@@ -246,6 +246,7 @@ def closed(self) -> bool:
CompressionOptions = Optional[
Union[Literal["infer", "gzip", "bz2", "zip", "xz", "zstd"], CompressionDict]
]
XMLParsers = Literal["lxml", "etree"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition for this type. Maybe inspired by new IO XML module.

pandas/io/xml.py Outdated
stylesheet,
path_or_buffer: FilePath | ReadBuffer[bytes] | ReadBuffer[str],
xpath: str,
namespaces: dict | list[dict] | None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot see how list of dicts is used for this argument. Each XML doc will have a unique set of namespaces (i.e., prefix to URI pairs). Does mypy raise? Consider dict[str, str] though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find them on my newest commit, did you maybe review an older one? I have moved the types from the public api first and checked them for validity afterwards. I realised that list[dict] did not work and removed them again then

pandas/io/xml.py Outdated
@@ -765,7 +764,7 @@ def read_xml(
expressions. For more complex XPath, use ``lxml`` which requires
installation.

namespaces : dict, optional
namespaces : dict, list of dicts, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again namespaces will not be a list of dicts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep removed, forgot this

@@ -782,6 +782,19 @@ def test_read_xml_passing_as_positional_deprecated(datapath):
)


@td.skip_if_no("lxml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test for None encoding. Can this new test be moved few lines earlier under previous ENCODING section (lines ~700-730). I wonder if we should also test for etree with None expecting success with tm.assert_frame_equal. etree tends to be more active in API so may change defaults with future versions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added and moved

@ParfaitG
Copy link
Contributor

LGTM! Thanks, @phofl!

@phofl
Copy link
Member Author

phofl commented Jan 1, 2022

cc @jreback

@jreback
Copy link
Contributor

jreback commented Jan 1, 2022

can you rebase (as merged other one). or is this orthogonal?

@jreback jreback added this to the 1.4 milestone Jan 1, 2022
@phofl
Copy link
Member Author

phofl commented Jan 1, 2022

This should be orthogonal, but better to rebase

@jreback jreback merged commit a2aa477 into pandas-dev:master Jan 4, 2022
@jreback
Copy link
Contributor

jreback commented Jan 4, 2022

thanks @phofl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO XML read_xml, to_xml Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants