Pandas IO XML Issue Tracker #40131

Closed
6 of 14 tasks
ParfaitG opened this issue Mar 1, 2021 · 1 comment
Labels
Enhancement · IO XML (read_xml, to_xml) · Master Tracker (High level tracker for similar issues)

ParfaitG commented Mar 1, 2021

Issue tracking for new pandas.io.xml module (after merge: #39516):

To-Do

  • BUG: Fix declaration not showing for xml_declaration=True when pretty_print=False for etree parser.
  • CLN: Centralize docstrings to avoid repetition in formats.py and frame.py.
  • CLN: When etree supports pretty_print, remove xml.dom.minidom reliance.
  • TYP: Refactor code for type hints on parse_doc methods for optional dependency, lxml.
  • TST: Add tests for edge cases (ParserError, OSError, URLError, etc.). See checklist in tests code.
  • TST: Add more tests for storage_options (i.e., read/write to pandas-test S3 bucket).
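
For the first and third To-Do items, here is a minimal sketch using only the standard library `xml.etree.ElementTree`, not the pandas internals. It assumes Python 3.9+, where `ET.indent` is available, and the sample elements are hypothetical. It shows that the declaration can be emitted without pretty printing, and that indentation is possible without `xml.dom.minidom`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the pandas test suite.
root = ET.Element("data")
row = ET.SubElement(root, "row")
ET.SubElement(row, "shape").text = "square"
ET.SubElement(row, "sides").text = "4"

# Compact output: the declaration is requested explicitly (Python 3.8+).
compact = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Indented output without xml.dom.minidom: ET.indent (Python 3.9+)
# adds whitespace to the tree in place before serializing.
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="utf-8", xml_declaration=True)

print(compact.decode("utf-8"))
print(pretty.decode("utf-8"))
```

Whether pandas can rely on `ET.indent` depends on its minimum supported Python version, so this is an illustration of the idea rather than a proposed patch.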

Enhancements

  • ENH: Add parse_dates and dtype converters similar to other IO methods.
  • ENH: Add support for nullable dtypes in reading and exporting XML.
  • ENH: Add iterparse for memory efficient parsing of large XML. See etree iterparse and lxml iterparse.
  • ENH: Add xpath_vars to pass $ variables in xpath expression. See lxml xpath() method.
  • ENH: Add xsl_params to pass values into XSLT script. See lxml stylesheet parameters.
  • ENH: Add prefix_cols to specify which columns should have namespace prefixes.
  • ENH: Add nested (bool) to write out nested node-sets for data frames with hierarchical columns or MultiIndex.
  • ENH: Add engine for external processors for XPath and XSLT 2.0 and 3.0, XQuery, streaming, and others.
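
To illustrate the iterparse item, here is a rough sketch of memory-efficient row extraction with the standard library `xml.etree.ElementTree.iterparse`. The helper name `iter_rows`, the `row_tag` argument, and the sample document are all hypothetical and not part of any pandas API:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; a real use case would pass a large file instead.
xml_bytes = b"""<data>
  <row><shape>square</shape><sides>4</sides></row>
  <row><shape>circle</shape><sides/></row>
  <row><shape>triangle</shape><sides>3</sides></row>
</data>"""


def iter_rows(source, row_tag):
    """Yield one dict per <row_tag> element as soon as it is fully parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            # Copy out the child values, then drop the element's
            # children and text so they do not accumulate in memory.
            yield {child.tag: child.text for child in elem}
            elem.clear()


rows = list(iter_rows(io.BytesIO(xml_bytes), "row"))
print(rows)
```

The resulting list of dicts could then be handed to `pandas.DataFrame`. An empty element such as `<sides/>` comes through as `None`, which is where the nullable dtype item would matter.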

ParfaitG added the Enhancement and Needs Triage labels Mar 1, 2021
ParfaitG changed the title from "Pandas XML IO Issue Tracker" to "Pandas IO XML Issue Tracker" Mar 1, 2021
lithomas1 added the IO XML and Master Tracker labels and removed the Needs Triage label Mar 1, 2021

mroeschke commented

It appears a lot of the issues have been addressed. Additionally, IMO it's easier to use individual issues + labels to identify what needs working on, so closing for now.
