ENH: Add dtypes/converters arguments for pandas.read_xml #45411

Merged
jreback merged 10 commits into pandas-dev:main from xml_dtypes on Jan 23, 2022

Conversation

ParfaitG
Contributor

@jreback added the "IO XML" (read_xml, to_xml) and "Dtype Conversions" (Unexpected or buggy dtype conversions) labels on Jan 17, 2022
pandas/io/xml.py Outdated
@@ -109,6 +130,13 @@ def __init__(
elems_only: bool,
attrs_only: bool,
names: Sequence[str] | None,
dtype: DtypeArg | None,
converters: dict[str, Callable] | None,
parse_dates: bool
Contributor

hmm I think we have a typing alias for this? e.g. is this what we are doing in csv parsers?

Contributor Author

I did look in pandas._typing. For read_csv, there is no typing for converters and parse_dates:

def read_csv(
    filepath_or_buffer: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str],
    ...
    # General Parsing Configuration
    dtype: DtypeArg | None = None,
    ...
    converters=None,
    ...
    # Datetime Handling
    parse_dates=None,
    ...
):

Contributor Author

Same with read_excel. Let me know how to handle typing for read_xml. Maybe raise a TYP issue for future PR?

def read_excel(
    io,
    ...
    dtype: DtypeArg | None = None,
    ...
    converters=None,
    ...
    parse_dates=False,
    ...
)

Contributor

can you add your alias in _typing and use it (can followup later to use it elsewhere)
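
To make the request above concrete, here is a minimal sketch of what shared aliases for `converters` and `parse_dates` could look like. The alias names and the `apply_converters` helper are hypothetical and chosen only for illustration; they are not taken from the diff in this thread, and the definitions that landed in `pandas._typing` may differ.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Hashable, List, Union

# Hypothetical alias names, assumed for illustration only.
# A converters mapping: column label -> function applied to each raw value.
ConvertersArg = Dict[Hashable, Callable[[Any], Any]]

# A parse_dates argument: a flag, a list of columns, a list of column
# groups to combine, or a mapping of new column name -> columns to combine.
ParseDatesArg = Union[
    bool,
    List[Hashable],
    List[List[Hashable]],
    Dict[Hashable, List[Hashable]],
]


def apply_converters(
    row: Dict[Hashable, Any], converters: ConvertersArg | None
) -> Dict[Hashable, Any]:
    """Toy helper (not pandas code): apply per-column converters to one row."""
    if not converters:
        return dict(row)
    return {
        key: converters[key](value) if key in converters else value
        for key, value in row.items()
    }


converted = apply_converters({"degrees": "00360", "sides": "4"}, {"sides": int})
```

Defining the aliases once would let `read_xml`, and later `read_csv` and `read_excel`, annotate these parameters consistently instead of repeating the full type at each call site.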

Contributor

@jreback left a comment

small comments & pls merge master, ping on green-ish

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns,
applying converter methods, and parsing dates.
Contributor

can you add the issue reference here (this PR number if no issue)
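
For readers following along, the behavior described in the release note above can be sketched as follows. This assumes a pandas version that includes this change (1.5 or later); the sample XML and column names are made up for illustration. The document is wrapped in `StringIO` and parsed with `parser="etree"` so the sketch does not depend on lxml.

```python
from io import StringIO

import pandas as pd

# Made-up sample data; the element names are illustrative only.
xml = """<data>
  <row>
    <shape>square</shape>
    <degrees>00360</degrees>
    <sides>4</sides>
    <date>2020-01-01</date>
  </row>
  <row>
    <shape>circle</shape>
    <degrees>00360</degrees>
    <sides/>
    <date>2021-01-01</date>
  </row>
  <row>
    <shape>triangle</shape>
    <degrees>00180</degrees>
    <sides>3</sides>
    <date>2022-01-01</date>
  </row>
</data>"""

df = pd.read_xml(
    StringIO(xml),
    parser="etree",
    dtype={"sides": "Int64"},      # nullable integer, keeps the missing value
    converters={"degrees": str},   # keep leading zeros as text
    parse_dates=["date"],          # parse this column as datetimes
)

print(df.dtypes)
```

Without these arguments, `degrees` would typically be inferred as an integer column and lose its leading zeros, and `date` would stay as plain strings.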

@@ -85,6 +85,48 @@ Optional libraries below the lowest tested version may still work, but are not c

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.


.. _whatsnew_140.read_xml_dtypes:
Contributor

change to 150

@jreback added this to the 1.5 milestone on Jan 23, 2022
@jreback merged commit d2d7ffb into pandas-dev:main on Jan 23, 2022
Contributor

jreback commented Jan 23, 2022

thanks @ParfaitG failures unrelated

@ParfaitG ParfaitG deleted the xml_dtypes branch January 23, 2022 23:17
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO XML read_xml, to_xml
Development

Successfully merging this pull request may close these issues.

ENH: Possible to add dtype/converters as arguments for pandas.read_xml() ?
2 participants