ENH: Add I/O support of XML with pandas.read_xml and DataFrame.to_xml… #39516
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,43 @@ For example: | |
storage_options=headers | ||
) | ||
|
||
.. _whatsnew_130.window_method_table: | ||
|
||
:class:`Rolling` and :class:`Expanding` now support a ``method`` argument with a | ||
``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. | ||
See :ref:`window.overview` for performance and functional benefits. (:issue:`15095`) | ||
|
||
|
||
.. _whatsnew_130.read_to_xml: | ||
|
||
We added I/O support to read and render shallow versions of XML documents with | ||
:func:`pandas.read_xml` and :meth:`DataFrame.to_xml`. Using lxml as parser, | ||
Comment: can you add a reference to lxml (same one as we have in install.rst)
Comment: Will do.
||
full XPath 1.0 and XSLT 1.0 are available. (:issue:`27554`) | ||
|
||
.. ipython:: python | ||
|
||
xml = """<?xml version='1.0' encoding='utf-8'?> | ||
<data> | ||
<row> | ||
<shape>square</shape> | ||
<degrees>360</degrees> | ||
<sides>4.0</sides> | ||
</row> | ||
<row> | ||
<shape>circle</shape> | ||
<degrees>360</degrees> | ||
<sides/> | ||
</row> | ||
<row> | ||
<shape>triangle</shape> | ||
<degrees>180</degrees> | ||
<sides>3.0</sides> | ||
</row> | ||
</data>""" | ||
|
||
df = pd.read_xml(xml) | ||
Comment: you need to show the rendered df, so end the ipython block here, and then add another one for the
Comment: Will do.
||
|
||
df.to_xml() | ||
|
||
.. _whatsnew_130.enhancements.other: | ||
|
||
Other enhancements | ||
|
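As a rough illustration of what "shallow" means in the what's-new entry above: each `row` element becomes one record and each child element becomes one column. The sketch below uses only the standard library's `xml.etree.ElementTree`. It is not the `read_xml` implementation from this PR, and `rows_from_shallow_xml` is a hypothetical helper name chosen for the example.

```python
import xml.etree.ElementTree as ET

XML = """<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
</data>"""


def rows_from_shallow_xml(text, row_tag="row"):
    """Hypothetical helper: one dict per row element, one key per child."""
    root = ET.fromstring(text)
    # An empty element such as <sides/> has text None, i.e. a missing value.
    return [
        {child.tag: child.text for child in row}
        for row in root.iter(row_tag)
    ]


records = rows_from_shallow_xml(XML)
```

All values come back as strings (or `None`); any type inference, such as turning `"360"` into an integer, would be a separate step.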
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -167,6 +167,7 @@ | |
read_feather, | ||
read_gbq, | ||
read_html, | ||
read_xml, | ||
read_json, | ||
read_stata, | ||
read_sas, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2604,6 +2604,178 @@ def to_html( | |
render_links=render_links, | ||
) | ||
|
||
def to_xml( | ||
self, | ||
io: Optional[FilePathOrBuffer[str]] = None, | ||
Comment: name this path_or_buffer
Comment: Done.
||
index: Optional[bool] = True, | ||
root_name: Optional[str] = "data", | ||
row_name: Optional[str] = "row", | ||
na_rep: Optional[str] = None, | ||
attr_cols: Optional[Union[str, List[str]]] = None, | ||
elem_cols: Optional[Union[str, List[str]]] = None, | ||
namespaces: Optional[Union[dict, List[dict]]] = None, | ||
prefix: Optional[str] = None, | ||
encoding: Optional[str] = "utf-8", | ||
xml_declaration: Optional[bool] = True, | ||
pretty_print: Optional[bool] = True, | ||
parser: Optional[str] = "lxml", | ||
stylesheet: Optional[FilePathOrBuffer[str]] = None, | ||
) -> Optional[str]: | ||
""" | ||
Render a DataFrame to an XML document. | ||
|
||
.. versionadded:: 1.3.0 | ||
|
||
Parameters | ||
---------- | ||
io : str, path object or file-like object, optional | ||
File to write output to. If None, the output is returned as a | ||
string. | ||
index : bool, optional | ||
Whether to include index in XML document. | ||
root_name : str, default 'data' | ||
The name of root element in XML document. | ||
row_name : str, default 'row' | ||
The name of row element in XML document. | ||
na_rep : str, optional | ||
Missing data representation. | ||
attr_cols : list-like, optional | ||
List of columns to write as attributes in row element. | ||
Hierarchical columns will be flattened with underscore | ||
delimiting the different levels. | ||
elem_cols : list-like, optional | ||
|
||
List of columns to write as children in row element. By default, | ||
all columns output as children of row element. Hierarchical | ||
columns will be flattened with underscore delimiting the | ||
different levels. | ||
namespaces : dict, optional | ||
All namespaces to be defined in root element. Keys of dict | ||
should be prefix names and values of dict corresponding URIs. | ||
Default namespaces should be given empty string key. For | ||
example, :: | ||
|
||
namespaces = {'': 'https://example.com'} | ||
|
||
prefix : str, optional | ||
Namespace prefix to be used for every element and/or attribute | ||
in document. This should be one of the keys in ``namespaces`` | ||
dict. | ||
encoding : str, optional, default 'utf-8' | ||
Encoding of the resulting document. | ||
xml_declaration : bool, optional | ||
Whether to include the XML declaration at start of document. | ||
pretty_print : bool, optional | ||
Whether output should be pretty printed with indentation and | ||
line breaks. | ||
parser : {'lxml','etree'}, default "lxml" | ||
Parser module to use for building of tree. Only 'lxml' and | ||
'etree' are supported. With 'lxml', the ability to use XSLT | ||
stylesheet is supported. Default parser uses 'lxml'. If the | ||
module is not installed, a warning is issued and processing | ||
continues with 'etree'. | ||
stylesheet : str, path object or file-like object, optional | ||
A URL, file-like object, or a raw string containing an XSLT | ||
script used to transform the raw XML output. Script should use | ||
layout of elements and attributes from original output. This | ||
argument requires ``lxml`` to be installed. Only XSLT 1.0 | ||
scripts, and not later versions, are currently supported. | ||
|
||
Returns | ||
------- | ||
None or str | ||
If ``io`` is None, returns the resulting XML format as a | ||
string. Otherwise returns None. | ||
|
||
See Also | ||
-------- | ||
to_json : Convert the pandas object to a JSON string. | ||
to_html : Render a DataFrame as an HTML table. | ||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame({'shape': ['square', 'circle', 'triangle'], | ||
... 'degrees': [360, 360, 180], | ||
... 'sides': [4, np.nan, 3]}) | ||
|
||
>>> df.to_xml() | ||
<?xml version='1.0' encoding='utf-8'?> | ||
<data> | ||
<row> | ||
<index>0</index> | ||
<shape>square</shape> | ||
<degrees>360</degrees> | ||
<sides>4.0</sides> | ||
</row> | ||
<row> | ||
<index>1</index> | ||
<shape>circle</shape> | ||
<degrees>360</degrees> | ||
<sides/> | ||
</row> | ||
<row> | ||
<index>2</index> | ||
<shape>triangle</shape> | ||
<degrees>180</degrees> | ||
<sides>3.0</sides> | ||
</row> | ||
</data> | ||
|
||
>>> df.to_xml(attr_cols=['index', 'shape', 'degrees', 'sides']) | ||
<?xml version='1.0' encoding='utf-8'?> | ||
<data> | ||
<row index="0" shape="square" degrees="360" sides="4.0"/> | ||
<row index="1" shape="circle" degrees="360"/> | ||
<row index="2" shape="triangle" degrees="180" sides="3.0"/> | ||
</data> | ||
|
||
>>> df.to_xml(namespaces={"doc": "https://example.com"}, | ||
... prefix="doc") | ||
<?xml version='1.0' encoding='utf-8'?> | ||
<doc:data xmlns:doc="https://example.com"> | ||
<doc:row> | ||
<doc:index>0</doc:index> | ||
<doc:shape>square</doc:shape> | ||
<doc:degrees>360</doc:degrees> | ||
<doc:sides>4.0</doc:sides> | ||
</doc:row> | ||
<doc:row> | ||
<doc:index>1</doc:index> | ||
<doc:shape>circle</doc:shape> | ||
<doc:degrees>360</doc:degrees> | ||
<doc:sides/> | ||
</doc:row> | ||
<doc:row> | ||
<doc:index>2</doc:index> | ||
<doc:shape>triangle</doc:shape> | ||
<doc:degrees>180</doc:degrees> | ||
<doc:sides>3.0</doc:sides> | ||
</doc:row> | ||
</doc:data> | ||
""" | ||
|
||
formatter = fmt.DataFrameFormatter( | ||
self, | ||
index=index, | ||
na_rep=na_rep, | ||
) | ||
|
||
return fmt.DataFrameRenderer(formatter).to_xml( | ||
io=io, | ||
index=index, | ||
root_name=root_name, | ||
row_name=row_name, | ||
na_rep=na_rep, | ||
attr_cols=attr_cols, | ||
elem_cols=elem_cols, | ||
namespaces=namespaces, | ||
prefix=prefix, | ||
encoding=encoding, | ||
xml_declaration=xml_declaration, | ||
pretty_print=pretty_print, | ||
parser=parser, | ||
stylesheet=stylesheet, | ||
) | ||
|
||
# ---------------------------------------------------------------------- | ||
@Substitution( | ||
klass="DataFrame", | ||
|
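The difference between writing columns as child elements (the default) and as attributes (`attr_cols`) in the docstring examples above can be pictured with a small standard-library sketch. This is an illustration under assumptions, not the formatter added in this PR: `render_rows` and its arguments are hypothetical, and missing values are represented by `None` rather than NaN.

```python
import xml.etree.ElementTree as ET


def render_rows(rows, root_name="data", row_name="row", attr_cols=()):
    """Hypothetical helper: write each dict as one row element.

    Keys listed in attr_cols become attributes of the row element;
    all other keys become child elements. None means "missing".
    """
    root = ET.Element(root_name)
    for record in rows:
        row = ET.SubElement(root, row_name)
        for key, value in record.items():
            if key in attr_cols:
                # Missing values are skipped, as in the docstring's
                # attribute example where the circle row has no sides.
                if value is not None:
                    row.set(key, str(value))
            else:
                child = ET.SubElement(row, key)
                # A missing value leaves the child element empty.
                child.text = None if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


rows = [
    {"shape": "square", "degrees": 360, "sides": 4.0},
    {"shape": "circle", "degrees": 360, "sides": None},
]
as_elements = render_rows(rows)
as_attributes = render_rows(rows, attr_cols=("shape", "degrees", "sides"))
```

The sketch leaves out the index, namespaces, prefixes, pretty printing, and the XML declaration, all of which the real method exposes as parameters.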
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,6 +30,7 @@ | |
cast, | ||
) | ||
from unicodedata import east_asian_width | ||
from warnings import warn | ||
|
||
import numpy as np | ||
|
||
|
@@ -914,6 +915,7 @@ class DataFrameRenderer: | |
|
||
Called in pandas.core.frame.DataFrame: | ||
- to_html | ||
- to_xml | ||
- to_string | ||
|
||
Parameters | ||
|
@@ -1003,6 +1005,121 @@ def to_html( | |
string = html_formatter.to_string() | ||
return save_to_buffer(string, buf=buf, encoding=encoding) | ||
|
||
def to_xml( | ||
self, | ||
io: Optional[FilePathOrBuffer[str]] = None, | ||
Comment: same
||
index: Optional[bool] = True, | ||
root_name: Optional[str] = "data", | ||
row_name: Optional[str] = "row", | ||
na_rep: Optional[str] = None, | ||
attr_cols: Optional[Union[str, List[str]]] = None, | ||
elem_cols: Optional[Union[str, List[str]]] = None, | ||
namespaces: Optional[Union[dict, List[dict]]] = None, | ||
prefix: Optional[str] = None, | ||
encoding: Optional[str] = "utf-8", | ||
xml_declaration: Optional[bool] = True, | ||
pretty_print: Optional[bool] = True, | ||
parser: Optional[str] = "lxml", | ||
stylesheet: Optional[FilePathOrBuffer[str]] = None, | ||
) -> Optional[str]: | ||
""" | ||
Render a DataFrame to an XML document. | ||
|
||
.. versionadded:: 1.3.0 | ||
|
||
Parameters | ||
---------- | ||
io : str, path object or file-like object, optional | ||
File to write output to. If None, the output is returned as a | ||
string. | ||
index : bool, optional | ||
Whether to include index in XML document. | ||
root_name : str, default 'data' | ||
The name of root element in XML document. | ||
row_name : str, default 'row' | ||
The name of row element in XML document. | ||
na_rep : str, optional | ||
Missing data representation. | ||
attr_cols : list-like, optional | ||
List of columns to write as attributes in row element. | ||
Hierarchical columns will be flattened with underscore | ||
delimiting the different levels. | ||
elem_cols : list-like, optional | ||
List of columns to write as children in row element. By default, | ||
all columns output as children of row element. Hierarchical | ||
columns will be flattened with underscore delimiting the | ||
different levels. | ||
namespaces : dict, optional | ||
All namespaces to be defined in root element. Keys of dict | ||
should be prefix names and values of dict corresponding URIs. | ||
Default namespaces should be given empty string key. For | ||
example, :: | ||
|
||
namespaces = {'': 'https://example.com'} | ||
|
||
prefix : str, optional | ||
Namespace prefix to be used for every element and/or attribute | ||
in document. This should be one of the keys in ``namespaces`` | ||
dict. | ||
encoding : str, optional, default 'utf-8' | ||
Encoding of the resulting document. | ||
xml_declaration : bool, optional | ||
Whether to include the XML declaration at start of document. | ||
pretty_print : bool, optional | ||
Whether output should be pretty printed with indentation and | ||
line breaks. | ||
parser : {'lxml','etree'}, default "lxml" | ||
Parser module to use for building of tree. Only 'lxml' and | ||
'etree' are supported. With 'lxml', the ability to use XSLT | ||
stylesheet is supported. Default parser uses 'lxml'. If the | ||
module is not installed, a warning is issued and processing | ||
continues with 'etree'. | ||
stylesheet : str, path object or file-like object, optional | ||
A URL, file-like object, or a raw string containing an XSLT | ||
script used to transform the raw XML output. Script should use | ||
layout of elements and attributes from original output. This | ||
argument requires ``lxml`` to be installed. Only XSLT 1.0 | ||
scripts, and not later versions, are currently supported. | ||
""" | ||
|
||
from pandas.io.formats.xml import EtreeXMLFormatter, LxmlXMLFormatter | ||
|
||
if parser == "lxml": | ||
try: | ||
TreeBuilder = LxmlXMLFormatter | ||
|
||
except ImportError: | ||
warn( | ||
"You do not have lxml installed (default parser). " | ||
"Instead, etree will be used.", | ||
ImportWarning, | ||
) | ||
TreeBuilder = EtreeXMLFormatter | ||
|
||
elif parser == "etree": | ||
TreeBuilder = EtreeXMLFormatter | ||
|
||
else: | ||
raise ValueError("Values for parser can only be lxml or etree.") | ||
|
||
xml_formatter = TreeBuilder( | ||
self.fmt, | ||
io=io, | ||
index=index, | ||
root_name=root_name, | ||
row_name=row_name, | ||
na_rep=na_rep, | ||
attr_cols=attr_cols, | ||
elem_cols=elem_cols, | ||
namespaces=namespaces, | ||
prefix=prefix, | ||
encoding=encoding, | ||
xml_declaration=xml_declaration, | ||
pretty_print=pretty_print, | ||
stylesheet=stylesheet, | ||
) | ||
|
||
return xml_formatter.write_output() | ||
|
||
def to_string( | ||
self, | ||
buf: Optional[FilePathOrBuffer[str]] = None, | ||
|
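The parser selection in `DataFrameRenderer.to_xml` above follows a common optional-dependency pattern: prefer `lxml`, warn and fall back to the standard library when it is missing, and reject any other value. A minimal sketch of that pattern, assuming availability is checked with `importlib.util.find_spec`; `choose_parser` is a hypothetical name and this is not the code in the PR:

```python
import importlib.util
import warnings


def choose_parser(parser="lxml"):
    """Hypothetical helper: return the name of the parser to use."""
    if parser == "lxml":
        # find_spec returns None when the top-level module is not installed.
        if importlib.util.find_spec("lxml") is None:
            warnings.warn(
                "You do not have lxml installed (default parser). "
                "Instead, etree will be used.",
                ImportWarning,
            )
            return "etree"
        return "lxml"
    if parser == "etree":
        return "etree"
    raise ValueError("Values for parser can only be lxml or etree.")
```

Checking availability explicitly keeps the fallback branch reachable: a plain name assignment inside `try` does not raise `ImportError` by itself, so the import (or an availability check) has to happen at that point.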
Comment: looks like you picked up another change here

Comment: Should I merge latest? And should I add XML section to io.rst or handle in different PR?

Comment: you should always merge latest every time you are pushing

Comment: add docs for io.rst in this PR

Comment: there is a top-level table in io.rst that needs updating as well (for the I/O read/write methods near the top)

Comment: Added XML section and updated top-level table.

Comment: great, this still looks like an artifact from a merge