
Commit 965c096

ivirshup authored and noatamir committed
Remove out-of-date external compatibility section from hdf_store docs (pandas-dev#49088)
* Remove out-of-date external compatibility section from hdf_store docs
* Remove references
1 parent ede9ba3 commit 965c096


3 files changed, +1 -98 lines changed


doc/source/getting_started/comparison/comparison_with_r.rst

-4
@@ -21,10 +21,6 @@ libraries, we care about the following things:
 This page is also here to offer a bit of a translation guide for users of these
 R packages.
 
-For transfer of ``DataFrame`` objects from pandas to R, one option is to
-use HDF5 files, see :ref:`io.external_compatibility` for an
-example.
-
 
 Quick reference
 ---------------

doc/source/user_guide/io.rst

-93
@@ -5264,99 +5264,6 @@ You could inadvertently turn an actual ``nan`` value into a missing value.
 store.append("dfss2", dfss, nan_rep="_nan_")
 store.select("dfss2")
 
-.. _io.external_compatibility:
-
-External compatibility
-''''''''''''''''''''''
-
-``HDFStore`` writes ``table`` format objects in specific formats suitable for
-producing loss-less round trips to pandas objects. For external
-compatibility, ``HDFStore`` can read native ``PyTables`` format
-tables.
-
-It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the
-``rhdf5`` library (`Package website`_). Create a table format store like this:
-
-.. _package website: https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html
-
-.. ipython:: python
-
-df_for_r = pd.DataFrame(
-{
-"first": np.random.rand(100),
-"second": np.random.rand(100),
-"class": np.random.randint(0, 2, (100,)),
-},
-index=range(100),
-)
-df_for_r.head()
-
-store_export = pd.HDFStore("export.h5")
-store_export.append("df_for_r", df_for_r, data_columns=df_dc.columns)
-store_export
-
-.. ipython:: python
-:suppress:
-
-store_export.close()
-os.remove("export.h5")
-
-In R this file can be read into a ``data.frame`` object using the ``rhdf5``
-library. The following example function reads the corresponding column names
-and data values from the values and assembles them into a ``data.frame``:
-
-.. code-block:: R
-
-# Load values and column names for all datasets from corresponding nodes and
-# insert them into one data.frame object.
-
-library(rhdf5)
-
-loadhdf5data <- function(h5File) {
-
-listing <- h5ls(h5File)
-# Find all data nodes, values are stored in *_values and corresponding column
-# titles in *_items
-data_nodes <- grep("_values", listing$name)
-name_nodes <- grep("_items", listing$name)
-data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
-name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
-columns = list()
-for (idx in seq(data_paths)) {
-# NOTE: matrices returned by h5read have to be transposed to obtain
-# required Fortran order!
-data <- data.frame(t(h5read(h5File, data_paths[idx])))
-names <- t(h5read(h5File, name_paths[idx]))
-entry <- data.frame(data)
-colnames(entry) <- names
-columns <- append(columns, entry)
-}
-
-data <- data.frame(columns)
-
-return(data)
-}
-
-Now you can import the ``DataFrame`` into R:
-
-.. code-block:: R
-
-> data = loadhdf5data("transfer.hdf5")
-> head(data)
-first second class
-1 0.4170220047 0.3266449 0
-2 0.7203244934 0.5270581 0
-3 0.0001143748 0.8859421 1
-4 0.3023325726 0.3572698 1
-5 0.1467558908 0.9085352 1
-6 0.0923385948 0.6233601 1
-
-.. note::
-The R function lists the entire HDF5 file's contents and assembles the
-``data.frame`` object from all matching nodes, so use this only as a
-starting point if you have stored multiple ``DataFrame`` objects to a
-single HDF5 file.
-
 
 Performance
 '''''''''''

doc/source/whatsnew/v0.16.0.rst

+1 -1
@@ -206,7 +206,7 @@ Other enhancements
 - Added ``decimal`` option in ``to_csv`` to provide formatting for non-'.' decimal separators (:issue:`781`)
 - Added ``normalize`` option for ``Timestamp`` to normalized to midnight (:issue:`8794`)
 - Added example for ``DataFrame`` import to R using HDF5 file and ``rhdf5``
-library. See the :ref:`documentation <io.external_compatibility>` for more
+library. See the documentation for more
 (:issue:`9636`).
 
 .. _whatsnew_0160.api:
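Each hunk header in this commit uses the unified diff form `@@ -old_start,old_count +new_start,new_count @@`. As a rough consistency check, which is not part of the commit itself, the sketch below parses the three hunk headers and compares each file's net line change with the per-file stats listed above. The helper `parse_hunk_header` is hypothetical and written only for this illustration. It needs only the Python standard library.

```python
import re

# Unified diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@ ..."
# When a count is omitted, the unified format treats it as 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header line."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


# The three hunk headers from this commit, keyed by file path.
HEADERS = {
    "doc/source/getting_started/comparison/comparison_with_r.rst":
        "@@ -21,10 +21,6 @@ libraries, we care about the following things:",
    "doc/source/user_guide/io.rst":
        "@@ -5264,99 +5264,6 @@ You could inadvertently turn an actual ``nan`` value into a missing value.",
    "doc/source/whatsnew/v0.16.0.rst":
        "@@ -206,7 +206,7 @@ Other enhancements",
}

net_changes = {}
for path, header in HEADERS.items():
    _, old_count, _, new_count = parse_hunk_header(header)
    # Net change in the file's line count: lines added minus lines removed.
    net_changes[path] = new_count - old_count

for path, delta in net_changes.items():
    print(f"{path}: {delta:+d}")
print(f"total: {sum(net_changes.values()):+d}")
```

With these headers the sketch prints -4, -93 and +0, for a total of -97, which agrees with +1 -98 for the commit as a whole. Because a header records only line counts, the one-line replacement in v0.16.0.rst (+1 -1) shows up as a net change of 0; separating additions from removals requires counting the `+` and `-` lines in the hunk body.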
