From 69fb992754606108b45de360936eee5cd91db890 Mon Sep 17 00:00:00 2001
From: Isaac Virshup
Date: Fri, 14 Oct 2022 13:31:43 +0200
Subject: [PATCH 1/2] Remove out-of-date external compatibility section from hdf_store docs

---
 doc/source/user_guide/io.rst | 93 ------------------------------------
 1 file changed, 93 deletions(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 1552f2a8d257b..63e6b007f77a8 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -5264,99 +5264,6 @@ You could inadvertently turn an actual ``nan`` value into a missing value.
    store.append("dfss2", dfss, nan_rep="_nan_")
    store.select("dfss2")
 
-.. _io.external_compatibility:
-
-External compatibility
-''''''''''''''''''''''
-
-``HDFStore`` writes ``table`` format objects in specific formats suitable for
-producing loss-less round trips to pandas objects. For external
-compatibility, ``HDFStore`` can read native ``PyTables`` format
-tables.
-
-It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the
-``rhdf5`` library (`Package website`_). Create a table format store like this:
-
-.. _package website: https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html
-
-.. ipython:: python
-
-   df_for_r = pd.DataFrame(
-       {
-           "first": np.random.rand(100),
-           "second": np.random.rand(100),
-           "class": np.random.randint(0, 2, (100,)),
-       },
-       index=range(100),
-   )
-   df_for_r.head()
-
-   store_export = pd.HDFStore("export.h5")
-   store_export.append("df_for_r", df_for_r, data_columns=df_dc.columns)
-   store_export
-
-.. ipython:: python
-   :suppress:
-
-   store_export.close()
-   os.remove("export.h5")
-
-In R this file can be read into a ``data.frame`` object using the ``rhdf5``
-library. The following example function reads the corresponding column names
-and data values from the values and assembles them into a ``data.frame``:
-
-.. code-block:: R
-
-   # Load values and column names for all datasets from corresponding nodes and
-   # insert them into one data.frame object.
-
-   library(rhdf5)
-
-   loadhdf5data <- function(h5File) {
-
-   listing <- h5ls(h5File)
-   # Find all data nodes, values are stored in *_values and corresponding column
-   # titles in *_items
-   data_nodes <- grep("_values", listing$name)
-   name_nodes <- grep("_items", listing$name)
-   data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
-   name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
-   columns = list()
-   for (idx in seq(data_paths)) {
-     # NOTE: matrices returned by h5read have to be transposed to obtain
-     # required Fortran order!
-     data <- data.frame(t(h5read(h5File, data_paths[idx])))
-     names <- t(h5read(h5File, name_paths[idx]))
-     entry <- data.frame(data)
-     colnames(entry) <- names
-     columns <- append(columns, entry)
-   }
-
-   data <- data.frame(columns)
-
-   return(data)
-   }
-
-Now you can import the ``DataFrame`` into R:
-
-.. code-block:: R
-
-   > data = loadhdf5data("transfer.hdf5")
-   > head(data)
-            first    second class
-   1 0.4170220047 0.3266449     0
-   2 0.7203244934 0.5270581     0
-   3 0.0001143748 0.8859421     1
-   4 0.3023325726 0.3572698     1
-   5 0.1467558908 0.9085352     1
-   6 0.0923385948 0.6233601     1
-
-.. note::
-   The R function lists the entire HDF5 file's contents and assembles the
-   ``data.frame`` object from all matching nodes, so use this only as a
-   starting point if you have stored multiple ``DataFrame`` objects to a
-   single HDF5 file.
-
 Performance
 '''''''''''
 

From 2f166dd067bf0635e74249a6774521473de48612 Mon Sep 17 00:00:00 2001
From: Isaac Virshup
Date: Fri, 14 Oct 2022 15:27:25 +0200
Subject: [PATCH 2/2] Remove references

---
 doc/source/getting_started/comparison/comparison_with_r.rst | 4 ----
 doc/source/whatsnew/v0.16.0.rst                             | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/doc/source/getting_started/comparison/comparison_with_r.rst b/doc/source/getting_started/comparison/comparison_with_r.rst
index f91f4218c3429..767779b0f58a8 100644
--- a/doc/source/getting_started/comparison/comparison_with_r.rst
+++ b/doc/source/getting_started/comparison/comparison_with_r.rst
@@ -21,10 +21,6 @@ libraries, we care about the following things:
 This page is also here to offer a bit of a translation guide for users of these
 R packages.
 
-For transfer of ``DataFrame`` objects from pandas to R, one option is to
-use HDF5 files, see :ref:`io.external_compatibility` for an
-example.
-
 Quick reference
 ---------------
 
diff --git a/doc/source/whatsnew/v0.16.0.rst b/doc/source/whatsnew/v0.16.0.rst
index 8d0d6854cbf85..d53ea095bb96c 100644
--- a/doc/source/whatsnew/v0.16.0.rst
+++ b/doc/source/whatsnew/v0.16.0.rst
@@ -206,7 +206,7 @@ Other enhancements
 - Added ``decimal`` option in ``to_csv`` to provide formatting for non-'.' decimal separators (:issue:`781`)
 - Added ``normalize`` option for ``Timestamp`` to normalized to midnight (:issue:`8794`)
 - Added example for ``DataFrame`` import to R using HDF5 file and ``rhdf5``
-  library. See the :ref:`documentation <io.external_compatibility>` for more
+  library. See the documentation for more
   (:issue:`9636`).
 
 .. _whatsnew_0160.api: