Remove out-of-date external compatibility section from hdf_store docs #49088

Merged
merged 2 commits into from
Oct 14, 2022
4 changes: 0 additions & 4 deletions doc/source/getting_started/comparison/comparison_with_r.rst
@@ -21,10 +21,6 @@ libraries, we care about the following things:
This page is also here to offer a bit of a translation guide for users of these
R packages.

For transfer of ``DataFrame`` objects from pandas to R, one option is to
use HDF5 files, see :ref:`io.external_compatibility` for an
example.


Quick reference
---------------
93 changes: 0 additions & 93 deletions doc/source/user_guide/io.rst
@@ -5264,99 +5264,6 @@ You could inadvertently turn an actual ``nan`` value into a missing value.
store.append("dfss2", dfss, nan_rep="_nan_")
store.select("dfss2")

.. _io.external_compatibility:

External compatibility
''''''''''''''''''''''

``HDFStore`` writes ``table`` format objects in specific formats suitable for
producing loss-less round trips to pandas objects. For external
compatibility, ``HDFStore`` can read native ``PyTables`` format
tables.

It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the
``rhdf5`` library (`Package website`_). Create a table format store like this:

.. _package website: https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html

.. ipython:: python

   df_for_r = pd.DataFrame(
       {
           "first": np.random.rand(100),
           "second": np.random.rand(100),
           "class": np.random.randint(0, 2, (100,)),
       },
       index=range(100),
   )
   df_for_r.head()

   store_export = pd.HDFStore("export.h5")
   store_export.append("df_for_r", df_for_r, data_columns=df_dc.columns)
   store_export

.. ipython:: python
   :suppress:

   store_export.close()
   os.remove("export.h5")

In R this file can be read into a ``data.frame`` object using the ``rhdf5``
library. The following example function reads the corresponding column names
and data values from the values and assembles them into a ``data.frame``:

.. code-block:: R

   # Load values and column names for all datasets from corresponding nodes and
   # insert them into one data.frame object.

   library(rhdf5)

   loadhdf5data <- function(h5File) {

     listing <- h5ls(h5File)
     # Find all data nodes, values are stored in *_values and corresponding column
     # titles in *_items
     data_nodes <- grep("_values", listing$name)
     name_nodes <- grep("_items", listing$name)
     data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
     name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
     columns = list()
     for (idx in seq(data_paths)) {
       # NOTE: matrices returned by h5read have to be transposed to obtain
       # required Fortran order!
       data <- data.frame(t(h5read(h5File, data_paths[idx])))
       names <- t(h5read(h5File, name_paths[idx]))
       entry <- data.frame(data)
       colnames(entry) <- names
       columns <- append(columns, entry)
     }

     data <- data.frame(columns)

     return(data)
   }

Now you can import the ``DataFrame`` into R:

.. code-block:: R

   > data = loadhdf5data("transfer.hdf5")
   > head(data)
            first    second class
   1 0.4170220047 0.3266449     0
   2 0.7203244934 0.5270581     0
   3 0.0001143748 0.8859421     1
   4 0.3023325726 0.3572698     1
   5 0.1467558908 0.9085352     1
   6 0.0923385948 0.6233601     1

.. note::
   The R function lists the entire HDF5 file's contents and assembles the
   ``data.frame`` object from all matching nodes, so use this only as a
   starting point if you have stored multiple ``DataFrame`` objects to a
   single HDF5 file.
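
The caveat in the note follows from how the R function selects nodes: it matches on node names only and pairs the results by position. The sketch below mirrors that matching step in plain Python so the behaviour is easy to see. It is a minimal illustration, not part of pandas or ``rhdf5``: ``listing``, ``match_nodes``, and the node names are hypothetical stand-ins for the output of ``h5ls``, and no HDF5 file is read.

```python
# Hypothetical stand-in for the rows returned by rhdf5's h5ls():
# (group, name) pairs. The group and node names are illustrative
# assumptions, not a guaranteed HDFStore layout.
listing = [
    ("/df_a", "block0_items"),
    ("/df_a", "block0_values"),
    ("/df_b", "block0_items"),
    ("/df_b", "block0_values"),
    ("/df_b", "axis0"),
]


def match_nodes(listing):
    """Pair each *_values node with the *_items node at the same position."""
    # Like grep() in the R function, this is a substring match on the
    # node name only; the group the node belongs to is never checked.
    data_paths = [f"{group}/{name}" for group, name in listing if "_values" in name]
    name_paths = [f"{group}/{name}" for group, name in listing if "_items" in name]
    # Pairing is positional, as in the R loop that indexes both vectors by idx.
    return list(zip(data_paths, name_paths))


pairs = match_nodes(listing)
for data_path, name_path in pairs:
    print(data_path, "<-", name_path)
```

With two stored objects in the listing, both are matched, so a function built this way would merge their columns into one ``data.frame``. Filtering ``listing`` by group before matching is one way to restrict the result to a single object.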


Performance
'''''''''''
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.16.0.rst
@@ -206,7 +206,7 @@ Other enhancements
- Added ``decimal`` option in ``to_csv`` to provide formatting for non-'.' decimal separators (:issue:`781`)
- Added ``normalize`` option for ``Timestamp`` to normalized to midnight (:issue:`8794`)
- Added example for ``DataFrame`` import to R using HDF5 file and ``rhdf5``
library. See the :ref:`documentation <io.external_compatibility>` for more
library. See the documentation for more
(:issue:`9636`).

.. _whatsnew_0160.api: