DOC: update the DataFrame.to_hdf() docstring #20186


Merged
merged 5 commits into from Mar 22, 2018
Changes from 2 commits
95 changes: 67 additions & 28 deletions pandas/core/generic.py
@@ -1786,40 +1786,47 @@ def to_json(self, path_or_buf=None, orient=None, date_format=None,
index=index)

def to_hdf(self, path_or_buf, key, **kwargs):
"""Write the contained data to an HDF5 file using HDFStore.
"""
Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.

In order to add another :class:`~pandas.DataFrame` or
:class:`~pandas.Series` to an existing HDF file please use append mode
and a different key.

Member

Can you add here a link to the user guide (because there is a lot more information there). You can use something like For more information see the :ref:`user guide <io.hdf5>`


Thanks for the hint! This one is hard for me, because I'm not an expert in rst :( I put the line as you suggested, with the right subsection of that manual. When I generate html with 'make.py html' it doesn't have a link. Is it ok?

Member

Yes, that is fine (when only building the docstring, the full user guide is not built, and therefore the link does not seem to work)

Parameters
----------
path_or_buf : the path (string) or HDFStore object
key : string
identifier for the group in the store
mode : optional, {'a', 'w', 'r+'}, default 'a'

``'w'``
Write; a new file is created (an existing file with the same
name would be deleted).
``'a'``
Append; an existing file is opened for reading and writing,
and if the file does not exist it is created.
``'r+'``
It is similar to ``'a'``, but the file must already exist.
format : 'fixed(f)|table(t)', default is 'fixed'
fixed(f) : Fixed format
Fast writing/reading. Not-appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform
worse but allow more flexible operations like searching
/ selecting subsets of the data
path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
key : str
Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default is 'a'
Mode to open file:
- ``'w'``: write, a new file is created (an existing file with
Member

It's good to make this a list, but for sphinx, no indentation is needed (compared to "Mode .." on the line above), but it needs a blank line between both lines (rst syntax details ...)

done
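
The rst convention discussed in this exchange (a blank line before the bullet list, bullets kept at the same indentation as the description text) can be checked mechanically. The `MODE_DESCRIPTION` string below is our own illustration of the convention, not text from the PR:

```python
# A minimal sketch of the numpydoc/rst convention discussed above: a
# bullet list inside a parameter description needs a blank line before
# it, and the bullets keep the same indentation as the text above them.
MODE_DESCRIPTION = (
    "mode : {'a', 'w', 'r+'}, default 'a'\n"
    "    Mode to open file:\n"
    "\n"
    "    - 'w': write, a new file is created (an existing file with\n"
    "      the same name would be deleted).\n"
    "    - 'a': append, an existing file is opened for reading and\n"
    "      writing, and if the file does not exist it is created.\n"
    "    - 'r+': similar to 'a', but the file must already exist.\n"
)

lines = MODE_DESCRIPTION.splitlines()
first_bullet = next(i for i, ln in enumerate(lines) if ln.lstrip().startswith("- "))
# rst requires the blank separator line before the list ...
assert lines[first_bullet - 1] == ""
# ... and the bullets align with "Mode to open file:" (4 spaces here).
assert lines[first_bullet].startswith("    - ")
```

Without the blank line, sphinx renders the bullets as a continuation of the preceding paragraph instead of a list.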

the same name would be deleted).
- ``'a'``: append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
- ``'r+'``: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
Member

"default is 'fixed' " -> "default 'fixed' "

done

Possible values:
- fixed: Fixed format. Fast writing/reading. Not-appendable,
Member

same here about indentation / blank line

Member

also, can you add single quotes around fixed (like 'fixed'), to make it clear it is a string

(and same for table below)

done

nor searchable.
- table: Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
Member

another indentation issue. Here, the "which ..." needs to align with "table: .." on the line above

done

like searching / selecting subsets of the data.
append : boolean, default False
For Table formats, append the input data to the existing
data_columns : list of columns, or True, default None
For Table formats, append the input data to the existing.
data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See `here
<http://pandas.pydata.org/pandas-docs/stable/io.html#query-via-data-columns>`__.

Applicable only to format='table'.
complevel : int, 0-9, default None
complevel : {0-9}, optional
Specifies a compression level for data.
A value of 0 disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
@@ -1831,11 +1838,43 @@ def to_hdf(self, path_or_buf, key, **kwargs):
Specifying a compression library which is not available issues
a ValueError.
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
dropna : boolean, default False.
If applying compression use the fletcher32 checksum.
dropna : bool, default False
If true, ALL nan rows will not be written to store.
"""

See Also
--------
DataFrame.read_hdf : read from HDF file.
DataFrame.to_parquet : write a DataFrame to the binary parquet format.
DataFrame.to_sql : write to a sql table.
DataFrame.to_feather : write out feather-format for DataFrames.
Contributor

add DataFrame.to_parquet, read_hdf

DataFrame.to_csv : write out to a csv file.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
... index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')

We can append another object to the same file:
Member

I would use "add" here instead of "append", because "append" is also a keyword with a different behaviour (appending rows to the same table, not the same file)

done
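
The add-versus-append distinction from this exchange can be sketched as follows. The helper name, the temporary file, and the guarded import are our own scaffolding (`to_hdf` needs the optional PyTables package, so the sketch returns `None` when it is missing):

```python
import os
import tempfile

import pandas as pd


def add_vs_append():
    """Contrast *adding* an object under a new key with *appending* rows.

    Returns (number of keys in the file, number of rows under 'df'),
    or None when the optional PyTables dependency is not installed.
    """
    try:
        import tables  # noqa: F401  # to_hdf requires PyTables
    except ImportError:
        return None
    df = pd.DataFrame({'A': [1, 2]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'data.h5')
        df.to_hdf(path, key='df', mode='w', format='table')
        # "add": a second object under a *different key* in the same file
        pd.Series([1, 2, 3]).to_hdf(path, key='s')
        # "append": more rows into the *existing* 'df' table
        df.to_hdf(path, key='df', format='table', append=True)
        with pd.HDFStore(path) as store:
            n_keys = len(store.keys())
        n_rows = len(pd.read_hdf(path, 'df'))
        return n_keys, n_rows
```

With PyTables available this should return `(2, 4)`: two keys (`'/df'` and `'/s'`) in one file, and four rows under `'df'` because `append=True` grew the existing table rather than adding a new object.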


>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')

Reading from HDF file:

>>> pd.read_hdf('data.h5', 'df')
A B
a 1 4
b 2 5
c 3 6
>>> pd.read_hdf('data.h5', 's')
0 1
1 2
2 3
3 4
dtype: int64
"""
Member

Can you add in the end here a code block with

>>> import os
>>> os.remove('data.h5')

(so running the doctests does not leave behind files)

done

Many thanks for your comments, they are really useful.

from pandas.io import pytables
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
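
As a companion to the `data_columns` description in the docstring above, here is a minimal sketch of the on-disk querying that `format='table'` enables. The helper name and the guarded import are ours, and the sketch returns `None` when the optional PyTables dependency is missing:

```python
import os
import tempfile

import pandas as pd


def query_table_on_disk():
    """Write with format='table' and data_columns, then filter on disk."""
    try:
        import tables  # noqa: F401  # to_hdf requires PyTables
    except ImportError:
        return None
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'data.h5')
        # 'A' becomes an indexed data column, so it can appear in `where`
        df.to_hdf(path, key='df', format='table', data_columns=['A'])
        # The filter is evaluated by PyTables before rows reach pandas
        return pd.read_hdf(path, 'df', where='A > 1')
```

With PyTables installed, the returned frame should hold only the rows where `A` is 2 and 3; the fixed format, by contrast, does not support `where` filters at all.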
