
DOC: update the DataFrame.to_hdf() docstring #20186


Merged
merged 5 commits into pandas-dev:master on Mar 22, 2018

Conversation

@ghost commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread for language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.to_hdf)  #####################
################################################################################

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.

In order to add another :class:`~pandas.DataFrame` or
:class:`~pandas.Series` to an existing HDF file please use append mode
and a different key.

Parameters
----------
path_or_buf : str or pandas.HDFStore
    File path or HDFStore object.
key : str
    Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default is 'a'
    Mode to open file:
        - ``'w'``: write, a new file is created (an existing file with
            the same name would be deleted).
        - ``'a'``: append, an existing file is opened for reading and
            writing, and if the file does not exist it is created.
        - `'r+'`: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
    Possible values:
        - fixed: Fixed format. Fast writing/reading. Not-appendable,
            nor searchable.
        - table: Table format. Write as a PyTables Table structure
            which may perform worse but allow more flexible operations
            like searching / selecting subsets of the data.
append : boolean, default False
    For Table formats, append the input data to the existing.
data_columns :  list of columns or True, optional
    List of columns to create as indexed data columns for on-disk
    queries, or True to use all columns. By default only the axes
    of the object are indexed. See `here
    <http://pandas.pydata.org/pandas-docs/stable/io.html#query-via-data-columns>`__.
    Applicable only to format='table'.
complevel : {0-9}, optional
    Specifies a compression level for data.
    A value of 0 disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
    Specifies the compression library to be used.
    As of v0.20.2 these additional compressors for Blosc are supported
    (default if no compressor specified: 'blosc:blosclz'):
    {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
    'blosc:zlib', 'blosc:zstd'}.
    Specifying a compression library which is not available issues
    a ValueError.
fletcher32 : bool, default False
    If applying compression use the fletcher32 checksum.
dropna : bool, default False
    If true, ALL nan rows will not be written to store.

See Also
--------
DataFrame.to_csv : write out to a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_feather : write out feather-format for DataFrames.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')

We can append another object to the same file:

>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')

Reading from HDF file:

>>> pd.read_hdf('data.h5', 'df')
A  B
a  1  4
b  2  5
c  3  6
>>> pd.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64

Notes
-----
Learn more about `Hierarchical Data Format (HDF)
<https://support.hdfgroup.org/HDF5/whatishdf5.html>`__.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'kwargs'} not documented
		Unknown parameters {'format', 'complevel', 'mode', 'append', 'complib', 'dropna', 'data_columns', 'fletcher32'}
	No returns section found

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

  • **kwargs is actually misleading and should be replaced with explicit arguments; I will do that in a separate pull request.
  • The method returns None, so there is no Returns section.
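
For reviewers, a quick sketch of how the documented format='table', data_columns and compression options combine (an illustration only, not part of the docstring itself; the file name and the choice of column 'A' are arbitrary):

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w', format='table',
...           data_columns=['A'], complevel=9, complib='blosc')
>>> pd.read_hdf('data.h5', 'df', where='A > 1')
   A  B
b  2  5
c  3  6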

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks commented Mar 10, 2018

Hello @acidburnburn! Thanks for updating the PR.

Line 1810:1: W293 blank line contains whitespace
Line 1818:1: W293 blank line contains whitespace

Comment last updated on March 22, 2018 at 09:13 UTC

--------
DataFrame.to_csv : write out to a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_feather : write out feather-format for DataFrames.
Contributor:

add DataFrame.to_parquet, read_hdf

3 4
dtype: int64

Notes
Contributor:

I think Notes are before examples.

If you want to put a link, link to the section in the io.rst docs. You can add the HDF5 link in there if you want (in an appropriate location)

Author:

Actually, the io.rst docs already link to the same manual, so I just removed this section.
Thanks!

Author:

Also, I don't know why, but previously not all changes were pushed here; some remained local. Fixed it now. Sorry for that.

@jreback added the Docs and IO HDF5 (read_hdf, HDFStore) labels on Mar 10, 2018
@jorisvandenbossche (Member) left a comment

Thanks for the PR!
Added some more comments


In order to add another :class:`~pandas.DataFrame` or
:class:`~pandas.Series` to an existing HDF file please use append mode
and a different key.

Member:

Can you add here a link to the user guide (because there is a lot more information there). You can use something like For more information see the :ref:`user guide <io.hdf5>`
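
Roughly how that could sit in the docstring (a sketch of the rst only; the exact wording and placement were left to the author):

    In order to add another :class:`~pandas.DataFrame` or
    :class:`~pandas.Series` to an existing HDF file please use append
    mode and a different key.

    For more information see the :ref:`user guide <io.hdf5>`.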

Author:

Thanks for the hint! This one is hard for me because I'm not an expert in rst :( I put in the line as you suggested, pointing to the right subsection of that manual. When I generate HTML with 'make.py html' it doesn't have a link. Is that OK?

Member:

Yes, that is fine (when only building the docstring, the full user guide is not built, and therefore the link does not seem to work)

Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default is 'a'
Mode to open file:
- ``'w'``: write, a new file is created (an existing file with
Member:

It's good to make this a list, but for sphinx no indentation is needed (compared to "Mode .." on the line above); it does, however, need a blank line between both lines (rst syntax details ...)
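
For reference, the corrected layout would look roughly like this (a sketch only, aligning the bullets with "Mode to open file:" and adding the blank line):

    mode : {'a', 'w', 'r+'}, default 'a'
        Mode to open file:

        - ``'w'``: write, a new file is created (an existing file with
          the same name would be deleted).
        - ``'a'``: append, an existing file is opened for reading and
          writing, and if the file does not exist it is created.
        - ``'r+'``: similar to ``'a'``, but the file must already exist.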

Author:

done

- `'r+'`: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
Possible values:
- fixed: Fixed format. Fast writing/reading. Not-appendable,
Member:

same here about indentation / blank line

Member:

also, can you add single quotes around fixed (like 'fixed'), to make it clear it is a string

(and same for table below)

Author:

done

- ``'a'``: append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
- `'r+'`: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
Member:

"default is 'fixed' " -> "default 'fixed' "

Author:

done

- fixed: Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
- table: Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
Member:

another indentation issue. Here, the "which ..." needs to align with "table: .." on the line above
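
The continuation lines would then line up roughly like this (a sketch only, with the single quotes suggested earlier applied):

    - 'table': Table format. Write as a PyTables Table structure
      which may perform worse but allow more flexible operations
      like searching / selecting subsets of the data.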

Author:

done

2 3
3 4
dtype: int64
"""
Member:

Can you add in the end here a code block with

>>> import os
>>> os.remove('data.h5')

(so running the doctests does not leave behind files)

Author:

done

Many thanks for your comments, they are really useful.

... index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')

We can append another object to the same file:
Member:

I would use "add" here instead of "append", because "append" is also a keyword with a different behaviour (appending rows to the same table, not the same file)
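
A minimal sketch of the difference, reusing df and s from the docstring examples (file name and keys are illustrative only):

>>> df.to_hdf('data.h5', key='df', mode='w', format='table')
>>> df.to_hdf('data.h5', key='df', format='table', append=True)  # appends rows to the existing 'df' table
>>> s.to_hdf('data.h5', key='s')  # adds a separate object to the same file under a new key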

Author:

done

@codecov bot commented Mar 21, 2018

Codecov Report

Merging #20186 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20186      +/-   ##
==========================================
- Coverage   91.77%   91.77%   -0.01%     
==========================================
  Files         152      152              
  Lines       49205    49215      +10     
==========================================
+ Hits        45159    45167       +8     
- Misses       4046     4048       +2
Flag                         Coverage Δ
#multiple                    90.16% <100%> (-0.01%) ⬇️
#single                      41.84% <14.28%> (-0.02%) ⬇️

Impacted Files               Coverage Δ
pandas/core/generic.py       95.85% <100%> (ø) ⬆️
pandas/util/testing.py       83.91% <0%> (-0.05%) ⬇️
pandas/core/window.py        96.26% <0%> (-0.01%) ⬇️
pandas/plotting/_core.py     82.27% <0%> (ø) ⬆️
pandas/io/json/normalize.py  96.93% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7273ea0...eebfc39. Read the comment docs.

@jorisvandenbossche (Member) left a comment

Thanks for the updates!

@jorisvandenbossche merged commit 86684ad into pandas-dev:master on Mar 22, 2018
@jorisvandenbossche (Member):

Thanks @acidburnburn !

@jorisvandenbossche added this to the 0.23.0 milestone on Mar 22, 2018
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
Labels: Docs, IO HDF5 (read_hdf, HDFStore)
4 participants