
DOC: update the DataFrame.to_hdf() docstring #20186


Merged
merged 5 commits into pandas-dev:master on Mar 22, 2018

Conversation

@ghost commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread for language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.to_hdf)  #####################
################################################################################

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.

In order to add another :class:`~pandas.DataFrame` or
:class:`~pandas.Series` to an existing HDF file please use append mode
and a different key.

Parameters
----------
path_or_buf : str or pandas.HDFStore
    File path or HDFStore object.
key : str
    Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default is 'a'
    Mode to open file:
        - ``'w'``: write, a new file is created (an existing file with
            the same name would be deleted).
        - ``'a'``: append, an existing file is opened for reading and
            writing, and if the file does not exist it is created.
        - `'r+'`: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
    Possible values:
        - fixed: Fixed format. Fast writing/reading. Not-appendable,
            nor searchable.
        - table: Table format. Write as a PyTables Table structure
            which may perform worse but allow more flexible operations
            like searching / selecting subsets of the data.
append : boolean, default False
    For Table formats, append the input data to the existing.
data_columns :  list of columns or True, optional
    List of columns to create as indexed data columns for on-disk
    queries, or True to use all columns. By default only the axes
    of the object are indexed. See `here
    <http://pandas.pydata.org/pandas-docs/stable/io.html#query-via-data-columns>`__.
    Applicable only to format='table'.
complevel : {0-9}, optional
    Specifies a compression level for data.
    A value of 0 disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
    Specifies the compression library to be used.
    As of v0.20.2 these additional compressors for Blosc are supported
    (default if no compressor specified: 'blosc:blosclz'):
    {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
    'blosc:zlib', 'blosc:zstd'}.
    Specifying a compression library which is not available issues
    a ValueError.
fletcher32 : bool, default False
    If applying compression use the fletcher32 checksum.
dropna : bool, default False
    If true, ALL nan rows will not be written to store.

See Also
--------
DataFrame.to_csv : write out to a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_feather : write out feather-format for DataFrames.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')

We can append another object to the same file:

>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')

Reading from HDF file:

>>> pd.read_hdf('data.h5', 'df')
A  B
a  1  4
b  2  5
c  3  6
>>> pd.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64

Notes
-----
Learn more about `Hierarchical Data Format (HDF)
<https://support.hdfgroup.org/HDF5/whatishdf5.html>`__.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'kwargs'} not documented
		Unknown parameters {'format', 'complevel', 'mode', 'append', 'complib', 'dropna', 'data_columns', 'fletcher32'}
	No returns section found

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

  • **kwargs is actually misleading and should be replaced with explicit arguments; I will do that in a separate pull request.
  • The method returns None, so there is no Returns section.
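
For reviewers, a quick sketch of how the documented format='table', data_columns and compression options combine (an illustration only, not part of the docstring itself; the file name and the choice of column 'A' are arbitrary):

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w', format='table',
...           data_columns=['A'], complevel=9, complib='blosc')
>>> pd.read_hdf('data.h5', 'df', where='A > 1')
   A  B
b  2  5
c  3  6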

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks commented Mar 10, 2018

Hello @acidburnburn! Thanks for updating the PR.

Line 1810:1: W293 blank line contains whitespace
Line 1818:1: W293 blank line contains whitespace

Comment last updated on March 22, 2018 at 09:13 UTC

--------
DataFrame.to_csv : write out to a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_feather : write out feather-format for DataFrames.
Contributor:

add DataFrame.to_parquet, read_hdf

3 4
dtype: int64

Notes
Contributor:

I think Notes are before examples.

If you want to put a link, link to the section in the io.rst docs. You can add the HDF5 link in there if you want (in an appropriate location)

Author:

Actually, the io.rst docs already link to the same manual, so I just removed this section.
Thanks!

Author:

Also, I don't know why, but previously not all changes were pushed here; some remained local. Fixed it now. Sorry for that.

@jreback added the Docs and IO HDF5 (read_hdf, HDFStore) labels on Mar 10, 2018
@jorisvandenbossche (Member) left a comment

Thanks for the PR!
Added some more comments


In order to add another :class:`~pandas.DataFrame` or
:class:`~pandas.Series` to an existing HDF file please use append mode
and a different key.

Member:

Can you add here a link to the user guide (because there is a lot more information there). You can use something like For more information see the :ref:`user guide <io.hdf5>`
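
Roughly how that could sit in the docstring (a sketch of the rst only; the exact wording and placement were left to the author):

    In order to add another :class:`~pandas.DataFrame` or
    :class:`~pandas.Series` to an existing HDF file please use append
    mode and a different key.

    For more information see the :ref:`user guide <io.hdf5>`.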

Author:

Thanks for the hint! This one is hard for me because I'm not an expert in rst :( I put in the line as you suggested, pointing to the right subsection of that manual. When I generate HTML with 'make.py html' it doesn't have a link. Is that OK?

Member:

Yes, that is fine (when only building the docstring, the full user guide is not built, and therefore the link does not seem to work)

Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default is 'a'
Mode to open file:
- ``'w'``: write, a new file is created (an existing file with
Member:

It's good to make this a list, but for sphinx no indentation is needed (compared to "Mode .." on the line above); it does, however, need a blank line between both lines (rst syntax details ...)
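
For reference, the corrected layout would look roughly like this (a sketch only, aligning the bullets with "Mode to open file:" and adding the blank line):

    mode : {'a', 'w', 'r+'}, default 'a'
        Mode to open file:

        - ``'w'``: write, a new file is created (an existing file with
          the same name would be deleted).
        - ``'a'``: append, an existing file is opened for reading and
          writing, and if the file does not exist it is created.
        - ``'r+'``: similar to ``'a'``, but the file must already exist.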

Author:

done

- `'r+'`: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
Possible values:
- fixed: Fixed format. Fast writing/reading. Not-appendable,
Member:

same here about indentation / blank line

Member:

also, can you add single quotes around fixed (like 'fixed'), to make it clear it is a string

(and same for table below)

Author:

done

- ``'a'``: append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
- `'r+'`: similar to ``'a'``, but the file must already exist.
format : {'fixed', 'table'}, default is 'fixed'
Member:

"default is 'fixed' " -> "default 'fixed' "

Author:

done

- fixed: Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
- table: Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
Member:

another indentation issue. Here, the "which ..." needs to align with "table: .." on the line above
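
The continuation lines would then line up roughly like this (a sketch only, with the single quotes suggested earlier applied):

    - 'table': Table format. Write as a PyTables Table structure
      which may perform worse but allow more flexible operations
      like searching / selecting subsets of the data.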

Author:

done

2 3
3 4
dtype: int64
"""
Member:

Can you add in the end here a code block with

>>> import os
>>> os.remove('data.h5')

(so running the doctests does not leave behind files)

Author:

done

Many thanks for your comments, they are really useful.

... index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')

We can append another object to the same file:
Member:

I would use "add" here instead of "append", because "append" is also a keyword with a different behaviour (appending rows to the same table, not the same file)
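
A minimal sketch of the difference, reusing df and s from the docstring examples (file name and keys are illustrative only):

>>> df.to_hdf('data.h5', key='df', mode='w', format='table')
>>> df.to_hdf('data.h5', key='df', format='table', append=True)  # appends rows to the existing 'df' table
>>> s.to_hdf('data.h5', key='s')  # adds a separate object to the same file under a new key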

Author:

done

@codecov bot commented Mar 21, 2018

Codecov Report

Merging #20186 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20186      +/-   ##
==========================================
- Coverage   91.77%   91.77%   -0.01%     
==========================================
  Files         152      152              
  Lines       49205    49215      +10     
==========================================
+ Hits        45159    45167       +8     
- Misses       4046     4048       +2
Flag                         Coverage Δ
#multiple                    90.16% <100%> (-0.01%) ⬇️
#single                      41.84% <14.28%> (-0.02%) ⬇️

Impacted Files               Coverage Δ
pandas/core/generic.py       95.85% <100%> (ø) ⬆️
pandas/util/testing.py       83.91% <0%> (-0.05%) ⬇️
pandas/core/window.py        96.26% <0%> (-0.01%) ⬇️
pandas/plotting/_core.py     82.27% <0%> (ø) ⬆️
pandas/io/json/normalize.py  96.93% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7273ea0...eebfc39. Read the comment docs.

@jorisvandenbossche (Member) left a comment

Thanks for the updates!

@jorisvandenbossche merged commit 86684ad into pandas-dev:master on Mar 22, 2018
@jorisvandenbossche (Member):

Thanks @acidburnburn !

@jorisvandenbossche added this to the 0.23.0 milestone on Mar 22, 2018
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
Labels: Docs, IO HDF5 (read_hdf, HDFStore)
4 participants