
DOC: update the parquet docstring #20129


Merged

8 commits merged into pandas-dev:master on Mar 12, 2018

Conversation

@benman1 (Contributor) commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread for language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################### Docstring (pandas.DataFrame.to_parquet)  ###################
################################################################################

Write a DataFrame to the binary parquet format.

.. versionadded:: 0.21.0

Requires either fastparquet or pyarrow libraries.

Parameters
----------
fname : str
    String file path.
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
    Parquet library to use. If 'auto', then the option
    ``io.parquet.engine`` is used. The default ``io.parquet.engine``
    behavior is to try 'pyarrow', falling back to 'fastparquet' if
    'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
    Name of the compression to use. Use ``None`` for no compression.
kwargs : dict
    Additional keyword arguments passed to the engine.

Examples
----------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')

Returns
----------
None

See Also
--------
DataFrame.to_csv : write a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_hdf : write to hdf.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.to_parquet" correct. :)
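As a quick illustration of the parameters documented above (file names are arbitrary, and either pyarrow or fastparquet must be installed for this to run):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# engine='auto' (the default) tries pyarrow first, then falls back to fastparquet.
df.to_parquet('df.parquet.gzip', engine='auto', compression='gzip')

# Pass compression=None to write the file uncompressed.
df.to_parquet('df.parquet', compression=None)
```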

@@ -1697,19 +1697,36 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

.. versionadded:: 0.21.0

Requires either fastparquet or pyarrow libraries.


I think that it should go in the Notes section.

@@ -1697,19 +1697,36 @@ def to_parquet(self, fname, engine='auto', compression='snappy',


An extended summary is necessary.

Additional keyword arguments passed to the engine.

Examples
----------


There are too many hyphens.

>>> df.to_parquet('df.parquet.gzip', compression='gzip')

Returns
----------


There are too many hyphens.

@@ -1697,19 +1697,36 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

.. versionadded:: 0.21.0

Requires either fastparquet or pyarrow libraries.
Contributor

Third-person verbs should not be used, as stated in the docs guide. There are two options here: either use a subject, like "This function requires...", or begin with the infinitive "Require...".

I'm not 100% sure, but I have a feeling that "libraries" should be in the singular form, "library".

--------
DataFrame.to_csv : write a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_hdf : write to hdf.
Contributor

For the "See Also" section, please check the section 5 of https://python-sprints.github.io/pandas/guide/pandas_docstring.html

@@ -1697,19 +1697,42 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

.. versionadded:: 0.21.0

This function writes the dataframe as a parquet file. You
Contributor

can you add a link to the parquet docs (copy from io.rst) in References

Parameters
----------
fname : str
string file path
String file path.
Contributor

IIRC we use lower for these? @jorisvandenbossche

Member

Here it is fine, as it is the start of the sentence and not the exact type.

Additional keyword arguments passed to the engine.

Returns
----------
Contributor

make the underlines the same length as the title

DataFrame.to_sql : write to a sql table.
DataFrame.to_hdf : write to hdf.

Notes
Contributor

same


See Also
--------
DataFrame.to_csv : write a csv file.
Contributor

link to pd.read_parquet

@jreback added the Docs and IO Parquet labels Mar 10, 2018
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
Parquet library to use. If 'auto', then the option
``io.parquet.engine`` is used. The default ``io.parquet.engine``
behavior is to try 'pyarrow', falling back to 'fastparquet' if
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use ``None`` for no compression.
kwargs
Additional keyword arguments passed to the engine
kwargs : dict
Member

can you change this line to **kwargs (without the dict, as you actually cannot pass it as a dict)

Contributor Author

Writing it as **kwargs results in a validation error:

Errors found:
	Errors in parameters section
		Parameters {'kwargs'} not documented
		Unknown parameters {'**kwargs'}
		Parameter "**kwargs" has no type

Member

You can ignore that error (the validation script is not yet perfect here :-))

Contributor Author

Thanks for clarifying!
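For reference, a minimal sketch of the entry being suggested here (numpydoc style, listing `**kwargs` without a type); the signature comes from the diff hunks above and the other parameter entries are elided:

```python
def to_parquet(self, fname, engine='auto', compression='snappy', **kwargs):
    """
    Write a DataFrame to the binary parquet format.

    Parameters
    ----------
    ...
    **kwargs
        Additional keyword arguments passed to the parquet library.
    """
```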

Additional keyword arguments passed to the engine
kwargs : dict
Additional keyword arguments passed to the parquet library. See
the documentation for :func:`pandas.io.parquet.to_parquet` for
Member

can you change this link to :ref:`io.parquet` (then it will link to the user guide with more information; the link you added now points to the docstring of to_parquet)


Notes
-----
This function requires either the fastparquet or pyarrow library.
Member

Can you make fastparquet and pyarrow into links to their home pages?

The syntax is `pyarrow <https:// ...>`__
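To illustrate that syntax, the Notes entry could look roughly like the sketch below; the home-page URLs are my own placeholders, not taken from the PR:

```python
# Sketch only: the URLs below are placeholders, not ones from the PR.
def to_parquet(self, fname, engine='auto', compression='snappy', **kwargs):
    """
    Notes
    -----
    This function requires either the `fastparquet
    <https://fastparquet.readthedocs.io/>`__ or `pyarrow
    <https://arrow.apache.org/docs/python/>`__ library.
    """
```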

@pep8speaks commented Mar 10, 2018

Hello @benman1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 12, 2018 at 11:03 Hours UTC

@jorisvandenbossche (Member)

@benman1 I removed the Returns section, as I think it is not needed for the write functions (and we can ignore the error in the validation script)

@jorisvandenbossche added this to the 0.23.0 milestone Mar 11, 2018
Examples
--------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')
Contributor

can you also show using read_parquet to read this back
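For what it's worth, the round trip could be shown along these lines (the printed frame assumes the data comes back unchanged):

```python
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4
```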

@jorisvandenbossche merged commit 0ae89b0 into pandas-dev:master Mar 12, 2018
@pandas-dev deleted a comment from codecov bot Mar 12, 2018
@jorisvandenbossche (Member)

Did the small edit before merging.
@benman1 Thanks a lot!

Labels: Docs, IO Parquet

6 participants