
DOC: update the parquet docstring #20129


Merged

8 commits merged into pandas-dev:master on Mar 12, 2018

Conversation

@benman1 (Contributor) commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread for language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################### Docstring (pandas.DataFrame.to_parquet)  ###################
################################################################################

Write a DataFrame to the binary parquet format.

.. versionadded:: 0.21.0

Requires either fastparquet or pyarrow libraries.

Parameters
----------
fname : str
    String file path.
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
    Parquet library to use. If 'auto', then the option
    ``io.parquet.engine`` is used. The default ``io.parquet.engine``
    behavior is to try 'pyarrow', falling back to 'fastparquet' if
    'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
    Name of the compression to use. Use ``None`` for no compression.
kwargs : dict
    Additional keyword arguments passed to the engine.

Examples
----------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')

Returns
----------
None

See Also
--------
DataFrame.to_csv : write a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_hdf : write to hdf.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.to_parquet" correct. :)
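As a quick illustration of the parameters documented above (file names are arbitrary, and either pyarrow or fastparquet must be installed for this to run):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# engine='auto' (the default) tries pyarrow first, then falls back to fastparquet.
df.to_parquet('df.parquet.gzip', engine='auto', compression='gzip')

# Pass compression=None to write the file uncompressed.
df.to_parquet('df.parquet', compression=None)
```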

@@ -1697,19 +1697,36 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

.. versionadded:: 0.21.0

Requires either fastparquet or pyarrow libraries.


I think that it should go in the Notes section.

@@ -1697,19 +1697,36 @@ def to_parquet(self, fname, engine='auto', compression='snappy',


An extended summary is necessary.

Additional keyword arguments passed to the engine.

Examples
----------


There are too many hyphens.

>>> df.to_parquet('df.parquet.gzip', compression='gzip')

Returns
----------


There are too many hyphens.

@@ -1697,19 +1697,36 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

.. versionadded:: 0.21.0

Requires either fastparquet or pyarrow libraries.
Contributor

Third-person verbs should not be used, as stated in the docs guide. There are two options here: either use a subject, like "This function requires...", or begin with the infinitive "Require...".

I'm not 100% sure, but I have a feeling that "libraries" should be in the singular form, "library".

--------
DataFrame.to_csv : write a csv file.
DataFrame.to_sql : write to a sql table.
DataFrame.to_hdf : write to hdf.
Contributor

For the "See Also" section, please check the section 5 of https://python-sprints.github.io/pandas/guide/pandas_docstring.html

@@ -1697,19 +1697,42 @@ def to_parquet(self, fname, engine='auto', compression='snappy',

.. versionadded:: 0.21.0

This function writes the dataframe as a parquet file. You
Contributor

can you add a link to the parquet docs (copy from io.rst) in References

Parameters
----------
fname : str
string file path
String file path.
Contributor

IIRC we use lower for these? @jorisvandenbossche

Member

Here it is fine, as it is the start of the sentence and not the exact type.

Additional keyword arguments passed to the engine.

Returns
----------
Contributor

make the underlines the same length as the title

DataFrame.to_sql : write to a sql table.
DataFrame.to_hdf : write to hdf.

Notes
Contributor

same


See Also
--------
DataFrame.to_csv : write a csv file.
Contributor

link to pd.read_parquet

@jreback added the Docs and IO Parquet labels Mar 10, 2018
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
Parquet library to use. If 'auto', then the option
``io.parquet.engine`` is used. The default ``io.parquet.engine``
behavior is to try 'pyarrow', falling back to 'fastparquet' if
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use ``None`` for no compression.
kwargs
Additional keyword arguments passed to the engine
kwargs : dict
Member

can you change this line to **kwargs (without the dict, as you actually cannot pass it as a dict)

Contributor Author

Writing it as **kwargs results in a validation error:

Errors found:
	Errors in parameters section
		Parameters {'kwargs'} not documented
		Unknown parameters {'**kwargs'}
		Parameter "**kwargs" has no type

Member

You can ignore that error (the validation script is not yet perfect here :-))

Contributor Author

Thanks for clarifying!
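For reference, a minimal sketch of the entry being suggested here (numpydoc style, listing `**kwargs` without a type); the signature comes from the diff hunks above and the other parameter entries are elided:

```python
def to_parquet(self, fname, engine='auto', compression='snappy', **kwargs):
    """
    Write a DataFrame to the binary parquet format.

    Parameters
    ----------
    ...
    **kwargs
        Additional keyword arguments passed to the parquet library.
    """
```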

Additional keyword arguments passed to the engine
kwargs : dict
Additional keyword arguments passed to the parquet library. See
the documentation for :func:`pandas.io.parquet.to_parquet` for
Member

can you change this link to :ref:`io.parquet` (then it will link to the user guide with more information; the link you added now points to the docstring of to_parquet)


Notes
-----
This function requires either the fastparquet or pyarrow library.
Member

Can you make fastparquet and pyarrow into links to their home pages?

The syntax is `pyarrow <https:// ...>`__
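To illustrate that syntax, the Notes entry could look roughly like the sketch below; the home-page URLs are my own placeholders, not taken from the PR:

```python
# Sketch only: the URLs below are placeholders, not ones from the PR.
def to_parquet(self, fname, engine='auto', compression='snappy', **kwargs):
    """
    Notes
    -----
    This function requires either the `fastparquet
    <https://fastparquet.readthedocs.io/>`__ or `pyarrow
    <https://arrow.apache.org/docs/python/>`__ library.
    """
```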

@pep8speaks commented Mar 10, 2018

Hello @benman1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 12, 2018 at 11:03 Hours UTC

@jorisvandenbossche (Member)

@benman1 I removed the Returns section, as I think it is not needed for the write functions (and we can ignore the error in the validation script)

@jorisvandenbossche added this to the 0.23.0 milestone Mar 11, 2018
Examples
--------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')
Contributor

can you also show using read_parquet to read this back
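For what it's worth, the round trip could be shown along these lines (the printed frame assumes the data comes back unchanged):

```python
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip', compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4
```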

@jorisvandenbossche merged commit 0ae89b0 into pandas-dev:master Mar 12, 2018
@pandas-dev deleted a comment from codecov bot Mar 12, 2018
@jorisvandenbossche (Member)

Did the small edit before merging.
@benman1 Thanks a lot!

Labels: Docs, IO Parquet

6 participants