Skip to content

DOC: update the to_pickle & read_pickle docstring #20253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Mar 14, 2018

Conversation

minggli
Copy link
Contributor

@minggli minggli commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
#################### Docstring (pandas.DataFrame.to_pickle) ####################
################################################################################

Pickle (serialize) object to input file path.

Parameters
----------
path : str
    File path where the pickled object will be stored.
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
    A string representing the compression to use in the output file. By
    default, infers from the file extension in specified path.

    .. versionadded:: 0.20.0
protocol : int
    Int which indicates which protocol should be used by the pickler,
    default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
    values for this parameter depend on the version of Python. For
    Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
    valid value. For Python >= 3.4, 4 is a valid value. A negative
    value for the protocol parameter is equivalent to setting its value
    to HIGHEST_PROTOCOL.

    .. [1] https://docs.python.org/3/library/pickle.html
    .. versionadded:: 0.21.0

See Also
--------
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write the contained data to an HDF5 file using
    HDFStore.
DataFrame.to_sql : Write records stored in a DataFrame to a SQL
    database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

>>> import os
>>> os.remove("./dummy.pkl")

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Errors in parameters section
		Parameter "compression" description should finish with "."
		Parameter "protocol" description should finish with "."
	No returns section found

description already finishes with "."
.. versionadded:: tag should not end with "."

@minggli minggli changed the title DOC: update the to_pickle & 'read_pickle' docstring DOC: update the to_pickle & read_pickle docstring Mar 10, 2018

Examples
--------
>>> from pandas import DataFrame, read_pickle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need these imports, instead use pd.DataFrame and pd.read_pickle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed 👍

>>> original_df
foo bar
0 0 5
1 1 6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a See Also and ref read_pickle (to_hdf, to_sql, to_parquet) also good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added 💯

@@ -72,6 +93,28 @@ def read_pickle(path, compression='infer'):
Returns
-------
unpickled : type of object stored in file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a See Also and ref DataFrame.to_pickle, pd.read_hdf, pd.read_sql, pd.read_parquet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

@jreback jreback added Docs IO Data IO issues that don't fit into a more specific label labels Mar 10, 2018
for the protocol parameter is equivalent to setting its value to
HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are putting these in a Reference section @jorisvandenbossche ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the numpydoc way yes, but currently we have no docstring using that, so I would leave it like this for now

--------
pandas.DataFrame.to_pickle
pandas.Series.to_pickle
pandas.read_hdf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't put commas at the end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we add text describing each of these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected.

See Also
--------
pandas.read_pickle
pandas.to_hdf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.


See Also
--------
pandas.read_pickle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add similar to what you adding in other See Also below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

@minggli minggli closed this Mar 12, 2018
@minggli minggli reopened this Mar 12, 2018
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I am not sure that I find it worth to use a shared_doc system here. It's used only for substituting Series/DataFrame in the summary line and path parameter explanation. I don't think that's worth the added complexity. I would rather make the docstring generic for both. It's not that the Series version has different options or behaviours.

default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
values for this parameter depend on the version of Python. For
Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
valid value. For Python >= 3.4, 4 is a valid value.A negative value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after '.'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


See Also
--------
pandas.read_pickle : Load pickled pandas object (or any other pickled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can leave out the 'pandas' in all of those references

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 👍

for the protocol parameter is equivalent to setting its value to
HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the numpydoc way yes, but currently we have no docstring using that, so I would leave it like this for now

path : string
File path where the pickled %(klass)s object will be stored.
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
A string representing the compression to use in the output file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add that the default 'infer' infers it from the specified path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added. 💯

@codecov
Copy link

codecov bot commented Mar 12, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@fb556ed). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20253   +/-   ##
=========================================
  Coverage          ?   91.76%           
=========================================
  Files             ?      150           
  Lines             ?    49148           
  Branches          ?        0           
=========================================
  Hits              ?    45099           
  Misses            ?     4049           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.14% <100%> (?)
#single 41.9% <100%> (?)
Impacted Files Coverage Δ
pandas/core/generic.py 95.85% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb556ed...26b3e2e. Read the comment docs.

@minggli
Copy link
Contributor Author

minggli commented Mar 12, 2018

@jorisvandenbossche thanks for the comments. applied changes including reverting to generic docstring. agree that for the benefits, not necessary to use _shared_docs.

@minggli minggli closed this Mar 13, 2018
@minggli minggli reopened this Mar 13, 2018
4 4 9
>>> original_df.to_pickle("./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche how handling file paths in doc-strings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback commented about it (using os.remove to remove remaining file), issue to discuss this is here: #20302

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, see that now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks guys for looking into it.

@jreback jreback added this to the 0.23.0 milestone Mar 13, 2018
@jreback
Copy link
Contributor

jreback commented Mar 13, 2018

lgtm. @jorisvandenbossche @datapythonista

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Looking good

Added some minor additional comments. One more general comment: it might be nice to add an extended summary very briefly explaining what "pickle" is and linking to the python docs about it.

2 2 7
3 3 8
4 4 9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here a

>>> import os
>>> os.remove("./dummy.pkl")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added. :)

See Also
--------
read_pickle : Load pickled pandas object (or any other pickled object)
from the specified file path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you only indent this line with 4 spaces? Like

read_pickle : Load pickled pandas object (or any other pickled object)
    from the specified file path.

Both work for sphinx, but we mainly use this pattern, so it's better to be consistent in this.

(same for the other ones below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -52,15 +81,17 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
def read_pickle(path, compression='infer'):
"""
Load pickled pandas object (or any other pickled object) from the specified
file path
file path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try to fit this on one line?

Maybe the "or any other pickled object" is not needed in the summary line and can go in an extended summary, as the typical use case should be pandas objects.
Or, maybe the "from specified file path" can be shortened to "from file."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shortened.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added some comments.

@@ -1906,23 +1906,54 @@ def to_pickle(self, path, compression='infer',
Parameters
----------
path : string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str instead of string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍

compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file
A string representing the compression to use in the output file. By
default, infers from the specified path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"infers from the specified file extension" may be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pythonista.

HIGHEST_PROTOCOL.
valid value. For Python >= 3.4, 4 is a valid value. A negative
value for the protocol parameter is equivalent to setting its value
to HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is more standard to have this in a References section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche your opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously said it was fine, because we don't use References sections in many cases (most of the time we use inline links), another thing we can discuss in further improving the guidelines.

3 3 8
4 4 9

See Also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Also should go before the Examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you push your latest changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because you mentioned you corrected it, but the see also is still after the examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry, just missed this one and did others.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some remaining comment about the "See Also" descriptions.
Looking good for the rest!

See Also
--------
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write the contained data to an HDF5 file using HDFStore.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a bit simpler (eg no need to mention HDFStore): "Write DataFame to HDF5 file"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amended 👍

--------
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write the contained data to an HDF5 file using HDFStore.
DataFrame.to_sql : Write records stored in a DataFrame to a SQL database.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be a bit more consistent in the wording here, maybe just "Write DataFrame to a SQL database" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amended 👍

DataFrame.to_pickle : Pickle (serialize) DataFrame object to input file
path.
Series.to_pickle : Pickle (serialize) Series object to input file path.
read_hdf : read from the store, close it if we opened it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explanation about closing the file is too detailed for the See Also explanation. Maybe something like "Read HDF5 file into a DataFrame" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amended 👍

Series.to_pickle : Pickle (serialize) Series object to input file path.
read_hdf : read from the store, close it if we opened it.
read_sql : Read SQL query or database table into a DataFrame.
read_parquet : Load a parquet object from the file path, returning a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here, "from the file path" is rather specific, would leave out that part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amended 👍

@@ -10,15 +10,17 @@

def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
"""
Pickle (serialize) object to input file path
Pickle (serialize) object to input file path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit strange to say we pickle the object the a "file path", it's actually to a file (located at the file path), so maybe simplify to "Pickle (serialize) object to file." ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amended 👍

@jreback
Copy link
Contributor

jreback commented Mar 13, 2018

this lgtm. do we use Reference in other doc-strings?

@jorisvandenbossche jorisvandenbossche merged commit 12de69b into pandas-dev:master Mar 14, 2018
@jorisvandenbossche
Copy link
Member

@minggli Thanks a lot for the PR!

@minggli minggli deleted the doc/to_pickle branch March 14, 2018 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants