
DOC: update the to_pickle & read_pickle docstring #20253

Merged: 30 commits were merged on Mar 14, 2018.

Commits:
- 3d6ed4a update docstring and add example (minggli, Mar 10, 2018)
- 62a202f update docstring and add example (minggli, Mar 10, 2018)
- 03fa85b add space (minggli, Mar 10, 2018)
- 3169811 DataFrame.to_pickle docstring (minggli, Mar 10, 2018)
- d014e97 Series.to_pickle docstring (minggli, Mar 10, 2018)
- 202f411 add to_pickle to _shared_docs (minggli, Mar 10, 2018)
- 05e1bce move quote (minggli, Mar 10, 2018)
- f674845 remove blank line (minggli, Mar 10, 2018)
- 556adf4 miscellaneous fixes (minggli, Mar 10, 2018)
- 42fcc03 miscellaneous fixes (minggli, Mar 10, 2018)
- e69ea5c remove import and add See Also (minggli, Mar 10, 2018)
- f36c6dd remove import and add See Also (minggli, Mar 10, 2018)
- 5f152ed add more See Also (minggli, Mar 10, 2018)
- c6231b0 use proper warning with embedded hyperlink (minggli, Mar 11, 2018)
- 0c3a442 remove pandas.to_pickle from See Also (minggli, Mar 11, 2018)
- 709ca74 remove commas in See Also (minggli, Mar 11, 2018)
- c15d454 additional output in See Also (minggli, Mar 11, 2018)
- b3d9cee add descriptions in See Also references (minggli, Mar 11, 2018)
- ef19c93 add descriptions in See Also references (minggli, Mar 11, 2018)
- 33a9b1f correct references and indentation (minggli, Mar 11, 2018)
- 7be8f3b correct indentation (minggli, Mar 11, 2018)
- d69c73f revert frame (minggli, Mar 12, 2018)
- e2af5a3 revert series (minggli, Mar 12, 2018)
- 46b7342 remove shared_doc, pandas. and add infer description (minggli, Mar 12, 2018)
- 7f1d3d4 remove pandas. and add infer description (minggli, Mar 12, 2018)
- 3e545f3 miscellaneous changes (minggli, Mar 13, 2018)
- c1d6f03 miscellaneous changes (minggli, Mar 13, 2018)
- 39969d5 move See Also before Example and add os.remove (minggli, Mar 13, 2018)
- c1a9d57 simplify See Also and to_pickle summary. (minggli, Mar 13, 2018)
- 26b3e2e simplify See Also and to_pickle summary in pandas.io.pickle (minggli, Mar 13, 2018)
41 changes: 36 additions & 5 deletions pandas/core/generic.py
@@ -1906,23 +1906,54 @@ def to_pickle(self, path, compression='infer',
Parameters
----------
path : string
Member: str instead of string

Contributor Author: done 👍

File path
File path where the pickled object will be stored.
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file
A string representing the compression to use in the output file. By
default, infers from the specified path.
Member: "infers from the specified file extension" maybe?

Contributor Author: yes pythonista.
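The following is a minimal sketch of what `compression='infer'` does with the file extension. It writes to a temporary directory under the hypothetical file name `frame.pkl.gz`. The `.gz` suffix should cause gzip compression to be used. Every gzip stream begins with the magic bytes `1f 8b`, and reading the file back with the default `'infer'` should restore the frame.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name; the ".gz" suffix is what triggers gzip inference.
    path = os.path.join(tmpdir, "frame.pkl.gz")
    df.to_pickle(path)  # compression='infer' is the default

    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as fh:
        magic = fh.read(2)

    # Decompression on read is inferred from the same suffix.
    restored = pd.read_pickle(path)

print(magic)  # b'\x1f\x8b'
print(restored.equals(df))  # True
```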


.. versionadded:: 0.20.0
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
values for this parameter depend on the version of Python. For
Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
valid value. For Python >= 3.4, 4 is a valid value.A negative value
for the protocol parameter is equivalent to setting its value to
HIGHEST_PROTOCOL.
valid value. For Python >= 3.4, 4 is a valid value. A negative
value for the protocol parameter is equivalent to setting its value
to HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html
Member: I think it is more standard to have this in a References section.

Contributor Author: @jorisvandenbossche your opinion?

Member: I previously said it was fine, because we don't use References sections in many cases (most of the time we use inline links); that is another thing we can discuss when further improving the guidelines.

.. versionadded:: 0.21.0
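You can check the protocol rules above with the standard-library `pickle` module that the pandas functions pass the `protocol` value to. This small sketch uses plain `pickle`, not pandas. A negative protocol selects `HIGHEST_PROTOCOL`, so both calls below should produce identical bytes. For protocols 2 and above, the stream opens with the PROTO opcode followed by the protocol number.

```python
import pickle

obj = {"foo": list(range(5)), "bar": list(range(5, 10))}

# A negative protocol is shorthand for pickle.HIGHEST_PROTOCOL.
highest = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
negative = pickle.dumps(obj, protocol=-1)

# Protocols >= 2 start with the PROTO opcode (0x80), then the protocol number.
print(highest[0] == 0x80, highest[1] == pickle.HIGHEST_PROTOCOL)  # True True
print(highest == negative)  # True
```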

Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
Contributor: @jorisvandenbossche how should we handle file paths in docstrings?

Member: @jreback commented about it (using os.remove to remove the remaining file); the issue to discuss this is here: #20302

Contributor: great, I see that now.

Contributor Author: thanks guys for looking into it.

>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

Member: Can you add here a

>>> import os
>>> os.remove("./dummy.pkl")

Contributor Author: added. :)
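The `os.remove` call above stops the doctest from leaving `dummy.pkl` behind. Outside a docstring, a temporary directory can give the same guarantee even when a step fails partway through. This is a minimal sketch that assumes `tempfile` is acceptable in place of the fixed `./dummy.pkl` path.

```python
import os
import tempfile

import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# The directory and everything in it are deleted when the block exits,
# so no explicit os.remove() call is needed.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dummy.pkl")
    original_df.to_pickle(path)
    unpickled_df = pd.read_pickle(path)
    existed = os.path.exists(path)

print(existed, os.path.exists(path))  # True False
pd.testing.assert_frame_equal(original_df, unpickled_df)
```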

See Also
--------
read_pickle : Load pickled pandas object (or any other pickled object)
from the specified file path.
Member: Can you indent only this line with 4 spaces? Like

read_pickle : Load pickled pandas object (or any other pickled object)
    from the specified file path.

Both work for Sphinx, but we mainly use this pattern, so it's better to be consistent.

(same for the other ones below)

Contributor Author: done.

DataFrame.to_hdf : Write the contained data to an HDF5 file using
HDFStore.
DataFrame.to_sql : Write records stored in a DataFrame to a SQL
database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
"""
from pandas.io.pickle import to_pickle
return to_pickle(self, path, compression=compression,
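Of the alternatives listed under See Also, `DataFrame.to_sql` is the one you can try without optional dependencies, because it works with the standard-library `sqlite3` driver. `to_hdf` needs PyTables, and `to_parquet` needs pyarrow or fastparquet. This sketch assumes an in-memory SQLite database and a hypothetical table name, `dummy`.

```python
import sqlite3

import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# ":memory:" creates a throwaway database that disappears with the connection.
conn = sqlite3.connect(":memory:")
original_df.to_sql("dummy", conn, index=False)  # hypothetical table name
from_sql = pd.read_sql("SELECT foo, bar FROM dummy", conn)
conn.close()

# The integer columns come back unchanged.
print(from_sql)
```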
78 changes: 70 additions & 8 deletions pandas/io/pickle.py
@@ -10,15 +10,17 @@

def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
"""
Pickle (serialize) object to input file path
Pickle (serialize) object to input file path.
Member: I find it a bit strange to say we pickle the object to a "file path"; it's actually to a file (located at the file path), so maybe simplify to "Pickle (serialize) object to file."?

Contributor Author: amended 👍


Parameters
----------
obj : any object
Any python object.
path : string
File path
File path where the pickled object will be stored.
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file
A string representing the compression to use in the output file. By
default, infers from the specified path.

.. versionadded:: 0.20.0
protocol : int
@@ -33,7 +35,34 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
.. [1] https://docs.python.org/3/library/pickle.html
.. versionadded:: 0.21.0


Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
Contributor: can you add a See Also and reference read_pickle? (to_hdf, to_sql, to_parquet would also be good)

Contributor Author: added 💯

2    2    7
3    3    8
4    4    9
>>> pd.to_pickle(original_df, "./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

See Also
Member: See Also should go before the Examples.

Contributor Author: corrected.

Member: Did you push your latest changes?

Contributor Author: yes.

Member: Just because you mentioned you corrected it, but the See Also is still after the Examples?

Contributor Author: oh sorry, just missed this one and did the others.

--------
read_pickle : Load pickled pandas object (or any other pickled object) from
the specified file path.
DataFrame.to_hdf : Write the contained data to an HDF5 file using HDFStore.
Member: I would make this a bit simpler (e.g. no need to mention HDFStore): "Write DataFrame to HDF5 file"

Contributor Author: amended 👍

DataFrame.to_sql : Write records stored in a DataFrame to a SQL database.
Member: To be a bit more consistent in the wording here, maybe just "Write DataFrame to a SQL database"?

Contributor Author: amended 👍

DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
"""
path = _stringify_path(path)
inferred_compression = _infer_compression(path, compression)
@@ -52,15 +81,17 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
def read_pickle(path, compression='infer'):
"""
Load pickled pandas object (or any other pickled object) from the specified
file path
file path.
Member: Can you try to fit this on one line?

Maybe the "or any other pickled object" is not needed in the summary line and can go in an extended summary, as the typical use case should be pandas objects.
Or maybe "from the specified file path" can be shortened to "from file."

Contributor Author: shortened.
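The summary mentions "any other pickled object" because `pd.to_pickle` and `pd.read_pickle` are not restricted to pandas objects. They should round-trip any picklable Python value. This minimal sketch uses a plain dictionary, a temporary directory, and a hypothetical file name.

```python
import os
import tempfile

import pandas as pd

settings = {"name": "dummy", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "settings.pkl")  # hypothetical file name
    pd.to_pickle(settings, path)
    loaded = pd.read_pickle(path)

print(type(loaded).__name__, loaded == settings)  # dict True
```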


.. warning::

Warning: Loading pickled data received from untrusted sources can be
unsafe. See: https://docs.python.org/3/library/pickle.html
Loading pickled data received from untrusted sources can be
unsafe. See `here <https://docs.python.org/3/library/pickle.html>`__.
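Here is why the warning matters. Unpickling can call any importable callable that the producer of the bytes chose. This harmless illustration uses the standard-library `pickle` module, which `read_pickle` ultimately relies on. The hypothetical `Payload` class tells pickle to rebuild it by calling `eval`, so merely loading the bytes evaluates an expression.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # On load, pickle calls eval("1 + 1"); a malicious file could name
        # any importable callable and arguments here instead.
        return (eval, ("1 + 1",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # no Payload comes back; eval ran instead
print(result)  # 2
```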

Parameters
----------
path : string
File path
File path where the pickled object will be loaded.
compression : {'infer', 'gzip', 'bz2', 'xz', 'zip', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use
gzip, bz2, xz or zip if path ends in '.gz', '.bz2', '.xz',
@@ -72,6 +103,37 @@ def read_pickle(path, compression='infer'):
Returns
-------
unpickled : type of object stored in file
Contributor: add a See Also and reference DataFrame.to_pickle, pd.read_hdf, pd.read_sql, pd.read_parquet

Contributor Author: added.
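If the file name does not reveal the codec, `'infer'` treats the file as uncompressed. In that case you should name the `compression` explicitly when writing and when reading. This sketch assumes a hypothetical file name, `frame.bin`, that has no recognised compression suffix. Bzip2 streams begin with the bytes `BZh`.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical name with no recognised compression suffix.
    path = os.path.join(tmpdir, "frame.bin")
    df.to_pickle(path, compression="bz2")
    # 'infer' would map ".bin" to no compression, so name the codec again.
    restored = pd.read_pickle(path, compression="bz2")
    with open(path, "rb") as fh:
        magic = fh.read(3)

print(magic)  # b'BZh'
```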


Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> pd.to_pickle(original_df, "./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

See Also
--------
DataFrame.to_pickle : Pickle (serialize) DataFrame object to input file
path.
Series.to_pickle : Pickle (serialize) Series object to input file path.
read_hdf : read from the store, close it if we opened it.
read_sql : Read SQL query or database table into a DataFrame.
read_parquet : Load a parquet object from the file path, returning a
DataFrame.
"""
path = _stringify_path(path)
inferred_compression = _infer_compression(path, compression)
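The See Also list for `read_pickle` also mentions `Series.to_pickle`. A Series should round-trip the same way, keeping its values, index, and name. This minimal sketch assumes a temporary directory and a hypothetical file name.

```python
import os
import tempfile

import pandas as pd

s = pd.Series([5, 6, 7], index=["a", "b", "c"], name="bar")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "series.pkl")  # hypothetical file name
    s.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises AssertionError if the values, index or name differ.
pd.testing.assert_series_equal(s, restored)
```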