DOC: update the to_pickle & read_pickle docstring #20253

Merged: 30 commits, Mar 14, 2018
Commits
3d6ed4a  update docstring and add example (minggli, Mar 10, 2018)
62a202f  update docstring and add example (minggli, Mar 10, 2018)
03fa85b  add space (minggli, Mar 10, 2018)
3169811  DataFrame.to_pickle docstring (minggli, Mar 10, 2018)
d014e97  Series.to_pickle docstring (minggli, Mar 10, 2018)
202f411  add to_pickle to _shared_docs (minggli, Mar 10, 2018)
05e1bce  move quote (minggli, Mar 10, 2018)
f674845  remove blank line (minggli, Mar 10, 2018)
556adf4  miscellaneous fixes (minggli, Mar 10, 2018)
42fcc03  miscellaneous fixes (minggli, Mar 10, 2018)
e69ea5c  remove import and add See Also (minggli, Mar 10, 2018)
f36c6dd  remove import and add See Also (minggli, Mar 10, 2018)
5f152ed  add more See Also (minggli, Mar 10, 2018)
c6231b0  use proper warning with embedded hyperlink (minggli, Mar 11, 2018)
0c3a442  remove pandas.to_pickle from See Also (minggli, Mar 11, 2018)
709ca74  remove commas in See Also (minggli, Mar 11, 2018)
c15d454  additional output in See Also (minggli, Mar 11, 2018)
b3d9cee  add descriptions in See Also references (minggli, Mar 11, 2018)
ef19c93  add descriptions in See Also references (minggli, Mar 11, 2018)
33a9b1f  correct references and indentation (minggli, Mar 11, 2018)
7be8f3b  correct indentation (minggli, Mar 11, 2018)
d69c73f  revert frame (minggli, Mar 12, 2018)
e2af5a3  revert series (minggli, Mar 12, 2018)
46b7342  remove shared_doc, pandas. and add infer description (minggli, Mar 12, 2018)
7f1d3d4  remove pandas. and add infer description (minggli, Mar 12, 2018)
3e545f3  miscellaneous changes (minggli, Mar 13, 2018)
c1d6f03  miscellaneous changes (minggli, Mar 13, 2018)
39969d5  move See Also before Example and add os.remove (minggli, Mar 13, 2018)
c1a9d57  simplify See Also and to_pickle summary. (minggli, Mar 13, 2018)
26b3e2e  simplify See Also and to_pickle summary in pandas.io.pickle (minggli, Mar 13, 2018)
45 changes: 38 additions & 7 deletions pandas/core/generic.py
@@ -1901,28 +1901,59 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
     def to_pickle(self, path, compression='infer',
                   protocol=pkl.HIGHEST_PROTOCOL):
         """
-        Pickle (serialize) object to input file path.
+        Pickle (serialize) object to file.

         Parameters
         ----------
-        path : string
-            File path
+        path : str
+            File path where the pickled object will be stored.
         compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
-            a string representing the compression to use in the output file
+            A string representing the compression to use in the output file. By
+            default, infers from the file extension in specified path.

             .. versionadded:: 0.20.0
         protocol : int
             Int which indicates which protocol should be used by the pickler,
             default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
             values for this parameter depend on the version of Python. For
             Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
-            valid value. For Python >= 3.4, 4 is a valid value.A negative value
-            for the protocol parameter is equivalent to setting its value to
-            HIGHEST_PROTOCOL.
+            valid value. For Python >= 3.4, 4 is a valid value. A negative
+            value for the protocol parameter is equivalent to setting its value
+            to HIGHEST_PROTOCOL.

             .. [1] https://docs.python.org/3/library/pickle.html

Member

I think it is more standard to have this in a References section.

Contributor Author

@jorisvandenbossche your opinion?

Member

I previously said it was fine, because we don't use References sections in many cases (most of the time we use inline links). That is another thing we can discuss when further improving the guidelines.

             .. versionadded:: 0.21.0

+        See Also
+        --------
+        read_pickle : Load pickled pandas object (or any object) from file.
+        DataFrame.to_hdf : Write DataFrame to an HDF5 file.
+        DataFrame.to_sql : Write DataFrame to a SQL database.
+        DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
+
+        Examples
+        --------
+        >>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
+        >>> original_df
+           foo  bar
+        0    0    5
+        1    1    6
+        2    2    7
+        3    3    8
+        4    4    9
+        >>> original_df.to_pickle("./dummy.pkl")
+
+        >>> unpickled_df = pd.read_pickle("./dummy.pkl")

Contributor

@jorisvandenbossche how are we handling file paths in docstrings?

Member

@jreback commented about it (using os.remove to remove the remaining file); the issue to discuss this is here: #20302

Contributor

great, see that now.

Contributor Author

thanks guys for looking into it.

+        >>> unpickled_df
+           foo  bar
+        0    0    5
+        1    1    6
+        2    2    7
+        3    3    8
+        4    4    9
+

Member

Can you add the following here:

>>> import os
>>> os.remove("./dummy.pkl")

Contributor Author

added. :)

+        >>> import os
+        >>> os.remove("./dummy.pkl")
         """
         from pandas.io.pickle import to_pickle
         return to_pickle(self, path, compression=compression,
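
The docstring examples in this diff write `./dummy.pkl` into the working directory and clean up with `os.remove`. As a rough sketch of an alternative (not something this PR adopts), the same round trip can run inside a temporary directory. It assumes a pandas version that accepts `pathlib.Path` objects, and the file name `dummy.pkl.gz` is made up here so that the default `compression='infer'` picks gzip from the suffix.

```python
import tempfile
from pathlib import Path

import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# The temporary directory and its contents are deleted when the block exits,
# so no explicit os.remove call is needed.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "dummy.pkl.gz"  # hypothetical file name

    # compression defaults to 'infer', so the ".gz" suffix selects gzip.
    original_df.to_pickle(pickle_path)

    # gzip streams start with the magic bytes 0x1f 0x8b.
    is_gzip = pickle_path.read_bytes()[:2] == b"\x1f\x8b"

    # read_pickle also infers the compression from the suffix.
    unpickled_df = pd.read_pickle(pickle_path)

round_trip_ok = unpickled_df.equals(original_df)
```

Whether a temporary directory reads better than `os.remove` inside a docstring is a style question; the review thread above defers it to issue #20302.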
86 changes: 75 additions & 11 deletions pandas/io/pickle.py
@@ -10,15 +10,17 @@

 def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
     """
-    Pickle (serialize) object to input file path
+    Pickle (serialize) object to file.

     Parameters
     ----------
     obj : any object
-    path : string
-        File path
+        Any python object.
+    path : str
+        File path where the pickled object will be stored.
     compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
-        a string representing the compression to use in the output file
+        A string representing the compression to use in the output file. By
+        default, infers from the file extension in specified path.

         .. versionadded:: 0.20.0
     protocol : int
@@ -33,7 +35,36 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
         .. [1] https://docs.python.org/3/library/pickle.html
         .. versionadded:: 0.21.0
+
+    See Also
+    --------
+    read_pickle : Load pickled pandas object (or any object) from file.
+    DataFrame.to_hdf : Write DataFrame to an HDF5 file.
+    DataFrame.to_sql : Write DataFrame to a SQL database.
+    DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
+
+    Examples
+    --------
+    >>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
+    >>> original_df
+       foo  bar
+    0    0    5
+    1    1    6
+    2    2    7
+    3    3    8
+    4    4    9
+    >>> pd.to_pickle(original_df, "./dummy.pkl")
+
+    >>> unpickled_df = pd.read_pickle("./dummy.pkl")
+    >>> unpickled_df
+       foo  bar
+    0    0    5
+    1    1    6
+    2    2    7
+    3    3    8
+    4    4    9
+    >>> import os
+    >>> os.remove("./dummy.pkl")
     """
     path = _stringify_path(path)
     inferred_compression = _infer_compression(path, compression)
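
The `protocol` description says a negative value is equivalent to `HIGHEST_PROTOCOL`. The sketch below checks that with the standard-library `pickle` module alone. The `payload` dictionary is made up for illustration, and the check on the second byte relies on the pickle stream layout for protocol 2 and later.

```python
import pickle

payload = {"foo": [0, 1, 2], "bar": "baz"}  # made-up example data

# A negative protocol selects pickle.HIGHEST_PROTOCOL.
with_negative = pickle.dumps(payload, protocol=-1)
with_highest = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
same_bytes = with_negative == with_highest

# For protocol 2 and later, the stream starts with the PROTO opcode (0x80)
# followed by a single byte holding the protocol number.
protocol_byte = with_negative[1]

# The data round-trips regardless of which spelling was used.
restored = pickle.loads(with_negative)
```

Files written with the highest protocol may not load on older Python versions, so passing an explicit lower `protocol` can be the safer choice when a pickle has to be shared across interpreters.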
@@ -51,16 +82,17 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):

 def read_pickle(path, compression='infer'):
     """
-    Load pickled pandas object (or any other pickled object) from the specified
-    file path
+    Load pickled pandas object (or any object) from file.
+
+    .. warning::

-    Warning: Loading pickled data received from untrusted sources can be
-    unsafe. See: https://docs.python.org/3/library/pickle.html
+       Loading pickled data received from untrusted sources can be
+       unsafe. See `here <https://docs.python.org/3/library/pickle.html>`__.

     Parameters
     ----------
-    path : string
-        File path
+    path : str
+        File path where the pickled object will be loaded.
     compression : {'infer', 'gzip', 'bz2', 'xz', 'zip', None}, default 'infer'
         For on-the-fly decompression of on-disk data. If 'infer', then use
         gzip, bz2, xz or zip if path ends in '.gz', '.bz2', '.xz',
@@ -72,6 +104,38 @@ def read_pickle(path, compression='infer'):
     Returns
     -------
     unpickled : type of object stored in file

Contributor

Add a See Also section and reference DataFrame.to_pickle, pd.read_hdf, pd.read_sql, pd.read_parquet.

Contributor Author

added.

+
+    See Also
+    --------
+    DataFrame.to_pickle : Pickle (serialize) DataFrame object to file.
+    Series.to_pickle : Pickle (serialize) Series object to file.
+    read_hdf : Read HDF5 file into a DataFrame.
+    read_sql : Read SQL query or database table into a DataFrame.
+    read_parquet : Load a parquet object, returning a DataFrame.
+
+    Examples
+    --------
+    >>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
+    >>> original_df
+       foo  bar
+    0    0    5
+    1    1    6
+    2    2    7
+    3    3    8
+    4    4    9
+    >>> pd.to_pickle(original_df, "./dummy.pkl")
+
+    >>> unpickled_df = pd.read_pickle("./dummy.pkl")
+    >>> unpickled_df
+       foo  bar
+    0    0    5
+    1    1    6
+    2    2    7
+    3    3    8
+    4    4    9
+
+    >>> import os
+    >>> os.remove("./dummy.pkl")
     """
     path = _stringify_path(path)
     inferred_compression = _infer_compression(path, compression)