DOC: update the to_pickle & read_pickle docstring #20253

Merged
merged 30 commits into from
Mar 14, 2018
Changes from 21 commits
Commits
3d6ed4a  update docstring and add example (minggli, Mar 10, 2018)
62a202f  update docstring and add example (minggli, Mar 10, 2018)
03fa85b  add space (minggli, Mar 10, 2018)
3169811  DataFrame.to_pickle docstring (minggli, Mar 10, 2018)
d014e97  Series.to_pickle docstring (minggli, Mar 10, 2018)
202f411  add to_pickle to _shared_docs (minggli, Mar 10, 2018)
05e1bce  move quote (minggli, Mar 10, 2018)
f674845  remove blank line (minggli, Mar 10, 2018)
556adf4  miscellaneous fixes (minggli, Mar 10, 2018)
42fcc03  miscellaneous fixes (minggli, Mar 10, 2018)
e69ea5c  remove import and add See Also (minggli, Mar 10, 2018)
f36c6dd  remove import and add See Also (minggli, Mar 10, 2018)
5f152ed  add more See Also (minggli, Mar 10, 2018)
c6231b0  use proper warning with embedded hyperlink (minggli, Mar 11, 2018)
0c3a442  remove pandas.to_pickle from See Also (minggli, Mar 11, 2018)
709ca74  remove commas in See Also (minggli, Mar 11, 2018)
c15d454  additional output in See Also (minggli, Mar 11, 2018)
b3d9cee  add descriptions in See Also references (minggli, Mar 11, 2018)
ef19c93  add descriptions in See Also references (minggli, Mar 11, 2018)
33a9b1f  correct references and indentation (minggli, Mar 11, 2018)
7be8f3b  correct indentation (minggli, Mar 11, 2018)
d69c73f  revert frame (minggli, Mar 12, 2018)
e2af5a3  revert series (minggli, Mar 12, 2018)
46b7342  remove shared_doc, pandas. and add infer description (minggli, Mar 12, 2018)
7f1d3d4  remove pandas. and add infer description (minggli, Mar 12, 2018)
3e545f3  miscellaneous changes (minggli, Mar 13, 2018)
c1d6f03  miscellaneous changes (minggli, Mar 13, 2018)
39969d5  move See Also before Example and add os.remove (minggli, Mar 13, 2018)
c1a9d57  simplify See Also and to_pickle summary. (minggli, Mar 13, 2018)
26b3e2e  simplify See Also and to_pickle summary in pandas.io.pickle (minggli, Mar 13, 2018)
8 changes: 7 additions & 1 deletion pandas/core/frame.py
@@ -78,7 +78,7 @@
from pandas.compat import (range, map, zip, lrange, lmap, lzip, StringIO, u,
OrderedDict, raise_with_traceback)
from pandas import compat
from pandas.compat import PY36
from pandas.compat import PY36, cPickle as pkl
from pandas.compat.numpy import function as nv
from pandas.util._decorators import (Appender, Substitution,
rewrite_axis_style_signature)
@@ -1602,6 +1602,12 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
startcol=startcol, freeze_panes=freeze_panes,
engine=engine)

@Appender(_shared_docs['to_pickle'] % _shared_doc_kwargs)
def to_pickle(self, path, compression='infer',
protocol=pkl.HIGHEST_PROTOCOL):
return super(DataFrame, self).to_pickle(path, compression=compression,
protocol=protocol)

def to_stata(self, fname, convert_dates=None, write_index=True,
encoding="latin-1", byteorder=None, time_stamp=None,
data_label=None, variable_labels=None):
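
The hunk above attaches a shared docstring template to `DataFrame.to_pickle` and fills in `%(klass)s` at decoration time. A minimal sketch of that pattern follows. It is not pandas' `Appender`: `append_doc`, `Base`, and `Frame` are hypothetical names, and the template text is shortened for illustration.

```python
# Shared template, keyed by method name, with a %(klass)s placeholder.
_shared_docs = {}
_shared_docs["to_pickle"] = """
Pickle (serialize) %(klass)s object to file.
"""


def append_doc(text):
    """Hypothetical stand-in for a docstring-appending decorator."""
    def decorate(func):
        # Append the already-substituted text to whatever docstring exists.
        func.__doc__ = (func.__doc__ or "") + text
        return func
    return decorate


class Base:
    def to_pickle(self, path):
        return "pickled to %s" % path


class Frame(Base):
    # The subclass only exists to carry a class-specific docstring;
    # the behaviour is delegated unchanged to the parent.
    @append_doc(_shared_docs["to_pickle"] % {"klass": "DataFrame"})
    def to_pickle(self, path):
        return super().to_pickle(path)


print(Frame.to_pickle.__doc__)
```

The commit list above includes `revert frame`, `revert series`, and `remove shared_doc`, which suggests this shared-docstring approach was dropped again in later commits of the PR.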
80 changes: 56 additions & 24 deletions pandas/core/generic.py
@@ -1652,6 +1652,62 @@ def _repr_latex_(self):
strings before writing.
"""

_shared_docs['to_pickle'] = """
Pickle (serialize) %(klass)s object to input file path.

Parameters
----------
path : string
File path where the pickled %(klass)s object will be stored.
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
A string representing the compression to use in the output file.
Member

Can you add that the default 'infer' infers it from the specified path?

Contributor Author

added. 💯
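
The `'infer'` behaviour requested in this thread can be sketched with the standard library alone. This is an illustration, not pandas' implementation: `infer_compression`, `dump`, `load`, and the two lookup tables are hypothetical names, and the extension mapping is an assumption based on the `{'infer', 'gzip', 'bz2', 'xz', None}` options listed in the docstring.

```python
import bz2
import gzip
import lzma
import os
import pickle
import tempfile

# Hypothetical tables for illustration; pandas keeps its own internal mapping.
_EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz"}
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "xz": lzma.open, None: open}


def infer_compression(path, compression="infer"):
    """Return the compression to use, guessing from the extension if asked."""
    if compression != "infer":
        return compression  # an explicit choice always wins
    for extension, name in _EXTENSION_TO_COMPRESSION.items():
        if path.endswith(extension):
            return name
    return None  # unknown extension: write uncompressed


def dump(obj, path, compression="infer"):
    opener = _OPENERS[infer_compression(path, compression)]
    with opener(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load(path, compression="infer"):
    opener = _OPENERS[infer_compression(path, compression)]
    with opener(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "dummy.pkl.gz")
    dump({"foo": [0, 1, 2]}, target)
    with open(target, "rb") as raw:
        magic = raw.read(2)  # gzip streams start with the bytes 1f 8b
    restored = load(target)
```

Writing into a temporary directory also avoids leaving a stray `dummy.pkl` behind, which is the concern behind the later `add os.remove` commit.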


.. versionadded:: 0.20.0
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
values for this parameter depend on the version of Python. For
Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
valid value. For Python >= 3.4, 4 is a valid value.A negative value
Member

space after '.'

Contributor Author

done.

for the protocol parameter is equivalent to setting its value to
HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html
Contributor

I think we are putting these in a Reference section @jorisvandenbossche ?

Member

That's the numpydoc way yes, but currently we have no docstring using that, so I would leave it like this for now
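
The `protocol` description above says a negative value is equivalent to `HIGHEST_PROTOCOL`. A short check with the standard `pickle` module, which the pandas function forwards to, shows the same rule. The variable names here are illustrative only.

```python
import pickle

record = {"foo": list(range(5)), "bar": list(range(5, 10))}

# A negative protocol selects pickle.HIGHEST_PROTOCOL.
highest = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
negative = pickle.dumps(record, protocol=-1)
same_bytes = highest == negative

# For protocol 2 and later, the stream starts with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
written_protocol = negative[1]

# Values above the interpreter's highest protocol are rejected.
try:
    pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL + 1)
    rejected = False
except ValueError:
    rejected = True
```

Because the highest protocol depends on the Python version that writes the file, a pickle written with the default may not be readable by an older interpreter.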

.. versionadded:: 0.21.0

Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9

See Also
--------
pandas.read_pickle : Load pickled pandas object (or any other pickled
Member

You can leave out the 'pandas' in all of those references

Contributor Author

removed 👍

object) from the specified file path.
pandas.DataFrame.to_hdf : Write the contained data to an HDF5 file using
HDFStore.
pandas.DataFrame.to_sql : Write records stored in a DataFrame to a SQL
database.
pandas.DataFrame.to_parquet : Write a DataFrame to the binary parquet
format.
"""

def to_json(self, path_or_buf=None, orient=None, date_format=None,
double_precision=10, force_ascii=True, date_unit='ms',
default_handler=None, lines=False, compression=None,
@@ -1900,30 +1956,6 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,

def to_pickle(self, path, compression='infer',
protocol=pkl.HIGHEST_PROTOCOL):
"""
Pickle (serialize) object to input file path.

Parameters
----------
path : string
File path
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file

.. versionadded:: 0.20.0
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
values for this parameter depend on the version of Python. For
Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
valid value. For Python >= 3.4, 4 is a valid value.A negative value
for the protocol parameter is equivalent to setting its value to
HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html
.. versionadded:: 0.21.0

"""
from pandas.io.pickle import to_pickle
return to_pickle(self, path, compression=compression,
protocol=protocol)
9 changes: 8 additions & 1 deletion pandas/core/series.py
@@ -54,7 +54,8 @@
from pandas import compat
from pandas.io.formats.terminal import get_terminal_size
from pandas.compat import (
zip, u, OrderedDict, StringIO, range, get_range_parameters, PY36)
zip, u, OrderedDict, StringIO, range, get_range_parameters, PY36,
cPickle as pkl)
from pandas.compat.numpy import function as nv

import pandas.core.ops as ops
@@ -2952,6 +2953,12 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
merge_cells=merge_cells, encoding=encoding,
inf_rep=inf_rep, verbose=verbose)

@Appender(generic._shared_docs['to_pickle'] % _shared_doc_kwargs)
def to_pickle(self, path, compression='infer',
protocol=pkl.HIGHEST_PROTOCOL):
return super(Series, self).to_pickle(path, compression=compression,
protocol=protocol)

@Appender(generic._shared_docs['isna'] % _shared_doc_kwargs)
def isna(self):
return super(Series, self).isna()
81 changes: 73 additions & 8 deletions pandas/io/pickle.py
@@ -10,15 +10,16 @@

def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
"""
Pickle (serialize) object to input file path
Pickle (serialize) object to input file path.
Member

I find it a bit strange to say we pickle the object to a "file path"; it's actually to a file (located at the file path). Maybe simplify to "Pickle (serialize) object to file."?

Contributor Author

amended 👍


Parameters
----------
obj : any object
Any python object.
path : string
File path
File path where the pickled object will be stored.
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file
A string representing the compression to use in the output file.

.. versionadded:: 0.20.0
protocol : int
@@ -33,7 +34,37 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
.. [1] https://docs.python.org/3/library/pickle.html
.. versionadded:: 0.21.0


Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
foo bar
0 0 5
1 1 6
Contributor

Can you add a See Also section and reference read_pickle? References to to_hdf, to_sql, and to_parquet would also be good.

Contributor Author

added 💯

2 2 7
3 3 8
4 4 9
>>> pd.to_pickle(original_df, "./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9

See Also
Member

See Also should go before the Examples.

Contributor Author

corrected.

Member

Did you push your latest changes?

Contributor Author

yes.

Member

I ask because you mentioned you corrected it, but the See Also is still after the Examples.

Contributor Author

oh sorry, just missed this one and did others.

--------
pandas.read_pickle : Load pickled pandas object (or any other pickled
object) from the specified file path.
pandas.DataFrame.to_hdf : Write the contained data to an HDF5 file using
HDFStore.
pandas.DataFrame.to_sql : Write records stored in a DataFrame to a SQL
database.
pandas.DataFrame.to_parquet : Write a DataFrame to the binary parquet
format.
"""
path = _stringify_path(path)
inferred_compression = _infer_compression(path, compression)
@@ -52,15 +83,17 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):
def read_pickle(path, compression='infer'):
"""
Load pickled pandas object (or any other pickled object) from the specified
file path
file path.
Member

Can you try to fit this on one line?

Maybe the "or any other pickled object" is not needed in the summary line and can go in an extended summary, as the typical use case should be pandas objects.
Or, maybe the "from specified file path" can be shortened to "from file."

Contributor Author

shortened.


.. warning::

Warning: Loading pickled data received from untrusted sources can be
unsafe. See: https://docs.python.org/3/library/pickle.html
Loading pickled data received from untrusted sources can be
unsafe. See `here <https://docs.python.org/3/library/pickle.html>`__.

Parameters
----------
path : string
File path
File path where the pickled object will be loaded.
compression : {'infer', 'gzip', 'bz2', 'xz', 'zip', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use
gzip, bz2, xz or zip if path ends in '.gz', '.bz2', '.xz',
@@ -72,6 +105,38 @@ def read_pickle(path, compression='infer'):
Returns
-------
unpickled : type of object stored in file
Contributor

add a See Also and ref DataFrame.to_pickle, pd.read_hdf, pd.read_sql, pd.read_parquet

Contributor Author

added.


Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> pd.to_pickle(original_df, "./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9

See Also
--------
pandas.DataFrame.to_pickle : Pickle (serialize) DataFrame object to input
file path.
pandas.Series.to_pickle : Pickle (serialize) Series object to input
file path.
pandas.read_hdf : read from the store, close it if we opened it.
pandas.read_sql : Read SQL query or database table into a DataFrame.
pandas.read_parquet : Load a parquet object from the file path, returning
a DataFrame.
"""
path = _stringify_path(path)
inferred_compression = _infer_compression(path, compression)
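
The warning added to `read_pickle` about untrusted sources can be made concrete with the standard `pickle` module. The sketch below uses a harmless payload: `Payload` and `RefusingUnpickler` are hypothetical names, and the refusing unpickler follows the `find_class` override described in the Python `pickle` documentation rather than anything pandas provides.

```python
import io
import pickle


class Payload:
    """Harmless stand-in for a hostile object."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Loading runs the callable chosen by whoever produced the bytes.
result = pickle.loads(malicious_bytes)


class RefusingUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name)
        )


try:
    RefusingUnpickler(io.BytesIO(malicious_bytes)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An unpickler this strict cannot load pandas objects either, since those also need global lookups. It only shows where the risk enters; the practical advice in the docstring remains to load pickles only from sources you trust.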