-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: update the to_pickle
& read_pickle
docstring
#20253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
to_pickle
& 'read_pickle' docstringto_pickle
& read_pickle
docstring
pandas/io/pickle.py
Outdated
|
||
Examples | ||
-------- | ||
>>> from pandas import DataFrame, read_pickle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need these imports, instead use pd.DataFrame
and pd.read_pickle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed 👍
pandas/io/pickle.py
Outdated
>>> original_df | ||
foo bar | ||
0 0 5 | ||
1 1 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a See Also and ref read_pickle (to_hdf, to_sql, to_parquet) also good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added 💯
@@ -72,6 +93,28 @@ def read_pickle(path, compression='infer'): | |||
Returns | |||
------- | |||
unpickled : type of object stored in file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a See Also and ref DataFrame.to_pickle, pd.read_hdf, pd.read_sql, pd.read_parquet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
pandas/core/generic.py
Outdated
for the protocol parameter is equivalent to setting its value to | ||
HIGHEST_PROTOCOL. | ||
|
||
.. [1] https://docs.python.org/3/library/pickle.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are putting these in a Reference section @jorisvandenbossche ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the numpydoc way yes, but currently we have no docstring using that, so I would leave it like this for now
pandas/io/pickle.py
Outdated
-------- | ||
pandas.DataFrame.to_pickle | ||
pandas.Series.to_pickle | ||
pandas.read_hdf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't put commas at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we add text describing each of these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected.
pandas/io/pickle.py
Outdated
See Also | ||
-------- | ||
pandas.read_pickle | ||
pandas.to_hdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
pandas/core/generic.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.read_pickle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add similar to what you adding in other See Also below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
I am not sure that I find it worth to use a shared_doc system here. It's used only for substituting Series/DataFrame in the summary line and path parameter explanation. I don't think that's worth the added complexity. I would rather make the docstring generic for both. It's not that the Series version has different options or behaviours.
pandas/core/generic.py
Outdated
default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible | ||
values for this parameter depend on the version of Python. For | ||
Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a | ||
valid value. For Python >= 3.4, 4 is a valid value.A negative value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after '.'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
pandas/core/generic.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.read_pickle : Load pickled pandas object (or any other pickled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can leave out the 'pandas' in all of those references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed 👍
pandas/core/generic.py
Outdated
for the protocol parameter is equivalent to setting its value to | ||
HIGHEST_PROTOCOL. | ||
|
||
.. [1] https://docs.python.org/3/library/pickle.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the numpydoc way yes, but currently we have no docstring using that, so I would leave it like this for now
pandas/core/generic.py
Outdated
path : string | ||
File path where the pickled %(klass)s object will be stored. | ||
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer' | ||
A string representing the compression to use in the output file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that the default 'infer' infers it from the specified path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added. 💯
Codecov Report
@@ Coverage Diff @@
## master #20253 +/- ##
=========================================
Coverage ? 91.76%
=========================================
Files ? 150
Lines ? 49148
Branches ? 0
=========================================
Hits ? 45099
Misses ? 4049
Partials ? 0
Continue to review full report at Codecov.
|
@jorisvandenbossche thanks for the comments. applied changes including reverting to generic docstring. agree that for the benefits, not necessary to use _shared_docs. |
4 4 9 | ||
>>> original_df.to_pickle("./dummy.pkl") | ||
|
||
>>> unpickled_df = pd.read_pickle("./dummy.pkl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche how handling file paths in doc-strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, see that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks guys for looking into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Looking good
Added some minor additional comments. One more general comment: it might be nice to add an extended summary very briefly explaining what "pickle" is and linking to the python docs about it.
2 2 7 | ||
3 3 8 | ||
4 4 9 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add here a
>>> import os
>>> os.remove("./dummy.pkl")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added. :)
pandas/core/generic.py
Outdated
See Also | ||
-------- | ||
read_pickle : Load pickled pandas object (or any other pickled object) | ||
from the specified file path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you only indent this line with 4 spaces? Like
read_pickle : Load pickled pandas object (or any other pickled object)
from the specified file path.
Both work for sphinx, but we mainly use this pattern, so it's better to be consistent in this.
(same for the other ones below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
pandas/io/pickle.py
Outdated
@@ -52,15 +81,17 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL): | |||
def read_pickle(path, compression='infer'): | |||
""" | |||
Load pickled pandas object (or any other pickled object) from the specified | |||
file path | |||
file path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try to fit this on one line?
Maybe the "or any other pickled object" is not needed in the summary line and can go in an extended summary, as the typical use case should be pandas objects.
Or, maybe the "from specified file path" can be shortened to "from file."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shortened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, added some comments.
pandas/core/generic.py
Outdated
@@ -1906,23 +1906,54 @@ def to_pickle(self, path, compression='infer', | |||
Parameters | |||
---------- | |||
path : string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str
instead of string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 👍
pandas/core/generic.py
Outdated
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer' | ||
a string representing the compression to use in the output file | ||
A string representing the compression to use in the output file. By | ||
default, infers from the specified path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"infers from the specified file extension" may be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes pythonista.
HIGHEST_PROTOCOL. | ||
valid value. For Python >= 3.4, 4 is a valid value. A negative | ||
value for the protocol parameter is equivalent to setting its value | ||
to HIGHEST_PROTOCOL. | ||
|
||
.. [1] https://docs.python.org/3/library/pickle.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is more standard to have this in a References
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously said it was fine, because we don't use References
sections in many cases (most of the time we use inline links), another thing we can discuss in further improving the guidelines.
pandas/io/pickle.py
Outdated
3 3 8 | ||
4 4 9 | ||
|
||
See Also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Also
should go before the Examples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you push your latest changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because you mentioned you corrected it, but the see also is still after the examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry, just missed this one and did others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some remaining comment about the "See Also" descriptions.
Looking good for the rest!
pandas/io/pickle.py
Outdated
See Also | ||
-------- | ||
read_pickle : Load pickled pandas object (or any object) from file. | ||
DataFrame.to_hdf : Write the contained data to an HDF5 file using HDFStore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a bit simpler (eg no need to mention HDFStore): "Write DataFame to HDF5 file"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amended 👍
pandas/io/pickle.py
Outdated
-------- | ||
read_pickle : Load pickled pandas object (or any object) from file. | ||
DataFrame.to_hdf : Write the contained data to an HDF5 file using HDFStore. | ||
DataFrame.to_sql : Write records stored in a DataFrame to a SQL database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be a bit more consistent in the wording here, maybe just "Write DataFrame to a SQL database" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amended 👍
pandas/io/pickle.py
Outdated
DataFrame.to_pickle : Pickle (serialize) DataFrame object to input file | ||
path. | ||
Series.to_pickle : Pickle (serialize) Series object to input file path. | ||
read_hdf : read from the store, close it if we opened it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the explanation about closing the file is too detailed for the See Also explanation. Maybe something like "Read HDF5 file into a DataFrame" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amended 👍
pandas/io/pickle.py
Outdated
Series.to_pickle : Pickle (serialize) Series object to input file path. | ||
read_hdf : read from the store, close it if we opened it. | ||
read_sql : Read SQL query or database table into a DataFrame. | ||
read_parquet : Load a parquet object from the file path, returning a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here, "from the file path" is rather specific, would leave out that part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amended 👍
pandas/io/pickle.py
Outdated
@@ -10,15 +10,17 @@ | |||
|
|||
def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL): | |||
""" | |||
Pickle (serialize) object to input file path | |||
Pickle (serialize) object to input file path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit strange to say we pickle the object the a "file path", it's actually to a file (located at the file path), so maybe simplify to "Pickle (serialize) object to file." ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amended 👍
this lgtm. do we use Reference in other doc-strings? |
@minggli Thanks a lot for the PR! |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
description already finishes with "."
.. versionadded:: tag should not end with "."