
DOC: to_pickle arguments for compression have a mistake (easy-fix) #35364


Closed
ianozsvald opened this issue Jul 21, 2020 · 2 comments · Fixed by #38192

@ianozsvald
Contributor

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html

Documentation problem

The parameter description compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ implies that gzip is a valid extension. This is wrong: the extension checked for is gz. The documentation for to_csv gets this right with "... detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’" and should act as a guide: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

Suggested fix for documentation

compression{‘infer’, ‘gz’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ to match the code implementation: https://github.com/pandas-dev/pandas/blob/v1.0.5/pandas/io/common.py#L223

There may be an argument for extending the _compression_to_extension code to accept both gzip and gz for gzip compression; whoever reads this might decide this is worth escalating. For me, making sure we fix the docs in line with the code is the priority.

ianozsvald added the Docs and Needs Triage labels on Jul 21, 2020
rhshadrach removed the Needs Triage label on Jul 26, 2020
rhshadrach added this to the Contributions Welcome milestone on Jul 26, 2020
@rhshadrach
Member

It's a bit confusing because gzip is the one compression type where the mode and the extension differ. The value to pass as the compression mode is gzip. On the other hand, when the mode is infer, the extension looked for is gz.

This is how to_csv works too:

Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is ‘zip’ or inferred as ‘zip’, other entries passed as additional compression options.

It seems to me having these details would be useful in the to_pickle docs as well.
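
To make the mode-versus-extension distinction concrete, here is a minimal standard-library sketch. It is not the pandas implementation: the mapping and the infer_compression helper are hypothetical stand-ins modelled on the mode and extension pairs quoted above, and pandas may differ in details such as case handling.

```python
from typing import Optional

# Hypothetical mapping from compression mode to the file extension that is
# looked for when the mode is "infer". Modelled on the pairs quoted in this
# thread; not copied from pandas.
COMPRESSION_TO_EXTENSION = {
    "gzip": ".gz",
    "bz2": ".bz2",
    "zip": ".zip",
    "xz": ".xz",
}


def infer_compression(path: str, compression: Optional[str] = "infer") -> Optional[str]:
    """Return the compression mode to use for `path`, or None for no compression."""
    if compression is None:
        return None
    if compression == "infer":
        # Inference looks at the file extension, so ".gz" maps to the mode "gzip".
        for mode, extension in COMPRESSION_TO_EXTENSION.items():
            if path.endswith(extension):
                return mode
        return None  # unrecognised extension: no compression
    if compression in COMPRESSION_TO_EXTENSION:
        # An explicit mode is used as given, whatever the file is called.
        return compression
    raise ValueError(f"Unrecognized compression mode: {compression!r}")


print(infer_compression("frame.pkl.gz"))       # extension ".gz" is recognised
print(infer_compression("frame.pkl.gzip"))     # ".gzip" is not a recognised extension
print(infer_compression("frame.pkl", "gzip"))  # explicit mode name
```

In this sketch, "gzip" is accepted only as a mode name and ".gz" only as an extension, which is the asymmetry the to_pickle docs would need to spell out.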

@ianozsvald
Contributor Author

Thanks to @jreback for picking this up 👍
