DOC: Update index parameter in pandas to_parquet #28217


Merged: 5 commits, Sep 16, 2019
6 changes: 3 additions & 3 deletions pandas/core/frame.py
@@ -2148,10 +2148,10 @@ def to_parquet(
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use ``None`` for no compression.
-        index : bool, default None
+        index : bool, default True
Member: The default is still None, so need to change this back I think

             If ``True``, include the dataframe's index(es) in the file output.
-            If ``False``, they will not be written to the file. If ``None``,
-            the behavior depends on the chosen engine.
+            If ``False``, they will not be written to the file.
+            If ``None``, RangeIndex will be stored as metadata-only.
Member: Maybe mention that other indexes are included as columns in the output in this case?

Member: Thanks @galuhsahid for the update.

Can you change the default back to None in the docstrings too?

And for the None description, looks good, but I'm wondering if it'd be clearer to explain that None is like True in that the index is stored, except that a RangeIndex (the default in pandas) is stored not as its values, as with True, but as a range in the metadata, so it doesn't require as much space and it's faster.

The same as you're saying, but a bit more extended, so people without much knowledge about parquet or pandas indices can still understand whether it's a good choice for them.

Btw, minor thing: every time you push, the people who participated in the PR conversation receive a notification. I think it's good practice to commit as many times as you find useful, but when a push doesn't make a difference, it may be better to push only when your local changes are ready for another review. Not a big deal if pushing more often is useful to you, and you surely want to push if you stop working on this for the day and want to back up your changes in your remote branch... but I wanted to let you know, in case it's useful.

Contributor Author: Yes, I think that's a good idea; I've expanded the explanation and would appreciate any input :) Also, thanks for the tip, I'll be more mindful when pushing my commits!


.. versionadded:: 0.24.0

6 changes: 3 additions & 3 deletions pandas/io/parquet.py
@@ -228,10 +228,10 @@ def to_parquet(
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use ``None`` for no compression.
-    index : bool, default None
+    index : bool, default True
         If ``True``, include the dataframe's index(es) in the file output. If
-        ``False``, they will not be written to the file. If ``None``, the
-        engine's default behavior will be used.
+        ``False``, they will not be written to the file.
+        If ``None``, RangeIndex will be stored as metadata-only.

.. versionadded:: 0.24.0
