
ENH: Support MultiIndex columns in parquet (#34777) #36305

Merged
merged 20 commits into from
Nov 19, 2020
Changes from 2 commits
Commits
681ac1f
ENH: Support MultiIndex columns in parquet (#34777)
hweecat Sep 12, 2020
a46e46f
ENH: Support MultiIndex columns in parquet GH34777
hweecat Sep 12, 2020
c974259
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Sep 12, 2020
e9ff779
ENH: Support MultiIndex columns in parquet #34777
hweecat Sep 13, 2020
1b9e3f0
ENH: Support MultiIndex columns in parquet #34777
hweecat Sep 13, 2020
9e8f4eb
ENH: Support MultiIndex columns in parquet #34777
hweecat Sep 14, 2020
cc0f504
ENH: Support MultiIndex columns in parquet #34777
hweecat Sep 14, 2020
3ba38fa
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Sep 14, 2020
a4131d2
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Sep 25, 2020
26966b7
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Oct 1, 2020
cc8e85c
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Oct 10, 2020
3b9b52a
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Oct 11, 2020
ed5fe60
ENH: Support MultiIndex columns in parquet #34777
hweecat Oct 11, 2020
c859a4f
fix doc failure
hweecat Oct 11, 2020
039094c
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Nov 11, 2020
167ae69
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Nov 14, 2020
2e4fc58
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Nov 15, 2020
180ddff
Update doc/source/whatsnew/v1.2.0.rst
charlesdong1991 Nov 18, 2020
234009b
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Nov 18, 2020
ab24628
Merge remote-tracking branch 'upstream/master' into io-parquet-multii…
hweecat Nov 19, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -297,6 +297,7 @@ I/O
 - :meth:`to_csv` did not support zip compression for binary file object not having a filename (:issue: `35058`)
 - :meth:`to_csv` and :meth:`read_csv` did not honor `compression` and `encoding` for path-like objects that are internally converted to file-like objects (:issue:`35677`, :issue:`26124`, and :issue:`32392`)
 - :meth:`to_picke` and :meth:`read_pickle` did not support compression for file-objects (:issue:`26237`, :issue:`29054`, and :issue:`29570`)
+- :meth:`to_parquet` did not support MultiIndex for columns in parquet format (:issue:`34777`)
 
 Plotting
 ^^^^^^^^
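For context on the release note above: the pre-existing check in validate_dataframe reads `df.columns.inferred_type`. For a MultiIndex that property reports "mixed" whatever the labels are, which would explain why MultiIndex columns were rejected even when every label was a string. The following is a minimal sketch, assuming only that pandas is installed; the frame and its labels are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Hypothetical frame with two-level, all-string column labels.
columns = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])
df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)

# The MultiIndex as a whole does not report "string" ...
whole = df.columns.inferred_type

# ... but each individual level can be inspected on its own.
per_level = [level.inferred_type for level in df.columns.levels]

print(whole, per_level)
```

Inspecting the levels one by one is the approach the parquet.py change below takes.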
12 changes: 9 additions & 3 deletions pandas/io/parquet.py
@@ -53,9 +53,15 @@ def validate_dataframe(df: DataFrame):
         if not isinstance(df, DataFrame):
             raise ValueError("to_parquet only supports IO with DataFrames")

-        # must have value column names (strings only)
-        if df.columns.inferred_type not in {"string", "empty"}:
-            raise ValueError("parquet must have string column names")
+        # must have value column names for all index levels (strings only)
+        if df.columns.nlevels > 1:
+            if not all(
+                x.inferred_type in {"string", "empty"} for x in df.columns.levels
+            ):
+                raise ValueError("parquet must have string column names")
Contributor

The error message could say something about 'for all values in each level of the MultiIndex'.

Contributor Author

Thanks @jreback for the suggestion on the exception statement - adding that into my next commit!

+        else:
+            if df.columns.inferred_type not in {"string", "empty"}:
+                raise ValueError("parquet must have string column names")

         # index level names must be strings
         valid_names = all(
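The column-name check in the hunk above can be tried outside pandas internals with a small standalone sketch. The function name `check_column_names`, the constant `VALID_TYPES`, and the sample frames are hypothetical, and the longer error text only illustrates the reviewer's suggestion; it is not the wording that was merged.

```python
import pandas as pd

VALID_TYPES = {"string", "empty"}


def check_column_names(df: pd.DataFrame) -> None:
    """Hypothetical standalone version of the column-name check in the diff."""
    if df.columns.nlevels > 1:
        # MultiIndex columns: every level must hold string labels.
        if not all(
            level.inferred_type in VALID_TYPES for level in df.columns.levels
        ):
            # Illustrative wording only, following the review suggestion.
            raise ValueError(
                "parquet must have string column names "
                "for all values in each level of the MultiIndex"
            )
    elif df.columns.inferred_type not in VALID_TYPES:
        # Flat columns: same rule as before the change.
        raise ValueError("parquet must have string column names")


# Made-up frames: one with all-string levels, one with an integer level.
good = pd.DataFrame(
    [[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")])
)
bad = pd.DataFrame(
    [[1, 2]], columns=pd.MultiIndex.from_tuples([("a", 1), ("a", 2)])
)

check_column_names(good)  # passes silently

try:
    check_column_names(bad)
except ValueError as err:
    rejected = str(err)
else:
    rejected = None
```

The sketch uses `elif` where the diff nests an `if` under `else`; the two forms behave the same.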