1 file changed, +10 -3 lines changed
@@ -2877,9 +2877,16 @@ def to_parquet(
 
         Notes
         -----
-        This function requires either the `fastparquet
-        <https://pypi.org/project/fastparquet>`_ or `pyarrow
-        <https://arrow.apache.org/docs/python/>`_ library.
+        * This function requires either the `fastparquet
+          <https://pypi.org/project/fastparquet>`_ or `pyarrow
+          <https://arrow.apache.org/docs/python/>`_ library.
+        * When saving a DataFrame with categorical columns to parquet,
+          the file size may increase due to the inclusion of all possible
+          categories, not just those present in the data. This behavior
+          is expected and consistent with pandas' handling of categorical data.
+          To manage file size and ensure a more predictable roundtrip process,
+          consider using :meth:`Categorical.remove_unused_categories` on the
+          DataFrame before saving.
 
         Examples
         --------
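
The advice in the new note can be sketched as follows. This is a minimal illustration, not part of the change itself: the helper `drop_unused_categories` and the sample data are hypothetical. Note that `remove_unused_categories` is defined on `Categorical` and reached per column through the `Series.cat` accessor, so the sketch applies it column by column instead of calling it on the DataFrame. The parquet write is guarded because it needs `pyarrow` or `fastparquet` to be installed.

```python
import io

import pandas as pd


def drop_unused_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose categorical columns keep only observed categories.

    Hypothetical helper for illustration; assumes unique column names.
    """
    out = df.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, pd.CategoricalDtype):
            out[name] = out[name].cat.remove_unused_categories()
    return out


# Hypothetical data: three declared categories, only two of them observed.
df = pd.DataFrame(
    {"fruit": pd.Categorical(["apple", "pear"], categories=["apple", "pear", "plum"])}
)
trimmed = drop_unused_categories(df)

print(list(df["fruit"].cat.categories))       # ['apple', 'pear', 'plum']
print(list(trimmed["fruit"].cat.categories))  # ['apple', 'pear']

# Writing requires a parquet engine, so skip the write if none is available.
try:
    buffer = io.BytesIO()
    trimmed.to_parquet(buffer)
    print(f"wrote {len(buffer.getvalue())} bytes")
except ImportError:
    print("no parquet engine installed; skipping the write")
```

Trimming on a copy leaves the caller's DataFrame untouched, which matters when the full category set is still needed elsewhere. The values themselves are unchanged; only the declared category list shrinks.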