1 file changed, +9 -3 lines changed
@@ -2877,9 +2877,15 @@ def to_parquet(
 
 Notes
 -----
-This function requires either the `fastparquet
-<https://pypi.org/project/fastparquet>`_ or `pyarrow
-<https://arrow.apache.org/docs/python/>`_ library.
+* This function requires either the `fastparquet
+  <https://pypi.org/project/fastparquet>`_ or `pyarrow
+  <https://arrow.apache.org/docs/python/>`_ library.
+* When saving a DataFrame with categorical columns to parquet,
+  the file size may increase because every category defined on the
+  dtype is written, not just those present in the data. This is expected:
+  a pandas categorical dtype carries its full list of categories.
+* To reduce file size and make the roundtrip more predictable, consider
+  calling ``Series.cat.remove_unused_categories`` on those columns first.
 
 Examples
 --------
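The note about dropping unused categories before saving can be sketched as follows. This is a minimal example that assumes pandas is installed. `drop_unused_categories` is a hypothetical helper written for illustration and is not part of the pandas API. It applies `Series.cat.remove_unused_categories` to every categorical column and returns a copy, leaving the original frame untouched.

```python
import pandas as pd


def drop_unused_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` whose categorical columns keep only observed categories.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    out = df.copy()
    # Only columns with a categorical dtype are touched.
    for col in out.select_dtypes(include="category").columns:
        # remove_unused_categories returns a new Series without categories
        # that have no occurrences in the data.
        out[col] = out[col].cat.remove_unused_categories()
    return out


df = pd.DataFrame(
    {
        # Three categories are declared, but only "red" appears in the data.
        "color": pd.Categorical(["red", "red"], categories=["red", "green", "blue"]),
        "value": [1, 2],
    }
)
cleaned = drop_unused_categories(df)
print(list(df["color"].cat.categories))       # ['red', 'green', 'blue']
print(list(cleaned["color"].cat.categories))  # ['red']

# With pyarrow or fastparquet installed, the cleaned frame can then be written:
# cleaned.to_parquet("data.parquet")
```

Dropping unused categories changes the column's dtype, so a frame read back from the file will no longer carry the removed categories. If downstream code depends on a fixed category set, it can restore that set after reading with `Series.cat.set_categories`.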