[SPARK-43568][SPARK-43633][PS] Support Categorical APIs for pandas 2
### What changes were proposed in this pull request?
This PR proposes to support the `Categorical` APIs for [pandas 2](https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html) and to match their behavior in pandas API on Spark.
### Why are the changes needed?
To support pandas API on Spark with pandas 2.0.0 and above.
### Does this PR introduce _any_ user-facing change?
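The behavior now matches pandas 2.0.0 and above. For example, after removing every category, the empty categories keep their original dtype (`int64`) instead of falling back to `object`: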
```diff
>>> psser
0 1
1 2
2 3
3 1
4 2
5 3
Name: a, dtype: category
Categories (3, int64): [1, 2, 3]
>>> psser.cat.remove_categories([1, 2, 3])
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
Name: a, dtype: category
- Categories (0, object): []
+ Categories (0, int64): []
```
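For reference, the pandas behavior being matched can be checked without a Spark session. The following is a minimal sketch, assuming pandas 2.0.0 or above is installed; it uses plain pandas (`pser` is an illustrative name, not taken from the PR) rather than pandas API on Spark.

```python
import pandas as pd

# Same data as the psser example above, built with plain pandas.
pser = pd.Series([1, 2, 3, 1, 2, 3], name="a", dtype="category")

# Removing every category leaves no valid value, so all entries become NaN.
result = pser.cat.remove_categories([1, 2, 3])

# Inspect the dtype of the (now empty) categories index.
# The PR reports int64 here for pandas 2.0.0 and above.
print(result.cat.categories.dtype)
print(result.isna().all())
```

Comparing this dtype against `psser.cat.remove_categories([1, 2, 3]).cat.categories.dtype` on a pandas-on-Spark Series is one way to confirm the two now agree.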
### How was this patch tested?
Enabling the existing tests.
Closes apache#42273 from itholic/pandas_categorical.
Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
python/docs/source/migration_guide/pyspark_upgrade.rst (+1)
@@ -30,6 +30,7 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, ``DataFrame.mad`` has been removed from pandas API on Spark.
 * In Spark 4.0, ``Series.mad`` has been removed from pandas API on Spark.
 * In Spark 4.0, ``na_sentinel`` parameter from ``Index.factorize`` and `Series.factorize`` has been removed from pandas API on Spark, use ``use_na_sentinel`` instead.
+* In Spark 4.0, ``inplace`` parameter from ``Categorical.add_categories``, ``Categorical.remove_categories``, ``Categorical.set_categories``, ``Categorical.rename_categories``, ``Categorical.reorder_categories``, ``Categorical.as_ordered``, ``Categorical.as_unordered`` have been removed from pandas API on Spark.
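With `inplace` gone, code that relied on in-place modification has to assign the returned object back. The sketch below illustrates that pattern with plain pandas so it runs without Spark; the variable name `pser` and the sample data are illustrative, and it is an assumption here that the same assign-back pattern carries over unchanged to a pandas-on-Spark Series.

```python
import pandas as pd

pser = pd.Series(["a", "b", "a"], dtype="category")

# A call that previously passed inplace=True must now use the return value:
# the categorical methods return a new object instead of mutating the old one.
pser = pser.cat.add_categories(["c"])

print(list(pser.cat.categories))  # ['a', 'b', 'c']
```

The values themselves are unchanged; only the set of allowed categories grows.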