ENH: Consistent API between pd.get_dummies() and Series.str.get_dummies()
#59235
Labels
Enhancement
Needs Discussion
Requires discussion from core team before further action
Strings
String extension data type and string data
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
Compared to pd.get_dummies(), Series.str.get_dummies() behaves very differently and has much more limited functionality. Such differences are not user-friendly.
Feature Description
The dtype of the DataFrame returned by Series.str.get_dummies() should be bool, not int64.
before:
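As a minimal sketch of the current behavior, the dtypes of the two calls can be compared as follows. The Series `s` is a hypothetical example, not the data from the original report, and the dtypes mentioned in the comments are what pandas 2.x reports.

```python
import pandas as pd

# Hypothetical input: the third entry carries two labels separated by "|".
s = pd.Series(["a", "b", "a|b"])

# Series.str.get_dummies() splits each string on "|" and builds one
# indicator column per distinct label (integer 0/1 on pandas 2.x).
current = s.str.get_dummies()
print(current.dtypes)

# pd.get_dummies() treats each whole string as a single category
# (boolean indicator columns on pandas 2.x).
reference = pd.get_dummies(s)
print(reference.dtypes)
```

The two calls also differ in what they treat as a category: `s.str.get_dummies()` produces the columns `a` and `b`, while `pd.get_dummies(s)` produces `a`, `a|b`, and `b`.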
after (same as pd.get_dummies(s)):
prefix=, prefix_sep=, dummy_na=, sparse=, and dtype= arguments should be added to Series.str.get_dummies().
after (same as pd.get_dummies(s, prefix="dummy", prefix_sep="=", dummy_na=True, dtype=float)):
Note: Among the arguments of pd.get_dummies(), the columns= argument is obviously not needed for Series.str.get_dummies(). Whether Series.str.get_dummies() needs a drop_first= argument is debatable, since Series.str.get_dummies() can yield True in multiple columns, unlike pd.get_dummies().
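To make the request concrete, the sketch below approximates the proposed keywords using only calls that already exist (`astype`, `add_prefix`, `isna`). It is an illustration under assumptions, not the proposed implementation: the Series `s` is hypothetical, and the name of the missing-value column, `dummy=nan`, is chosen here for illustration.

```python
import pandas as pd

# Hypothetical input with one multi-label entry and one missing value.
s = pd.Series(["a", "b", "a|b", None])

# Existing API: only the separator can be configured in this sketch.
base = s.str.get_dummies(sep="|")

# Approximate dtype=float, prefix="dummy", prefix_sep="=".
emulated = base.astype(float).add_prefix("dummy=")

# Approximate dummy_na=True with an explicit missing-value indicator.
# The column name "dummy=nan" is an assumption made for this sketch.
emulated["dummy=nan"] = s.isna().astype(float)

print(emulated)
```

Row 2 has a 1.0 in both `dummy=a` and `dummy=b`, which is the situation the note refers to: a row can be set in several columns, so dropping the first column would not leave a redundant encoding the way it does for pd.get_dummies().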
Alternative Solutions
While there are countless alternative ways to obtain DataFrames with the same result, none of them would bring consistency to the two methods. The only alternative might be to simply deprecate Series.str.get_dummies().
Additional Context
No response