ENH: Series.str.get_dummies should defer to pd.get_dummies and pass thru args

Hello pandas team, and thanks for making this package better day after day.

I was using the `str.get_dummies` method on a DataFrame column and noticed that, by default, the dummies are encoded as `int64`.

This seems very inefficient to me: I ran into a memory error when trying to get dummies for a DataFrame with several million rows (and about 5k dummies). I had to create the dummies in chunks and use `to_numeric()` to coerce them to `int8`.
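For illustration, here is a minimal sketch of that kind of chunked workaround. The helper name `chunked_str_dummies`, the `|` separator, and the chunk size are assumptions made for the example, and it uses `astype(np.int8)` rather than `to_numeric()` for the downcast.

```python
import numpy as np
import pandas as pd


def chunked_str_dummies(s, sep="|", chunk_size=100_000):
    """Hypothetical helper: build int8 dummies chunk by chunk."""
    parts = []
    for start in range(0, len(s), chunk_size):
        chunk = s.iloc[start:start + chunk_size]
        # Downcast each chunk right away so the int64 frame is short-lived.
        parts.append(chunk.str.get_dummies(sep=sep).astype(np.int8))
    # Chunks can have different columns; concat aligns them and leaves NaN
    # where a column was absent, so fill with 0 and downcast again.
    return pd.concat(parts).fillna(0).astype(np.int8)


s = pd.Series(["a|b", "b|c", None, "a"])
dummies = chunked_str_dummies(s, chunk_size=2)
print(dummies.dtypes)
print(dummies.memory_usage(deep=True))
```

Each `int8` cell takes one byte instead of eight, so the final frame should need roughly an eighth of the memory of the default `int64` result.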
 
Would it be possible to have the dummies natively in `int8` format so that they take very little space? In that case `NaN` would be coerced to 0, but that should be fine.
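As a rough sketch of what deferring to `pd.get_dummies` could look like from user code, assuming a pandas version where `pd.get_dummies` accepts a `dtype` argument and `Series.explode` is available. The helper name `str_dummies_int8` is hypothetical, and this is not how pandas implements `str.get_dummies`.

```python
import numpy as np
import pandas as pd


def str_dummies_int8(s, sep="|"):
    """Hypothetical helper: split, explode, then let pd.get_dummies pick the dtype."""
    # One row per token; missing values stay as a single NaN row.
    exploded = s.str.split(sep).explode()
    # NaN rows become all-zero rows because dummy_na defaults to False.
    one_hot = pd.get_dummies(exploded, dtype=np.int8)
    # Collapse the exploded rows back to one row per original index label.
    return one_hot.groupby(level=0).max()


s = pd.Series(["a|b", "b|c", None, "a"])
out = str_dummies_int8(s)
print(out)
print(out.dtypes)
```

Note that this relies on the index labels of `s` being unique, since the final `groupby(level=0)` merges rows that share a label.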

What do you think?
Thanks! 

ENH: Series.str.get_dummies should defer to pd.get_dummies and pass thru args #19618

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: Series.str.get_dummies should defer to pd.get_dummies and pass thru args #19618

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions