ENH: Series.str.get_dummies should defer to pd.get_dummies and pass thru args #19618
What actually should happen is that Series.str.get_dummies defers to pd.get_dummies, which already does all of this.
Want to do a PR? This is pretty straightforward.
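For reference, in recent pandas versions pd.get_dummies accepts a dtype argument, so a compact indicator type is already available there. A minimal sketch, assuming a Series holding one label per row (not separator-joined strings); the sample values are made up:

```python
import numpy as np
import pandas as pd

# One label per row; NaN marks a missing value
s = pd.Series(["a", "b", np.nan, "a"])

# dtype controls the storage type of the indicator columns;
# with the default dummy_na=False, the NaN row becomes all zeros
dummies = pd.get_dummies(s, dtype=np.int8)

print(dummies.dtypes)
print(dummies)
```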
Thanks @jreback. I think the issue is that the
Example
Now,
while
Thanks
@randomgambit my point is that this can simply dispatch to the implementation of get_dummies.
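One way to read the suggestion that this can dispatch to get_dummies: split each string on the separator, one-hot encode the tokens with pd.get_dummies (which takes dtype), then collapse back to one row per original element. The helper name str_get_dummies_via_pd is hypothetical, not a pandas API, and this sketch assumes pandas 0.25 or later (for Series.explode), a unique index, and no empty strings in the input.

```python
import numpy as np
import pandas as pd


def str_get_dummies_via_pd(s: pd.Series, sep: str = "|", dtype=np.uint8) -> pd.DataFrame:
    """Hypothetical helper: mimic Series.str.get_dummies on top of pd.get_dummies.

    Assumes a unique index and no empty strings in ``s``.
    """
    # "a|b" -> ["a", "b"]; NaN stays NaN
    tokens = s.str.split(sep).explode()
    # One indicator column per distinct token (sorted); NaN rows become all zeros
    dummies = pd.get_dummies(tokens, dtype=dtype)
    # explode repeated the original index labels, so take the max per original row,
    # keeping the original row order, and cast back in case the reduction changed dtype
    return dummies.groupby(level=0, sort=False).max().astype(dtype)


s = pd.Series(["a|b", "a", np.nan, "c"])
print(str_get_dummies_via_pd(s))
print(s.str.get_dummies(sep="|"))
```

The explode step temporarily holds one row per token, so this trades some intermediate memory for reusing the dtype handling that pd.get_dummies already has.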
@randomgambit did you find a solution to this issue? I have the same problem with my dataset.
FYI, some people on Twitter are not happy that
@billtubbs that feels like a separate issue; could you open a new one specifically about the
Hello pandas team, and thanks for making this package better day after day.
I was using the str.get_dummies method on a dataframe and realized that, by default, the dummies are coded as int64. This looks very inefficient to me: I ran into a memory error when trying to get dummies for a dataframe with several million rows (and about 5k dummies). I had to create the dummies in chunks and use to_numeric() to coerce them to int8.
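The downcasting workaround described above can be sketched as follows: build the dummies, then pass each column through pd.to_numeric(..., downcast="integer"), which picks the smallest signed integer type that holds the values (int8 for 0/1). The sample data is made up, and the explicit astype("int64") only reproduces the int64 default described in this issue regardless of pandas version.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a|b", "b|c", np.nan, "a"])

# Start from 64-bit dummies, as described in the issue
dummies = s.str.get_dummies(sep="|").astype("int64")

# Downcast column by column to the smallest integer type that fits 0/1
small = dummies.apply(pd.to_numeric, downcast="integer")

print(dummies.dtypes.unique(), small.dtypes.unique())
print(dummies.memory_usage(index=False).sum(), small.memory_usage(index=False).sum())
```

When the values are known to be 0/1, a plain dummies.astype(np.int8) gives the same result; either way the int64 intermediate still has to fit in memory, which is why chunking was needed.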
Would it be possible to natively have the dummies in int8 format so that they take very little space? In that case NaN would be coerced to 0, but that should be fine. What do you think?
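A back-of-the-envelope estimate of the dense storage involved shows why int64 runs out of memory at this scale. The 3 million rows below is a hypothetical stand-in for "several million"; the 5,000 columns match the report.

```python
import numpy as np

n_rows = 3_000_000  # hypothetical stand-in for "several million rows"
n_cols = 5_000      # about 5k dummy columns, as reported

sizes_gb = {}
for dtype in (np.int64, np.int8):
    name = np.dtype(dtype).name
    # Dense storage: one element of this dtype per cell
    sizes_gb[name] = n_rows * n_cols * np.dtype(dtype).itemsize / 1e9
    print(f"{name}: {sizes_gb[name]:.0f} GB")
```

The eightfold ratio holds for any shape, since only the per-cell itemsize changes.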
Thanks!