-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: add from_dummies #31795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add from_dummies #31795
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks! Initial comments on API decisions
pandas/core/reshape/reshape.py
Outdated
Separator between original column name and dummy variable | ||
dtype : dtype, default 'category' | ||
Data dtype for new columns - only a single data type is allowed | ||
fill_first : str, list, or dict, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm -1 on this feature; not sure I see it being super useful outside of niche applications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd thanks for your feedback - when you say "this feature", do you mean from_dummies
, or just fill_first
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fill_first
pandas/core/reshape/reshape.py
Outdated
Parameters | ||
---------- | ||
data : DataFrame | ||
columns : list-like, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also rather not provide columns
as a keyword; duplicative of the first argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is this supposed to be the inverse of the prefix
argument in get_dummies
? If so, I think should just reuse that name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll change it to prefix
efbe53f
to
303ac44
Compare
Hello @MarcoGorelli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-18 10:00:29 UTC |
45b5508
to
4be43af
Compare
@WillAyd have made some updates. See screenshots in original post for examples. Have removed |
@WillAyd thanks for your review, have updated |
@MarcoGorelli did you see this impl: #8745 (comment). I also don't see why we would ever return anyting but a Series / Categorical here. can you elaborate? |
I hadn't, thanks for pointing me towards it! @clbarnes that looks good, are you interested in taking over this issue with your implementation? |
I can do if you'd prefer that strategy - will need to have a think about the requirements, e.g. what kind of incorrect inputs should it try to gracefully handle, handling masked arrays and NA and so on. That's probably discussion for the issue though. I personally err on the side of being less permissive than pandas is generally so it may end up being pretty constrained, if that's OK with the maintainers. |
Sure, go ahead! I'll close this PR then, though please let me know if for whatever reason you chose not to work on this |
Just a first draft
Will close #8745
Screenshot of examples