ENH: add from_dummies #31795

Closed
wants to merge 19 commits into from

MarcoGorelli
Member

@MarcoGorelli MarcoGorelli commented Feb 7, 2020

Just a first draft

Will close #8745

Screenshot of examples (image not included)

@MarcoGorelli MarcoGorelli changed the title from "ENH: add from_dummies" to "(wip)ENH: add from_dummies" Feb 7, 2020
Member

@WillAyd WillAyd left a comment

Cool thanks! Initial comments on API decisions

Separator between original column name and dummy variable
dtype : dtype, default 'category'
Data dtype for new columns - only a single data type is allowed
fill_first : str, list, or dict, default None
Member

I think I'm -1 on this feature; not sure I see it being super useful outside of niche applications

Member Author

@WillAyd thanks for your feedback - when you say "this feature", do you mean from_dummies, or just fill_first?

Member

fill_first
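
For context on the `fill_first` thread, here is a minimal sketch of the problem such a parameter would presumably address, assuming it was meant to restore the category that `get_dummies(drop_first=True)` discards (the excerpt above does not spell out its semantics). The name `dropped_label` and the manual decoding are illustrative only, not code from this PR.

```python
import numpy as np
import pandas as pd

original = pd.Series(["a", "b", "c", "a"])

# drop_first=True removes the column for the first category ("a"),
# so rows that were "a" become all-zero and the label itself is lost.
reduced = pd.get_dummies(original, drop_first=True)

# Hypothetical: the caller supplies the dropped label by hand.
dropped_label = "a"

values = reduced.to_numpy().astype(bool)
# Label of the hot column per row (meaningless for all-zero rows).
labels = reduced.columns.to_numpy(dtype=object)[values.argmax(axis=1)]
recovered = pd.Series(
    np.where(values.any(axis=1), labels, dropped_label),
    index=reduced.index,
)
```

Without the extra piece of information, the all-zero rows cannot be decoded, which is why the inverse of `drop_first=True` needs some such argument or has to leave those rows missing.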

Parameters
----------
data : DataFrame
columns : list-like, default None
Member

I would also rather not provide columns as a keyword; duplicative of the first argument

Member

Or is this supposed to be the inverse of the prefix argument in get_dummies? If so, I think we should just reuse that name

Member Author

Sure, I'll change it to prefix
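
As a reminder of what the `prefix` argument of `get_dummies` does, and so what an inverse would have to undo, here is a small sketch. The column-splitting step only illustrates the idea and is not the PR's implementation; it assumes the separator does not occur inside the prefix.

```python
import pandas as pd

animals = pd.Series(["cat", "dog", "cat"])

# get_dummies joins prefix and category with prefix_sep ("_" by default).
dummies = pd.get_dummies(animals, prefix="pet", prefix_sep="_")

# Going back: split each column name once on the separator.
prefixes, categories = zip(*(name.split("_", 1) for name in dummies.columns))
```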

@WillAyd WillAyd added the "API Design" and "DataFrame" (DataFrame data structure) labels Feb 8, 2020
@pep8speaks
pep8speaks commented Feb 9, 2020

Hello @MarcoGorelli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-18 10:00:29 UTC

@MarcoGorelli
Member Author

MarcoGorelli commented Feb 14, 2020

@WillAyd have made some updates. See screenshots in original post for examples.

Have removed fill_first. Haven't added equivalents of dummy_na or sparse - would you prefer to get a bare-bones implementation out first and add those in future PRs, or should I add them now?
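
For readers unfamiliar with the two `get_dummies` options mentioned here, a short sketch of what they produce; an inverse function would need to decide how to treat both. This only exercises the existing `get_dummies` API and says nothing about how the PR would handle them.

```python
import pandas as pd

s = pd.Series(["a", None, "b"])

# dummy_na=True adds one extra indicator column for missing values.
with_na = pd.get_dummies(s, dummy_na=True)

# sparse=True backs the indicator columns with sparse arrays.
sparse = pd.get_dummies(s, sparse=True)
```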

@MarcoGorelli MarcoGorelli changed the title from "(wip)ENH: add from_dummies" to "ENH: add from_dummies" Feb 15, 2020
@MarcoGorelli
Member Author

@WillAyd thanks for your review, have updated

@jreback
Contributor

jreback commented May 25, 2020

@MarcoGorelli did you see this impl: #8745 (comment).

I also don't see why we would ever return anything but a Series / Categorical here. Can you elaborate?
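
To make the "Series / Categorical" suggestion concrete, here is a minimal sketch of decoding a single block of dummy columns into a `Categorical`. It is not the implementation linked from #8745, whose details are not reproduced on this page; the helper name `dummies_to_categorical` is hypothetical, and the sketch assumes every row has exactly one hot column.

```python
import pandas as pd


def dummies_to_categorical(dummies: pd.DataFrame) -> pd.Categorical:
    """Decode one-hot columns into a Categorical (hypothetical helper)."""
    values = dummies.to_numpy().astype(bool)
    # Position of the hot column in each row doubles as the category code.
    codes = values.argmax(axis=1)
    return pd.Categorical.from_codes(codes, categories=dummies.columns)


original = pd.Series(["a", "b", "c", "a"])
dummies = pd.get_dummies(original)
restored = pd.Series(dummies_to_categorical(dummies), index=dummies.index)
```

Because one block of dummy columns encodes one variable, the natural result is one-dimensional, which is the point being raised about the return type.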

@MarcoGorelli
Member Author

> @MarcoGorelli did you see this impl: #8745 (comment).
>
> I also don't see why we would ever return anything but a Series / Categorical here. Can you elaborate?

I hadn't, thanks for pointing me towards it! @clbarnes that looks good, are you interested in taking over this issue with your implementation?

@clbarnes
Contributor

I can do if you'd prefer that strategy - will need to have a think about the requirements, e.g. what kind of incorrect inputs should it try to gracefully handle, handling masked arrays and NA and so on. That's probably discussion for the issue though.

I personally err on the side of being less permissive than pandas is generally so it may end up being pretty constrained, if that's OK with the maintainers.
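
As an illustration of what a "less permissive" approach could look like, here is a sketch of a strict input check that rejects missing values and any row without exactly one hot column. The function name `check_one_hot` and the specific rules are assumptions for the example; the actual requirements were left for discussion on the issue.

```python
import pandas as pd


def check_one_hot(dummies: pd.DataFrame) -> None:
    """Raise ValueError unless every row has exactly one truthy entry."""
    if dummies.isna().to_numpy().any():
        raise ValueError("dummy columns must not contain missing values")
    hot_per_row = dummies.to_numpy().astype(bool).sum(axis=1)
    if (hot_per_row != 1).any():
        bad = dummies.index[hot_per_row != 1].tolist()
        raise ValueError(f"rows without exactly one hot column: {bad}")
```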

@MarcoGorelli
Member Author

> I can do if you'd prefer that strategy

Sure, go ahead! I'll close this PR then, though please let me know if for whatever reason you choose not to work on this

@clbarnes clbarnes mentioned this pull request May 28, 2020
5 tasks
@MarcoGorelli MarcoGorelli deleted the from-dummies branch October 10, 2020 14:14
Successfully merging this pull request may close these issues.

API/ENH: from_dummies
6 participants