-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: Always return DataFrame from get_dummies #24284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @TomAugspurger! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #24284 +/- ##
==========================================
- Coverage 92.22% 92.22% -0.01%
==========================================
Files 162 162
Lines 51828 51821 -7
==========================================
- Hits 47799 47791 -8
- Misses 4029 4030 +1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24284 +/- ##
==========================================
- Coverage 92.28% 92.28% -0.01%
==========================================
Files 162 162
Lines 51830 51827 -3
==========================================
- Hits 47831 47827 -4
- Misses 3999 4000 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
doc/source/whatsnew/v0.24.0.rst
Outdated
is not dummy encoded. When just ``["B", "C"]`` are passed to ``get_dummies``, | ||
then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was returned. | ||
|
||
.. ipython:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code-block
pandas/core/reshape/reshape.py
Outdated
|
||
# if all NaN | ||
if not dummy_na and len(levels) == 0: | ||
return get_empty_Frame(data, sparse) | ||
return get_empty_Frame(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to capitalize? (or was that before)
does that specific issues that it closes? |
|
||
.. note:: | ||
|
||
There's no difference in memory usage between a :class:`SparseDataFrame` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might need to update the existing docs slightly and/or change the usage in previous whatsnew notes.
pandas/core/reshape/reshape.py
Outdated
@@ -865,19 +863,16 @@ def _get_dummies_1d(data, prefix, prefix_sep='_', dummy_na=False, | |||
if is_object_dtype(dtype): | |||
raise ValueError("dtype=object is not a valid dtype for get_dummies") | |||
|
|||
def get_empty_Frame(data, sparse): | |||
def get_empty_Frame(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowercase this, maybe make a module level function.
No specific issue. |
thanks ! |
xref #19239
In preparation for hopefully deprecating SparseDataFrame / SparseSeries. Right now, I've just made this an API breaking change.
If we want to do this via deprecation, we'd need a new keyword argument to control the result type. I opted against that, because to get a warning-free
get_dummies
people would needIMO, that's too big a burden for the common case, but would be curious to hear others' thoughts here.