BUG: get_dummies not returning SparseDataFrame #10535
@@ -8,6 +8,7 @@
import nose

from pandas import DataFrame, Series
from pandas.core.sparse import SparseDataFrame
import pandas as pd

from numpy import nan

@@ -171,6 +172,33 @@ def test_basic(self):
        expected.index = list('ABC')
        assert_frame_equal(get_dummies(s_series_index, sparse=self.sparse), expected)

    def test_basic_types(self):
        # GH 10531
        s_list = list('abc')
        s_series = Series(s_list)
        s_df = DataFrame({'a': [0, 1, 0, 1, 2],
                          'b': ['A', 'A', 'B', 'C', 'C'],
                          'c': [2, 3, 3, 3, 2]})

        if not self.sparse:
            exp_df_type = DataFrame
            exp_blk_type = pd.core.internals.FloatBlock
        else:
            exp_df_type = SparseDataFrame
            exp_blk_type = pd.core.internals.SparseBlock

        self.assertEqual(type(get_dummies(s_list, sparse=self.sparse)), exp_df_type)
        self.assertEqual(type(get_dummies(s_series, sparse=self.sparse)), exp_df_type)

        r = get_dummies(s_df, sparse=self.sparse, columns=s_df.columns)
        self.assertEqual(type(r), exp_df_type)

        r = get_dummies(s_df, sparse=self.sparse, columns=['a'])
        self.assertEqual(type(r[['a_0']]._data.blocks[0]), exp_blk_type)
        self.assertEqual(type(r[['a_1']]._data.blocks[0]), exp_blk_type)
        self.assertEqual(type(r[['a_2']]._data.blocks[0]), exp_blk_type)

@jreback How about now?

OK, looks reasonable. Ping when green. FYI: don't take lines out of the what's new. I put them there on purpose; it helps avoid merge conflicts (in fact you have one now because of that).

@jreback OK, green now.

    def test_just_na(self):
        just_na_list = [np.nan]
        just_na_series = Series(just_na_list)

This is confusing; I think we should just always return a `SparseDataFrame`.

What about the case when some of the blocks are dense? e.g.
Here `a` is still dense. Is a df where some blocks are dense and some sparse more accurately called a `DataFrame` or a `SparseDataFrame`?

@jreback ?
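The mixed dense/sparse case asked about above can be reproduced with a short sketch. It is not the code from this PR: it assumes a recent pandas release (1.0 or later), where `SparseDataFrame` no longer exists and `get_dummies(..., sparse=True)` returns a plain `DataFrame` whose dummy columns carry a sparse dtype. The helper `is_sparse_column` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Same frame as in test_basic_types above.
s_df = pd.DataFrame({'a': [0, 1, 0, 1, 2],
                     'b': ['A', 'A', 'B', 'C', 'C'],
                     'c': [2, 3, 3, 3, 2]})

# Encode only column 'b'; 'a' and 'c' are passed through untouched.
r = pd.get_dummies(s_df, sparse=True, columns=['b'])


def is_sparse_column(frame, name):
    """Hypothetical helper: True if the column is backed by a sparse dtype."""
    return isinstance(frame[name].dtype, pd.SparseDtype)


# Map each result column to whether it is stored sparsely.
sparse_flags = {name: is_sparse_column(r, name) for name in r.columns}
print(type(r).__name__)
print(sparse_flags)
```

Under these assumptions the untouched column `a` stays dense while the generated `b_*` columns are sparse, which is the mixed situation the question describes.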