BUG: get_dummies not returning SparseDataFrame #10535

Closed
wants to merge 1 commit into from

artemyk (Contributor) commented Jul 9, 2015

Fixes #10531.

artemyk (Contributor, Author) commented Jul 9, 2015

@jreback Perhaps this would be more elegantly fixed by having concat drop empty passed-in DataFrames?

jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Sparse (Sparse Data Type) labels Jul 11, 2015
jreback added this to the 0.17.0 milestone Jul 11, 2015
jreback (Contributor) commented Jul 11, 2015

xref #10536

jreback (Contributor) commented Jul 11, 2015

I think #10536 should be fixed first. But this introduces the problem of when/how to convert a concat of series (that happen to be all SparseSeries) into a SparseDataFrame (as opposed to a DataFrame).

artemyk (Contributor, Author) commented Jul 11, 2015

@jreback Not sure why #10536 needs to be fixed first. It does not affect this issue (we are concatenating SparseDataFrames here).

@@ -957,13 +957,15 @@ def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False,
         If `columns` is None then all the columns with
         `object` or `category` dtype will be converted.
     sparse : bool, default False
-        Whether the returned DataFrame should be sparse or not.
+        Whether the dummy columns should be sparse or not. Returns
+        SparseDataFrame if `data` is a Series or if all columns are included.
Contributor

This is confusing; I think we should just always return a SparseDataFrame.

Contributor Author

What about the case when some of the blocks are dense? For example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a':['A','B','C'],'b':['D','E','D']})

In [3]: pd.get_dummies(df, sparse=True, columns=['b'])
Out[3]: 
   a  b_D  b_E
0  A    1    0
1  B    0    1
2  C    1    0

Here column a is still dense. Is a df where some blocks are dense and some are sparse more accurately called a DataFrame or a SparseDataFrame?
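
For readers on a later pandas release, the mixed case in this example can be inspected directly. This is a minimal sketch, assuming pandas 1.0 or newer, where SparseDataFrame has been removed and sparsity is tracked per column through SparseDtype. It is not the code from this PR, and the variable names `result` and `is_sparse` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "B", "C"], "b": ["D", "E", "D"]})

# Encode only column b; column a is passed through unchanged.
result = pd.get_dummies(df, sparse=True, columns=["b"])

# Record, per column, whether the dtype is a SparseDtype.
is_sparse = {
    col: isinstance(dtype, pd.SparseDtype)
    for col, dtype in result.dtypes.items()
}

print(type(result).__name__)
print(is_sparse)
```

Under that assumption the container is a plain DataFrame, and only the dummy columns carry a sparse dtype, which is the mixed dense/sparse situation the question above describes.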

artemyk force-pushed the sparse_dataframe branch from 7296068 to ec58429 July 21, 2015 01:02
self.assertEqual(type(r[['a_0']]._data.blocks[0]), exp_blk_type)
self.assertEqual(type(r[['a_1']]._data.blocks[0]), exp_blk_type)
self.assertEqual(type(r[['a_2']]._data.blocks[0]), exp_blk_type)

Contributor Author

@jreback How about now?

Contributor

ok looks reasonable

ping when green

FYI: don't take lines out of the whatsnew. I put them there on purpose; it helps avoid merge conflicts (in fact you have one now because of that).

Contributor Author

@jreback OK, green now.
Sorry about the whatsnew newlines (though I don't think there's a merge conflict?). So, where is a good place to insert whatsnew notes? Somewhere between two newlines?

jreback (Contributor) commented Jul 21, 2015

merged via d065374

thanks!
