SparseDataFrame should also be able to handle non-float, non-sparse columns #2873

Closed
benjello opened this issue Feb 14, 2013 · 7 comments
Labels: Duplicate Report, Enhancement, Sparse

@benjello
Contributor

dup of #667
I am using pandas version 0.10.1.

I ran into trouble when trying to build a DataFrame whose columns can be of type int, float, or str,
and can each be either sparse or dense.

import random
import string

from numpy.random import randn
from pandas import DataFrame, concat

x = DataFrame(randn(10000, 2), columns=['a', 'b'])
y = DataFrame(randn(10000, 2), columns=['c', 'd'])
# one random letter per row, giving a dense string column
z = DataFrame([random.choice(string.letters) for i in range(10000)], columns=['e'])

# zero every row of x except the last one, then convert x to sparse
x.ix[:9998] = 0
x = x.to_sparse(fill_value=0)

print x.density
print y.__class__
df = concat([x, y])

This works fine, but the following doesn't:

df2 = concat([x, y, z])
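
For readers on current pandas: `SparseDataFrame` and `to_sparse` were removed in pandas 1.0, and a regular DataFrame can now hold sparse and dense columns side by side. The following is only a minimal sketch of the mixed case above under that assumption (pandas 1.0 or later, much newer than the 0.10.1 discussed in this issue). The seed, the variable names, and the column-wise `axis=1` concatenation are illustrative choices and not part of the original report.

```python
import string

import numpy as np
import pandas as pd

n = 10000
rng = np.random.default_rng(0)  # fixed seed, illustrative only

# Float column that is zero everywhere except the last row, stored sparsely.
a = rng.standard_normal(n)
a[:-1] = 0.0
x = pd.DataFrame({"a": pd.arrays.SparseArray(a, fill_value=0.0)})

# Ordinary dense float and string columns.
y = pd.DataFrame({"c": rng.standard_normal(n)})
z = pd.DataFrame({"e": rng.choice(list(string.ascii_letters), size=n)})

# Column-wise concatenation keeps each column's own dtype.
df2 = pd.concat([x, y, z], axis=1)

print(df2.dtypes)
print(df2["a"].sparse.density)
```

In this sketch only column `a` carries a sparse dtype; `c` and `e` stay dense, which is the layout the report asks for.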

Also, the following example doesn't yield a SparseDataFrame:

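from numpy.random import randn
from pandas import DataFrame, SparseDataFrame

df = DataFrame(randn(10000, 4))
df.ix[:9998] = 0

# converting the whole frame at once
df1 = df.to_sparse(fill_value=0)
print df1.density

# converting individual columns and assigning them back
df[0] = df[0].to_sparse(fill_value=0)
df[1] = df[1].to_sparse(fill_value=0)
print df[1].__class__
print df.__class__
print SparseDataFrame(df[0]).density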

But this might be a feature, since modifying df[0] should not modify df.

Finally, the following doesn't yield a SparseDataFrame:

from numpy.random import randn
from pandas import Series, SparseDataFrame

x = Series(randn(10000), name='a')
x = x.to_sparse(fill_value=0)
print x.__class__
df = SparseDataFrame(x)
print df.__class__

I would be very happy to use the power of pandas to deal with sparse structures.
So my last item is a question: is the development of sparse objects a priority for the pandas project team?

Thank you for providing such a nice piece of software.

@benjello
Contributor Author

In the same vein, the following code yields an error too:

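from numpy.random import randn
from pandas import Series, SparseDataFrame

x = Series(randn(10000), name='a')
y = Series(randn(10000), name='b')
x.ix[:9998] = 0
x = x.to_sparse(fill_value=0)
# mixing a sparse Series (x) with a dense one (y)
df1 = SparseDataFrame([x, y])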

@jreback
Contributor

jreback commented Feb 14, 2013

The last example in your initial comment doesn't produce an error (i.e. df is a SparseDataFrame)??

@benjello
Contributor Author

You are right, and I am deeply sorry for that.
Here is another problem that might be related to #2803. The densities should be the same, but the first one is wrong.

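from numpy.random import randn
from pandas import DataFrame, Series, SparseDataFrame

# first case: sparse Series passed straight to SparseDataFrame
x = Series(randn(10000), name='a')
x = x.to_sparse(fill_value=0)
print x.__class__
df = SparseDataFrame(x)
print df.__class__
print df.density

# second case: Series wrapped in a DataFrame before converting to sparse
y = Series(randn(10000), name='a')
y.ix[:9998] = 0
y = DataFrame(y).to_sparse(fill_value=0)
print y.__class__
df1 = SparseDataFrame(y)
print df1.__class__
print df1.density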
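
As a reference point for the numbers being compared here, density can be read as the fraction of entries that differ from the fill value. The helper below is a hypothetical pure-Python sketch of that definition, not the pandas implementation; the name `density` and its signature are assumptions made for illustration.

```python
def density(values, fill_value=0.0):
    """Fraction of entries that are not equal to fill_value.

    Hypothetical helper for illustration only; pandas computes this
    internally from its sparse index.
    """
    values = list(values)
    if not values:
        raise ValueError("density is undefined for an empty sequence")
    stored = sum(1 for v in values if v != fill_value)
    return stored / len(values)


mostly_zero = [0.0] * 9999 + [1.5]   # one stored value out of 10000
all_nonzero = [0.3, -1.2, 2.0, 0.7]  # nothing equals the fill value

print(density(mostly_zero))
print(density(all_nonzero))
```

Under this reading, a column that is zero in all but one of its 10000 rows has a density of 1/10000, while a column with no zeros has a density of 1.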

@jreback
Contributor

jreback commented Feb 15, 2013

If I understand your first example, you want to have a sparse frame that also has dense columns (possibly of a different dtype)?

@benjello
Contributor Author

Yes. One could need a DataFrame where only some columns are worth storing as sparse. I would like to be able to sparsify specific columns. I do not know whether this is compatible with pandas internals.
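
A rough sketch of what sparsifying specific columns can look like, assuming pandas 1.0 or later (where sparse data lives in ordinary DataFrame columns through `pd.SparseDtype`) rather than the 0.10.1 release under discussion. The helper `sparsify_columns` is a hypothetical name introduced for illustration; it is not a pandas function.

```python
import numpy as np
import pandas as pd


def sparsify_columns(df, columns, fill_value=0.0):
    """Return a copy of df with the named float columns stored as sparse.

    Hypothetical helper; all other columns keep their dense dtype.
    """
    sparse_dtype = pd.SparseDtype(np.float64, fill_value=fill_value)
    return df.astype({col: sparse_dtype for col in columns})


df = pd.DataFrame({
    "a": [0.0, 0.0, 0.0, 1.5],  # mostly zeros, worth sparsifying
    "b": [0.1, 0.2, 0.3, 0.4],  # dense floats
    "e": ["w", "x", "y", "z"],  # dense strings
})
mixed = sparsify_columns(df, ["a"])

print(mixed.dtypes)
print(mixed["a"].sparse.density)
```

Because `astype` returns a new frame, the original `df` keeps its dense columns, and the choice of which columns to convert stays explicit at the call site.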

@jreback
Contributor

jreback commented Jul 24, 2013

partially in #3482

@jreback
Contributor

jreback commented Sep 20, 2013

dup of #667
