SparseDataFrame should be able to handle also non float non sparse Columns #2873

benjello · 2013-02-14T16:23:36Z

dup of #667
I use pandas version 0.10.1

I ran into trouble when trying to build a DataFrame whose columns can each be of type int, float or str,
and each be sparse or not.

from numpy.random import randn
from pandas import DataFrame, concat
x = DataFrame(randn(10000, 2), columns=['a', 'b'])
y = DataFrame(randn(10000, 2), columns=['c', 'd'])

import random
import string
z = DataFrame([random.choice(string.letters) for i in range(10000)], columns=['e'])
x.ix[:9998] = 0
x = x.to_sparse(fill_value=0)

print x.density
print y.__class__
df = concat([x, y])

This works fine, but the following doesn't:

df2 = concat([x, y, z])
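
Note for readers on a current pandas release: `SparseDataFrame`, `to_sparse` and `.ix` were removed in pandas 1.0, and sparsity is now a per-column dtype. A minimal sketch of the same combination under that newer API follows. It assumes the intent was to place the frames side by side (`axis=1`), which the original call does not state, and it uses `string.ascii_letters` and NumPy's `default_rng` as stand-ins for the Python 2 helpers above.

```python
import string

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Two float columns that are zero everywhere except the last row.
x = pd.DataFrame(rng.standard_normal((n, 2)), columns=["a", "b"])
x.iloc[:-1] = 0.0
x = x.astype(pd.SparseDtype("float64", fill_value=0.0))

# Dense float columns and a dense string column.
y = pd.DataFrame(rng.standard_normal((n, 2)), columns=["c", "d"])
z = pd.DataFrame({"e": rng.choice(list(string.ascii_letters), size=n)})

# Assumption: column-wise concatenation is what was wanted.
df2 = pd.concat([x, y, z], axis=1)
print(df2.dtypes)
```

In this sketch the result is an ordinary `DataFrame` in which only `a` and `b` carry a sparse dtype.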

Also, the following example doesn't yield a SparseDataFrame:

from pandas import SparseDataFrame
df = DataFrame(randn(10000, 4))
df.ix[:9998] = 0

df1 = df.to_sparse(fill_value=0)
print df1.density

df[0] = df[0].to_sparse(fill_value=0)
df[1] = df[1].to_sparse(fill_value=0)
print df[1].__class__
print df.__class__
print SparseDataFrame(df[0]).density

but this might be a feature since modifying df[0] should not modify df.

Finally, the following doesn't yield a SparseDataFrame:

from pandas import Series
x = Series(randn(10000), name='a')
x = x.to_sparse(fill_value=0)
print x.__class__
df = SparseDataFrame(x)
print df.__class__

I would be very happy to use the power of pandas to deal with sparse structures.
So my last item is a question: is the development of sparse objects a priority for the pandas project team?

Thank you for providing such a nice piece of software.

benjello · 2013-02-14T17:10:16Z

In the same vein, the following code yields an error too:

x = Series(randn(10000), name = 'a')
y = Series(randn(10000), name ='b')
x.ix[:9998] = 0
x = x.to_sparse(fill_value=0)
df1 = SparseDataFrame([x, y])

jreback · 2013-02-14T20:16:20Z

Your last example in your initial comment doesn't produce an error (i.e. df is a SparseDataFrame)?

benjello · 2013-02-15T09:12:48Z

You are right and I am deeply sorry for that.
Here is another problem that might be related to #2803. The densities should be the same but the first one is wrong.

x = Series(randn(10000), name='a')
x = x.to_sparse(fill_value=0)
print x.__class__
df = SparseDataFrame(x)
print df.__class__
print df.density


y = Series(randn(10000), name='a')
y.ix[:9998] = 0
y = DataFrame(y).to_sparse(fill_value=0)
print y.__class__
df1 = SparseDataFrame(y)
print df1.__class__
print df1.density
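
Note for readers on a current pandas release: density is the share of stored (non-fill) values among all values. A minimal sketch of the comparison follows, assuming the `.sparse` accessor that replaced `SparseDataFrame` from pandas 1.0 onward; the variable names are illustrative. Unlike the first snippet above, the data is zeroed before both constructions, so the two paths start from the same values.

```python
import numpy as np
import pandas as pd

n = 10_000
values = np.random.default_rng(0).standard_normal(n)
values[:-1] = 0.0  # only the last value stays non-zero

# Path 1: sparsify the Series first, then wrap it in a frame.
s = pd.Series(pd.arrays.SparseArray(values, fill_value=0.0), name="a")
via_series = s.to_frame()

# Path 2: build a dense frame first, then sparsify the frame.
via_frame = pd.DataFrame({"a": values}).astype(
    pd.SparseDtype("float64", fill_value=0.0)
)

print(s.sparse.density)
print(via_series.sparse.density)
print(via_frame.sparse.density)
```

With one stored value out of 10000, all three densities should agree.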

jreback · 2013-02-15T22:12:30Z

If I understand your first example, you want to have a sparse frame that also has dense columns (possibly of a different dtype)?

benjello · 2013-02-15T22:36:14Z

Yes. One could need a dataframe where only some columns are worth storing as sparse columns. I would like to be able to sparsify specific columns. I do not know if this is compatible with pandas internals.
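
Note for readers on a current pandas release: sparsifying specific columns can be sketched with a per-column `astype`, assuming pandas 1.0 or later, where `pd.SparseDtype` is available. The column names and the `sparse_cols` list are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

df = pd.DataFrame(rng.standard_normal((n, 4)), columns=list("abcd"))
# .loc slices are label-inclusive, so rows 0..n-2 are zeroed in "a" and "b".
df.loc[: n - 2, ["a", "b"]] = 0.0

# Hypothetical choice of which columns are worth storing as sparse.
sparse_cols = ["a", "b"]
df = df.astype(
    {col: pd.SparseDtype("float64", fill_value=0.0) for col in sparse_cols}
)

print(df.dtypes)
print(df["a"].memory_usage(index=False), df["c"].memory_usage(index=False))
```

The remaining columns keep their dense dtype, and the frame itself stays a plain `DataFrame`.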

jreback · 2013-07-24T14:13:49Z

partially in #3482

jreback · 2013-09-20T23:01:08Z

dup of #667

jreback mentioned this issue Feb 14, 2013

BUG: issue in get_dtype_counts for SparseDataFrame (introduced by dtypes) #2875

Merged

jreback closed this as completed Sep 20, 2013

benjello mentioned this issue Feb 28, 2014

Better handling of memory usage openfisca/openfisca-core#92

Closed
