Support dtypes other than float in sparse data structures #667

Closed
wesm opened this issue Jan 23, 2012 · 5 comments · Fixed by #13849
Labels
Enhancement; Sparse (Sparse Data Type); Testing (pandas testing functions or related to the test suite)
Milestone

Comments

@wesm
Member

wesm commented Jan 23, 2012

No description provided.

@jreback
Contributor

jreback commented Jul 24, 2013

partially in #3482

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 15, 2014
@jreback jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015
@aolieman

Would it be easier to work towards supporting floats other than float64 first? The overall enhancement of supporting other kinds of dtypes seems a major effort that should probably be tackled in smaller steps.

I'm particularly interested in reducing memory usage of dummy variables (i.e. bool) and small-valued counts (e.g. uint8). When sparsifying dataframes that contain such dtypes, being able to convert to float16 rather than float64 would already help a lot.

I've posted a StackOverflow question (and answer) regarding my attempts to achieve this.
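
To make the memory argument concrete, here is a rough back-of-the-envelope sketch using only the standard library. It assumes a sparse column costs one stored value plus one 32-bit integer position per non-fill entry; that index width, the `sparse_bytes` helper, and the row counts are illustrative assumptions, and object overhead is ignored.

```python
import struct

# Bytes per stored value for each dtype, taken from struct format codes.
# The '<' prefix forces standard sizes, independent of the platform.
ITEMSIZE = {
    "float64": struct.calcsize("<d"),
    "float16": struct.calcsize("<e"),
    "uint8": struct.calcsize("<B"),
    "bool": struct.calcsize("<?"),
}

# Assumption: each stored value also carries one int32 position.
INDEX_BYTES = struct.calcsize("<i")


def sparse_bytes(n_stored, dtype):
    """Rough size of a sparse column: stored values plus their positions."""
    return n_stored * (ITEMSIZE[dtype] + INDEX_BYTES)


# Hypothetical column: 1,000,000 rows, 1% of them differ from the fill value.
n_stored = 10_000
for dtype in ITEMSIZE:
    print(dtype, sparse_bytes(n_stored, dtype))
```

Under these assumptions, going from float64 to float16 roughly halves the footprint rather than quartering it, because the per-value index cost stays fixed. A native bool or uint8 representation would save only a little more on top of that.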

@kawochen kawochen mentioned this issue Apr 10, 2016
@jreback
Contributor

jreback commented Apr 10, 2016

@aolieman sparse already supports quite a few dtypes, and more in 0.18.1, see changes here

@aolieman

@jreback thanks for mentioning the 0.18.1 fixes. They should solve some of the issues I encountered, but dtype coercion still occurs with frames. Even if a SparseDataFrame is directly constructed from a SparseSeries that has the desired dtype, the resulting frame always uses float64.

Attempts to construct a one-column frame in 0.18.0 (same result for multiple columns):

In []: dense_series = pd.Series([False]*5 + [True]*3 + [False]*5, dtype='bool', name='b')

In []: dense_df = pd.DataFrame(dense_series)

In []: sparse_df = dense_df.to_sparse(fill_value=False)

In []: sparse_df['b'].dtype
Out[]: dtype('float64')

In []: sparse_series = dense_series.to_sparse(fill_value=False)

In []: sparse_series.dtype
Out[]: dtype('bool')

In []: sparse_df = pd.SparseDataFrame(sparse_series)

In []: sparse_df['b'].dtype
Out[]: dtype('float64')

In []: sparse_df = pd.SparseDataFrame(sparse_series, dtype='bool')

In []: sparse_df['b'].dtype
Out[]: dtype('bool')

In []: sparse_df.info()
------------------------
[traceback omitted]
AttributeError: ("'SingleBlockManager' object has no attribute 'view'", 'occurred at index b')

In []: sparse_df['b'].values
Out[]:
SingleBlockManager
Items: RangeIndex(start=0, stop=13, step=1)
BoolBlock: 13 dtype: bool

My apologies if this is solved in 0.18.1 (which I'm not able to test right now) or if I'm doing it wrong.

@jreback
Contributor

jreback commented Apr 11, 2016

Well, it's an open issue; tests and such are welcome.
