Support dtypes other than float in sparse data structures #667
Partially addressed in #3482.
Would it be easier to work towards supporting floats other than float64 first? Supporting other kinds of dtypes in general seems like a major effort that should probably be tackled in smaller steps. I'm particularly interested in reducing the memory usage of dummy variables (i.e. bool) and small-valued counts (e.g. uint8). When sparsifying dataframes that contain such dtypes, being able to convert to float16 rather than float64 would already help a lot. I've posted a StackOverflow question (and answer) about my attempts to achieve this.
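To put rough numbers on the memory argument above, here is a back-of-the-envelope sketch using only the standard library. It assumes that a sparse column stores each non-fill value plus a 4-byte integer position for it (roughly how an integer-position sparse index works) and ignores fixed overhead. The helper `sparse_column_bytes` and the 1,000,000-row / 5% figures are hypothetical, chosen only for illustration.

```python
import struct

# Standard ("=") struct sizes for a few value dtypes:
# 'd' = float64, 'e' = float16 (half precision), '?' = bool, 'B' = uint8
VALUE_FORMATS = {"float64": "=d", "float16": "=e", "bool": "=?", "uint8": "=B"}

# Assumption: every stored (non-fill) value also carries a 4-byte int32 position.
INDEX_BYTES = struct.calcsize("=i")


def sparse_column_bytes(n_stored: int, dtype: str) -> int:
    """Hypothetical helper: approximate bytes for the explicitly stored values
    of one sparse column (values plus positions, ignoring fixed overhead)."""
    value_bytes = struct.calcsize(VALUE_FORMATS[dtype])
    return n_stored * (value_bytes + INDEX_BYTES)


if __name__ == "__main__":
    n_rows = 1_000_000
    n_stored = n_rows // 20  # hypothetical: 5% of rows differ from the fill value
    print(f"dense float64:  {n_rows * struct.calcsize('=d'):>9,} bytes")
    for dtype in VALUE_FORMATS:
        print(f"sparse {dtype:>7}: {sparse_column_bytes(n_stored, dtype):>9,} bytes")
```

Under these assumptions the position index dominates once values are small: a stored float64 costs 12 bytes, a float16 costs 6, and a bool or uint8 costs 5. Moving from float64 to float16 therefore roughly halves the footprint, while going further to bool or uint8 gains comparatively little.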
@jreback thanks for mentioning the 0.18.1 fixes. They should solve some of the issues I encountered, but dtype coercion still occurs with frames. Even if a …

Here are my attempts to construct a one-column frame in 0.18.0 (the result is the same for multiple columns):

```
In []: dense_series = pd.Series([False]*5 + [True]*3 + [False]*5, dtype='bool', name='b')
In []: dense_df = pd.DataFrame(dense_series)
In []: sparse_df = dense_df.to_sparse(fill_value=False)
In []: sparse_df['b'].dtype
Out[]: dtype('float64')

In []: sparse_series = dense_series.to_sparse(fill_value=False)
In []: sparse_series.dtype
Out[]: dtype('bool')

In []: sparse_df = pd.SparseDataFrame(sparse_series)
In []: sparse_df['b'].dtype
Out[]: dtype('float64')

In []: sparse_df = pd.SparseDataFrame(sparse_series, dtype='bool')
In []: sparse_df['b'].dtype
Out[]: dtype('bool')

In []: sparse_df.info()
------------------------
[traceback omitted]
AttributeError: ("'SingleBlockManager' object has no attribute 'view'", 'occurred at index b')

In []: sparse_df['b'].values
Out[]:
SingleBlockManager
Items: RangeIndex(start=0, stop=13, step=1)
BoolBlock: 13 dtype: bool
```

My apologies if this is solved in 0.18.1 (which I'm not able to test right now) or if I'm doing it wrong.
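For comparison, `to_sparse` and `SparseDataFrame` were later deprecated in favor of the `pd.SparseDtype` extension dtype, which records the subtype together with the fill value, and were removed in pandas 1.0. The sketch below assumes a recent pandas (1.x or 2.x) and rebuilds the bool column from the session above, plus a hypothetical `uint8` count column named `n`, to check that neither is coerced to float64. It illustrates the newer API only and says nothing about the 0.18 code paths discussed in this thread.

```python
import pandas as pd

# Same 13-row bool column as in the session above.
dense_series = pd.Series([False] * 5 + [True] * 3 + [False] * 5, dtype="bool", name="b")

# SparseDtype(subtype, fill_value): only values != False are stored explicitly.
bool_dtype = pd.SparseDtype(bool, fill_value=False)
sparse_series = dense_series.astype(bool_dtype)

# Hypothetical small-valued count column, kept as uint8 with fill value 0.
counts = pd.Series([0, 0, 3, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0], dtype="uint8", name="n")
sparse_counts = counts.astype(pd.SparseDtype("uint8", fill_value=0))

# Building a DataFrame from sparse Series keeps each column's sparse dtype.
sparse_df = pd.DataFrame({"b": sparse_series, "n": sparse_counts})

print(sparse_df.dtypes)
print("stored bools:", sparse_df["b"].sparse.npoints)
print("stored counts:", sparse_df["n"].sparse.npoints)
print("overall density:", sparse_df.sparse.density)
```

Because the fill value lives in the dtype instead of defaulting to NaN, a bool or uint8 column does not need a float representation in order to be sparse.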
Well, it's an open issue. You're welcome to contribute tests and such.