BUG: setting a sparse column in a frame buggy #8131

Closed
jreback opened this issue Aug 28, 2014 · 4 comments · Fixed by #8291
Labels: Bug, Sparse (Sparse Data Type)

jreback commented Aug 28, 2014

From SO.

I thought this was well tested...

import pandas as pd

df = pd.DataFrame({'c_1': ['a', 'b', 'c'], 'n_1': [1., 2., 3.]})
# Assigning a sparse Series as a new column raises:
df['new_column'] = pd.Series([0, 0, 1]).to_sparse(fill_value=0)
# AssertionError: Shape of new values must be compatible with manager shape

jreback added this to the 0.15.0 milestone Aug 28, 2014
jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014

jreback commented Sep 9, 2014

cc @immerrr

immerrr commented Sep 10, 2014

Let me check that...

immerrr commented Sep 11, 2014

The issue is in df._sanitize_column, which returns a 2-D dense array containing only the non-fill elements:

In [9]: df = pd.DataFrame({'c_1': list('abc')})

In [10]: sp_col = pd.Series([0,0,1]).to_sparse(fill_value=0)

In [11]: df._sanitize_column('n', sp_col)
Out[11]: array([[1]])

Hacking _sanitize_column is easy, but doing so uncovers yet another issue with ndarray subclassing:

In [54]: sp_arr = pd.SparseArray([0,0,1], fill_value=0)

In [55]: sp_arr
Out[55]: 
[0, 0, 1.0]
Fill: 0
IntIndex
Indices: array([2], dtype=int32)


In [56]: np.asarray(sp_arr)
Out[56]: array([ 1.])

This happens because np.asarray checks at the C level whether sp_arr provides the PEP 3118 buffer interface (which ndarray does) and uses that representation, which contains only the non-fill elements. This is unfortunate because an inheriting class cannot override it at the Python level (see python issue).
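
The effect described above can be reproduced without NumPy or pandas. The sketch below is only an analogy: it uses the standard library's array.array as a stand-in for ndarray, and ToySparse, indices and length are hypothetical names, not pandas internals. It shows that a consumer reading the buffer directly sees only the stored (non-fill) values, whatever the subclass overrides at the Python level.

```python
from array import array


class ToySparse(array):
    """Hypothetical stand-in for a sparse ndarray subclass (not pandas code)."""

    def __new__(cls, values, fill_value=0.0):
        values = list(values)
        # Only the non-fill values go into the underlying C buffer.
        kept = [v for v in values if v != fill_value]
        self = super().__new__(cls, "d", kept)
        self.fill_value = fill_value
        self.indices = [i for i, v in enumerate(values) if v != fill_value]
        self.length = len(values)
        return self

    def tolist(self):
        # Python-level override: rebuild the dense values, fill included.
        dense = [float(self.fill_value)] * self.length
        for i, v in zip(self.indices, self):
            dense[i] = v
        return dense


sp = ToySparse([0, 0, 1], fill_value=0)
dense_view = sp.tolist()               # goes through the Python override
buffer_view = memoryview(sp).tolist()  # reads the raw buffer, bypassing it
print(dense_view, buffer_view)
```

Here memoryview plays the role that np.asarray plays in the session above: it asks for the buffer and never consults the subclass's Python methods, so the fill values are lost.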

jreback commented Sep 11, 2014

Yep, SparseArray probably just needs to be tested for (similar to what was just done with Categorical) and passed through directly.
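
A minimal sketch of that idea, assuming a toy container in place of SparseArray: the sanitizer checks for the special type first and returns it untouched instead of coercing it to a dense array. ToySparseColumn and sanitize_column are hypothetical names for illustration only; this is not pandas' _sanitize_column and not necessarily how the eventual fix was written.

```python
class ToySparseColumn:
    """Hypothetical sparse container standing in for SparseArray."""

    def __init__(self, values, fill_value=0):
        values = list(values)
        self.fill_value = fill_value
        self.length = len(values)
        # Positions and values of the non-fill entries only.
        self.sp_index = [i for i, v in enumerate(values) if v != fill_value]
        self.sp_values = [values[i] for i in self.sp_index]

    def __len__(self):
        # Logical length, including fill entries.
        return self.length


def sanitize_column(value, n_rows):
    """Hypothetical column sanitizer (illustration only)."""
    if len(value) != n_rows:
        raise ValueError("Length of values does not match length of index")
    if isinstance(value, ToySparseColumn):
        # Special-cased type: pass it through, with no dense coercion.
        return value
    # Generic path: coerce everything else to a plain dense list.
    return list(value)


col = ToySparseColumn([0, 0, 1], fill_value=0)
result = sanitize_column(col, n_rows=3)
dense_result = sanitize_column((1, 2, 3), n_rows=3)
```

Because the sparse object is returned as-is, its logical length (3) is what gets compared against the frame, rather than the number of non-fill values (1).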
