
SparseDataFrame.to_coo does not convert the default fill value when is not 0 #24817


Closed
gm-spacagna opened this issue Jan 17, 2019 · 10 comments · Fixed by #43763
Labels: Enhancement, Error Reporting, good first issue, Missing-data, Sparse

@gm-spacagna

gm-spacagna commented Jan 17, 2019

DataFrame.sparse.to_coo() should raise when the fill_value is not zero, since scipy.sparse only supports filling with 0.

In [21]: df = pd.DataFrame({"A": pd.SparseArray([1, 1, 1, 2], fill_value=1)})

In [22]: df.sparse.to_coo().todense()
Out[22]:
matrix([[0],
        [0],
        [0],
        [2]])

That's incorrect. It should instead raise a ValueError with a nice message.
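To illustrate why the output above loses the ones, here is a minimal pure-Python sketch. `ToySparseColumn` and both helper functions are hypothetical names made up for this illustration; they are not pandas or scipy internals. The sketch only assumes that a sparse column stores the entries that differ from its fill value, and that a COO-style consumer treats every unstored entry as 0.

```python
from dataclasses import dataclass


@dataclass
class ToySparseColumn:
    """Hypothetical stand-in for a sparse column (not a pandas class)."""
    length: int
    fill_value: int
    stored: dict  # position -> value, only for entries != fill_value

    @classmethod
    def from_dense(cls, values, fill_value):
        stored = {i: v for i, v in enumerate(values) if v != fill_value}
        return cls(len(values), fill_value, stored)


def naive_to_dense_zero_fill(col):
    # Mimics a consumer that assumes every unstored entry is 0,
    # regardless of the column's actual fill value.
    return [col.stored.get(i, 0) for i in range(col.length)]


def checked_to_dense_zero_fill(col):
    # The behaviour requested in this issue: refuse to convert
    # instead of silently replacing the fill value with 0.
    if col.fill_value != 0:
        raise ValueError(
            f"fill_value must be 0 for a zero-filled format, got {col.fill_value}"
        )
    return naive_to_dense_zero_fill(col)


col = ToySparseColumn.from_dense([1, 1, 1, 2], fill_value=1)
print(naive_to_dense_zero_fill(col))  # [0, 0, 0, 2]
```

With a fill value of 1, only the trailing 2 is stored, so the naive conversion reproduces the `[0, 0, 0, 2]` column shown in the report, while the checked variant raises instead.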

@TomAugspurger
Contributor

TomAugspurger commented Jan 17, 2019

Can you shorten the example to remove the unnecessary lines?

Can you show the actual output and the expected output?

Finally, can you try on 0.24.0rc1? I don't recall if these were fixed or not, but sparse got an overhaul.

pip install --pre pandas or conda install -c conda-forge/label/rc pandas in a new env.

@gm-spacagna
Author

> Can you shorten the example to remove the unnecessary lines?
>
> Can you show the actual output and the expected output?
>
> Finally, can you try on 0.24.0rc1? I don't recall if these were fixed or not, but sparse got an overhaul.
>
> pip install --pre pandas or conda install -c conda-forge/label/rc pandas in a new env.

I have updated to 0.24.0rc1 and added the output of each block. The problem persists.

@gm-spacagna
Author

gm-spacagna commented Jan 18, 2019

I guess the right behaviour of fillna on a sparse dataframe should be sparse_df_filled = sparse_df.to_dense().fillna(-1).to_sparse(), or we should not support a fill_value different from zero, just like scipy.

@gfyoung added the Missing-data and Sparse labels Jan 21, 2019
@jorisvandenbossche
Member

To avoid surprises, I would raise an informative error when the fill value is not zero, pointing people towards changing the fill value before converting to scipy (since scipy simply does not support any other fill value).

Ideally we do this before 0.25, and only for the DataFrame.sparse.to_coo version, since that is still new.
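As a rough sketch of "changing the fill value before converting", one option is to go through a dense frame and re-sparsify with a fill value of 0. The helper name `to_coo_zero_fill` and its `dtype` parameter are assumptions made for this example, not pandas API; the sketch also assumes a pandas version that provides `pd.arrays.SparseArray` and the `DataFrame.sparse` accessor, with scipy installed. Densifying first costs memory, so this is only reasonable for data that fits in memory.

```python
import pandas as pd


def to_coo_zero_fill(df, dtype="int64"):
    """Hypothetical helper: re-encode an all-sparse frame with fill_value=0,
    then convert it to a scipy COO matrix."""
    # Densify so that no value is hidden behind a non-zero fill value.
    dense = df.sparse.to_dense()
    # Re-sparsify with 0 as the fill value, which is what scipy expects.
    zero_filled = dense.astype(pd.SparseDtype(dtype, 0))
    return zero_filled.sparse.to_coo()


df = pd.DataFrame({"A": pd.arrays.SparseArray([1, 1, 1, 2], fill_value=1)})
print(to_coo_zero_fill(df).toarray())
```

The round trip through a dense frame is used here on purpose: it rebuilds the stored values from scratch, rather than relying on how a sparse-to-sparse cast treats a changed fill value.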

@jorisvandenbossche added this to the 0.25.0 milestone May 29, 2019
@jreback modified the milestones: 0.25.0, Contributions Welcome Jun 28, 2019
@jorisvandenbossche modified the milestones: Contributions Welcome, 1.0 Sep 16, 2019
@TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@TomAugspurger
Contributor

Moving this off 1.0, but I've updated the original post with a simplified description. This should be a good first issue.

@shang-vikas

shang-vikas commented Mar 5, 2020

Hi, I want to help out with this issue. Can I take it up? This would be my first open source contribution.

@SurajH1
Contributor

SurajH1 commented Apr 1, 2020

take

@SurajH1
Contributor

SurajH1 commented Apr 2, 2020

take

@devjeetr
Contributor

take

@mroeschke added the Error Reporting label Apr 25, 2020
@saehuihwang
Contributor

take

@jreback modified the milestones: Contributions Welcome, 1.4 Sep 28, 2021