-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix wrong reading sparse matrix #31991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix wrong reading sparse matrix #31991
Conversation
Previously the comment in #29814 had suggested using One of my use cases involves taking the difference of two sparse matrices, and then passing the result to scipy.sparse and pandas (ideally filling missing with NaN, instead of 0). There's a meaningful difference between "this element has no information from either matrix" (NaN) and "this element had the exact same value in both matrices" (0). It looks to me like this PR would obscure that distinction. This would also make it hard to round-trip from Is there a solution that doesn't involve losing the information about explicit zeros? (This wouldn't be a show-stopper for us; right now export to pandas is basically a convenience for users who want it. We'd just have to add a warning on that function.) |
@jorisvandenbossche or @TomAugspurger any thoughts here? |
What’s the performance cost of your PR be using indicies? It seems like stripping explicit zeros will require scanning the full data? |
@TomAugspurger |
@m7142yosuke as I also mentioned on the issue, there has been a PR that refactored the This seems to have fixed the DataFrame bug (it has its own logic now, and does not use SparseArray.from_spmatrix under the hood), but so the SparseArray is still buggy (what you are fixing here). Also adding a test case for the original case from #29814 would be welcome |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update to the comments on the PR, e.g. @jorisvandenbossche mentioned an explicit test & add a whatsnew note in 1.1., sparse bug fix section. pls merge master and ping on green.
@jorisvandenbossche @jreback Done.
|
thanks @m7142yosuke |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff