API: DataFrame.sparse accessor #25682

TomAugspurger · 2019-03-12T03:27:15Z

TomAugspurger

Would welcome feedback from sparse users and whether I'm missing any functionality from SparseDataFrame. But I think this covers it.

Will follow up with a deprecation for SparseDataFrame and SparseSeries later in the week.

TomAugspurger · 2019-03-12T03:32:38Z

pandas/core/arrays/sparse.py

+        Ratio of non-sparse points to total (dense) data points
+        represented in the DataFrame.
+        """
+        return np.mean([column.array.density


Would it be more useful to skip the mean and return a Series of per-column densities instead?
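
To make the trade-off concrete, here is a minimal sketch comparing the scalar that `DataFrame.sparse.density` returns with a per-column Series. The example frame and the `per_column` name are illustrative assumptions, not code from this PR.

```python
import pandas as pd

# Illustrative frame: integer SparseArrays default to fill_value=0.
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([0, 0, 1, 0]),  # 1 of 4 values stored
    "b": pd.arrays.SparseArray([0, 1, 1, 0]),  # 2 of 4 values stored
})

# Alternative raised in the comment: one density per column, as a Series.
per_column = pd.Series({name: col.array.density for name, col in df.items()})

# The accessor as written in the diff: the mean of the column densities.
overall = df.sparse.density
```

The Series keeps the information the mean throws away; anyone who wants the scalar can still call `per_column.mean()`.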

codecov · 2019-03-12T04:07:20Z

Codecov Report

Merging #25682 into master will increase coverage by <.01%.
The diff coverage is 93.42%.

@@            Coverage Diff             @@
##           master   #25682      +/-   ##
==========================================
+ Coverage   91.29%   91.29%   +<.01%     
==========================================
  Files         173      173              
  Lines       52961    53013      +52     
==========================================
+ Hits        48349    48398      +49     
- Misses       4612     4615       +3

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `89.87% <93.42%> (ø)` | ⬆️ |
| #single | `41.72% <22.36%> (-0.02%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/core/sparse/frame.py | `96% <100%> (+0.29%)` | ⬆️ |
| pandas/core/frame.py | `96.79% <100%> (ø)` | ⬆️ |
| pandas/core/arrays/sparse.py | `92.19% <92.85%> (+0.01%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Last update 21769e9...24f48c3.

codecov · 2019-03-12T04:07:20Z

Codecov Report

Merging #25682 into master will decrease coverage by <.01%.
The diff coverage is 97.75%.

@@            Coverage Diff             @@
##           master   #25682      +/-   ##
==========================================
- Coverage   91.68%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50704    50747      +43     
==========================================
+ Hits        46489    46528      +39     
- Misses       4215     4219       +4

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `90.19% <97.75%> (ø)` | ⬆️ |
| #single | `41.19% <25.84%> (-0.14%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/core/sparse/frame.py | `95.63% <100%> (+0.14%)` | ⬆️ |
| pandas/core/frame.py | `97.02% <100%> (-0.12%)` | ⬇️ |
| pandas/core/arrays/sparse.py | `92.7% <97.46%> (+0.39%)` | ⬆️ |
| pandas/io/gbq.py | `78.94% <0%> (-10.53%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Last update 2123a96...f23fa52.

jreback

LGTM. One small test question, and is there any way to share docstrings?

jreback · 2019-03-20T01:09:26Z

pandas/tests/arrays/sparse/test_accessor.py

+    @pytest.mark.parametrize('dtype', ['float64', 'int64'])
+    @td.skip_if_no_scipy
+    def test_from_spmatrix(self, format, labels, dtype):
+        import scipy.sparse


Hmm, shouldn't we move the SciPy-specific tests to a new file and then just call pytest.importorskip at the top?

My preference is to keep all the accessor tests in a single file / class.

jreback · 2019-04-05T00:55:57Z

@TomAugspurger if you can merge master

TomAugspurger · 2019-04-05T20:16:03Z

Won't have time in the near term. I'm not sure which docstrings you were suggesting to share. I'd prefer to have all the accessor tests in a single place, so I'll push back against the suggestion to split the SciPy ones off.

TomAugspurger · 2019-04-18T18:27:43Z

Ping, assuming CI passes.

WillAyd

Minor comments - just looking at the implementation to familiarize myself with it

WillAyd · 2019-04-19T19:01:35Z

pandas/core/arrays/sparse.py

+            SparseArray.from_spmatrix(data[:, i])
+            for i in range(data.shape[1])
+        ]
+        data = dict(zip(columns, sparrays))


Not sure how often we use this construction, but I assume it precludes a user from specifying a MultiIndex or anything with duplicated entries for the columns, due to the hashability / uniqueness constraints of dict keys.

Fair point.

I'd like to avoid the perf issue with passing columns= to the DataFrame constructor... I suppose our alternative is to just set .columns after creating the DataFrame?

Wasn't aware of the perf issue - is there an open issue for that?

Yeah, I think assigning directly would be a better approach.
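
A minimal sketch of the problem and the workaround discussed here, using plain lists in place of the `SparseArray` columns. The names `lossy`, `arrays`, and the positional keying via `enumerate` are illustrative assumptions, not the code that was pushed.

```python
import pandas as pd

columns = ["a", "a", "b"]          # duplicated label
arrays = [[1, 2], [3, 4], [5, 6]]  # stand-ins for the per-column arrays

# dict keys must be unique, so the first "a" is silently overwritten.
lossy = dict(zip(columns, arrays))

# Keying by position keeps every column; the labels are assigned afterwards.
df = pd.DataFrame(dict(enumerate(arrays)))
df.columns = columns
```

Here `lossy` ends up with two entries while `df` keeps all three columns, including both columns labelled `"a"`.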

jreback · 2019-04-19T20:04:26Z

pandas/core/arrays/sparse.py

+        from pandas import DataFrame
+
+        data = {k: v.array.to_dense()
+                for k, v in self._parent.iteritems()}


iteritems() -> items()
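
For reference, a small sketch of the same densifying comprehension written with `items()`. The example frame is an assumption; `DataFrame.iteritems()` was later removed in pandas 2.0, so `items()` is the spelling that still runs on current releases.

```python
import pandas as pd

df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 1, 0])})

# items() yields (label, Series) pairs; each Series wraps a SparseArray.
data = {k: v.array.to_dense() for k, v in df.items()}

# Rebuild a frame backed by ordinary NumPy arrays.
dense = pd.DataFrame(data, index=df.index, columns=df.columns)
```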

jreback · 2019-04-19T20:05:41Z

pandas/core/arrays/sparse.py

+        represented in the DataFrame.
+        """
+        return np.mean([column.array.density
+                        for _, column in self._parent.iteritems()])


use items()

jreback · 2019-04-19T20:14:04Z

pandas/core/arrays/sparse.py

+
+    @classmethod
+    def from_spmatrix(cls, data, index=None, columns=None):
+        """


I am assuming you are defining this here because then we can simply deprecate SparseDataFrame as this is much simpler / direct?

Right, this is the replacement for SparseDataFrame(sp_matrix).
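
A short usage sketch of that replacement, assuming SciPy is installed. The identity matrix and the column labels are illustrative choices.

```python
import numpy as np
import pandas as pd
import scipy.sparse

mat = scipy.sparse.eye(3, format="csr")  # 3x3 identity, float64

# Takes the place of the old SparseDataFrame(sp_matrix) constructor.
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=["a", "b", "c"])

# Round-trip back to a SciPy COO matrix through the accessor.
coo = df.sparse.to_coo()
```

Each column of `df` holds a `SparseArray`, so the result is an ordinary `DataFrame` with sparse dtypes rather than a separate subclass.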

jreback · 2019-04-19T20:14:42Z

pandas/core/arrays/sparse.py

+        """
+        Return the contents of the frame as a sparse SciPy COO matrix.
+
+        .. versionadded:: 0.20.0


I think this should be updated to 0.25.0: even though the method is not new, it is now defined in a new place.

jreback · 2019-04-19T20:15:14Z

pandas/core/arrays/sparse.py

+        return coo_matrix((datas, (rows, cols)), shape=self._parent.shape)
+
+    @property
+    def density(self):


If you can add type annotations anywhere it is easy, that would be nice.

jreback · 2019-04-19T20:17:16Z

pandas/tests/arrays/sparse/test_array.py

+                                              reason='NumPy-11383')),
+        10
+    ])
+    def test_from_spmatrix(self, size, format):


use @td.skip_if_no_scipy

TomAugspurger · 2019-04-29T15:32:32Z

Pushed a fix for column handling.

I don’t plan to spend more time on this. If it isn’t ready, I’d suggest a doc-only deprecation of sparse series and sparse DataFrame.

jreback · 2019-05-07T01:27:04Z

@TomAugspurger this looked fine, just a couple of questions above (and please merge master).

jreback · 2019-05-12T21:04:29Z

@TomAugspurger just a few questions

TomAugspurger · 2019-05-14T13:18:58Z

All green. I'll update #26137 once this is in.

jreback · 2019-05-14T14:13:49Z

thanks @TomAugspurger

API: DataFrame.sparse accessor

24f48c3

Closes pandas-dev#25681

TomAugspurger added the Sparse Sparse Data Type label Mar 12, 2019

TomAugspurger added this to the 0.25.0 milestone Mar 12, 2019

TomAugspurger added 9 commits March 12, 2019 13:38

32-bit compat

6f619b5

fixups

94a7baf

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

534a379

…ssor

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

6696f28

…ssor

lint

f433be8

updates

0922296

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

318c06f

…ssor

isort?

3005aed

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

8b136bf

…ssor

jreback requested changes Mar 20, 2019

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

57c884e

…ssor

TomAugspurger mentioned this pull request Apr 18, 2019

Deprecate SparseDataFrame and SparseSeries #26137

Merged

TomAugspurger added 2 commits April 18, 2019 14:07

compat

9cbcccd

compat

663a87e

WillAyd reviewed Apr 19, 2019

lint

3f6a5aa

jreback requested changes Apr 19, 2019

TomAugspurger added 3 commits April 29, 2019 10:14

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

945531c

…ssor

special columns

8a46ef4

fixup

727625e

fixup

5890c28

TomAugspurger added 3 commits May 13, 2019 09:23

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

ed5b22a

…ssor

Merge remote-tracking branch 'upstream/master' into sparse-frame-acce…

b803f88

…ssor

fixups

f23fa52

jreback approved these changes May 14, 2019

jreback merged commit 0558a3c into pandas-dev:master May 14, 2019
