PERF: sparse take #43654

mzeitlin11 · 2021-09-19T03:02:14Z

This helps somewhat with #41023, because __getitem__ uses take internally for SparseArray. The bigger gain there can come in a follow-up that moves towards not densifying on __getitem__ with a slice.
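As a rough illustration of the relationship described above, the sketch below checks that positional indexing with an integer array and an explicit take call give the same result on a SparseArray. The array contents and variable names are made up for the example, and the sketch only compares results; it does not show which internal code path pandas uses.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: mostly fill values (0) with a few stored values.
arr = pd.arrays.SparseArray([0, 0, 3, 0, 5, 0], fill_value=0)
positions = np.array([2, 4, 0])

# Indexing with an integer array and calling take explicitly
# (allow_fill defaults to False) should agree element by element.
via_getitem = arr[positions]
via_take = arr.take(positions)

print(np.asarray(via_getitem), np.asarray(via_take))
```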

Benchmarks:

       before           after         ratio
     [e7e7b407]       [4de1fe27]
     <master>         <perf_sparse_take>
-        72.7±6μs         38.8±4μs     0.53  sparse.Take.time_take(array([0]), False)
-     1.00±0.06ms          350±4μs     0.35  sparse.Take.time_take(array([-1, -1, -1, ..., -1, -1, -1]), False)
-        737±20μs          206±6μs     0.28  sparse.Take.time_take(array([    0,     1,     2, ..., 99997, 99998, 99999]), False)
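For readers who want a rough local comparison without asv, here is a minimal timing sketch using timeit. It is not the asv benchmark itself: the array size, the 10% density, the length of the all -1 index array, and every variable name are assumptions chosen to resemble the three index patterns in the table above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 100_000 elements, about 10% of them non-fill values.
rng = np.random.default_rng(0)
dense = np.zeros(100_000)
stored_positions = rng.choice(dense.size, size=dense.size // 10, replace=False)
dense[stored_positions] = rng.standard_normal(stored_positions.size)
sp_arr = pd.arrays.SparseArray(dense, fill_value=0.0)

# Index patterns modeled on the table; the length of the -1 array is assumed.
index_cases = {
    "single": np.array([0]),
    "all_minus_one": np.full(100_000, -1),
    "arange": np.arange(100_000),
}

timings = {}
for name, indices in index_cases.items():
    # Best of 3 repeats of 5 calls each, reported as seconds per call.
    best = min(
        timeit.repeat(
            lambda: sp_arr.take(indices, allow_fill=False), number=5, repeat=3
        )
    )
    timings[name] = best / 5
    print(f"{name}: {timings[name] * 1e6:.1f} µs per call")
```

Absolute numbers depend on the machine and the installed pandas version, so only ratios between two versions measured on the same setup are meaningful.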

mzeitlin11 · 2021-09-19T03:03:49Z

pandas/tests/arrays/sparse/test_array.py

        tm.assert_sp_array_equal(result, expected)

        result = sparse.take(np.array([1, 0, -1]), fill_value=True)
-        expected = SparseArray([np.nan, np.nan, np.nan], kind="block")
+        expected = SparseArray([np.nan, np.nan, np.nan], kind=kind)


In all other take cases we preserve the kind of the SparseIndex, so I think it makes sense to do so here too. Otherwise the kind of the result depends on the values, which is surprising behavior IMO (somewhat expressed in the comment originally here).
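A small sketch of the kind-preservation point, using the allow_fill=False path rather than the exact call in the test above. The input values mirror the style of the test, but the loop and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

observed_kinds = {}
for kind in ("integer", "block"):
    # Build the same data with each kind of sparse index.
    arr = pd.arrays.SparseArray([np.nan, 0, 1, 0, np.nan], kind=kind)
    # With allow_fill=False, -1 refers to the last element (NumPy semantics).
    result = arr.take(np.array([1, 0, -1]), allow_fill=False)
    observed_kinds[kind] = result.kind

print(observed_kinds)
```

If the kind is preserved, each key maps to itself regardless of which values end up in the result.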

mzeitlin11 · 2021-09-19T03:05:21Z

pandas/core/arrays/sparse/array.py


-        return taken
+        new_sp_index = make_sparse_index(len(indices), value_indices, kind=self.kind)
+        return type(self)._simple_new(new_sp_values, new_sp_index, dtype=self.dtype)


The main reasoning for the simplifications here, which keep the same behavior, is that take with allow_fill=False should give a result with the same dtype as our SparseArray.
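The dtype invariant stated above can be checked from the public API. This is a minimal sketch with made-up data, not a reproduction of the internal change in the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with an explicit fill value.
arr = pd.arrays.SparseArray([0.0, 1.5, 0.0, 2.5, 0.0], fill_value=0.0)
indices = np.array([1, 0, -1])

# allow_fill=False: negative indices count from the end, and no missing
# values are introduced, so the sparse dtype has no reason to change.
taken = arr.take(indices, allow_fill=False)

same_dtype = taken.dtype == arr.dtype
dense_result = np.asarray(taken)
print(taken.dtype, dense_result)
```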

jreback · 2021-09-20T16:44:11Z

lgtm. cc @jbrockmendel

jbrockmendel · 2021-09-20T17:00:35Z

doc/source/whatsnew/v1.4.0.rst

@@ -354,6 +354,7 @@ Performance improvements
 - Performance improvement in indexing with a :class:`MultiIndex` indexer on another :class:`MultiIndex` (:issue:43370`)
 - Performance improvement in :meth:`GroupBy.quantile` (:issue:`43469`)
 - :meth:`SparseArray.min` and :meth:`SparseArray.max` no longer require converting to a dense array (:issue:`43526`)
+- Performance improvement in :meth:`SparseArray.take` with ``allow_fill=False`` (:issue:`?`)


jreback · 2021-09-21T14:08:52Z

thanks @mzeitlin11

mzeitlin11 added 4 commits September 18, 2021 21:56

Clean up casting

c54b60a

Add benchmark and whatsnew

3a6c3e4

Avoid copy

daadddc

Clean up spacing

4de1fe2

mzeitlin11 added the Indexing, Performance, and Sparse labels Sep 19, 2021

Merge remote-tracking branch 'upstream/master' into perf_sparse_take

9ceb4fa

jreback added this to the 1.4 milestone Sep 20, 2021

jbrockmendel reviewed Sep 20, 2021

Update doc/source/whatsnew/v1.4.0.rst

58efeb5

jreback approved these changes Sep 21, 2021

jreback merged commit a5264b8 into pandas-dev:master Sep 21, 2021

mzeitlin11 deleted the perf_sparse_take branch September 21, 2021 16:51
