
PERF: skip non-consolidatable blocks when checking consolidation #32826


Merged
merged 2 commits on Mar 19, 2020

Conversation

jorisvandenbossche (Member)

This skips non-consolidatable blocks when determining whether the blocks are consolidated. That means that if you have multiple EA columns with the same dtype, we no longer do an unnecessary consolidation.

From investigating #32196 (comment)

@rth this should give another speed-up to your benchmark
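
For context, a minimal sketch of the check this enables (a hypothetical standalone helper, not the exact diff in this PR; it assumes pandas' internal Block._can_consolidate flag):

def is_consolidated(blocks) -> bool:
    # Non-consolidatable blocks (e.g. extension blocks) can never be merged,
    # so skip them: duplicate dtypes among them should not force a
    # consolidation pass.
    dtypes = [blk.dtype for blk in blocks if blk._can_consolidate]
    # Consolidated means at most one consolidatable block per dtype.
    return len(dtypes) == len(set(dtypes))

With a check like this, a frame holding thousands of same-dtype EA columns is recognized as already consolidated instead of triggering a no-op consolidation pass.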

jorisvandenbossche added the Performance label (Memory or execution speed performance) on Mar 19, 2020
jorisvandenbossche (Member, Author)

With:

import numpy as np
import pandas as pd

arrays = [pd.arrays.SparseArray(np.random.randint(0, 2, 1000), dtype="float64") for _ in range(10000)]
index = pd.Index(range(len(arrays[0])))
columns = pd.Index(range(len(arrays)))

it gives

In [4]: %timeit pd.DataFrame._from_arrays(arrays, index=index, columns=columns)  
122 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)   <-- PR
249 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <-- master

rth (Contributor) commented on Mar 19, 2020

Very nice @jorisvandenbossche , thanks!

I can confirm that, when cherry-picked on top of #32825, this gives another 2x speed-up:

[100.00%] ··· sparse.SparseDataFrameConstructor.time_from_scipy                                                               115±0.5ms
       before           after         ratio
     [dbd7a5d3]       [c2476380]
     <master>         <tmp-sparse>
-       115±0.5ms       14.1±0.2ms     0.12  sparse.SparseDataFrameConstructor.time_from_scipy

So that's around a 14x speed-up overall from these 3 PRs.

simonjayhawkins added this to the 1.1 milestone on Mar 19, 2020
jbrockmendel (Member)

Looks like a nice speedup. Couple of questions, non-blockers:

  • are the improvements specific to SparseArray?
  • if we exclude non-consolidatable blocks, can we check dtype instead of ftype?
  • does caching ftype make a difference?

jorisvandenbossche (Member, Author)

Not specific to SparseArray; it's for all non-consolidatable blocks (which is now just ExtensionBlock, I think?). It's just that creating a DataFrame from a sparse matrix is a case where you only have extension blocks, and thus where this is relatively more costly.

I suppose we can get rid of ftype entirely: indeed, it was only used in the past to differentiate dense and sparse dtypes, but sparse dtypes are now ExtensionDtypes rather than numpy dtypes, so we don't need it anymore.
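
As a small illustration of that point (assuming current pandas behavior; the printed values are indicative):

import numpy as np
import pandas as pd

dense = pd.Series(np.arange(3, dtype="float64"))
sparse = pd.Series(pd.arrays.SparseArray([0.0, 1.0, 0.0]))

print(dense.dtype)   # float64, a plain numpy dtype
print(sparse.dtype)  # Sparse[float64, nan], an ExtensionDtype

Since a sparse column's dtype is already a distinct ExtensionDtype, comparing dtypes is enough; the old ftype string, which tagged a dtype as dense or sparse, no longer adds information.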

jorisvandenbossche (Member, Author)

@jbrockmendel updated to remove ftype, thanks for the questions!

WillAyd (Member) left a comment:

lgtm - nice cleanup

jreback merged commit dbd24ad into pandas-dev:master on Mar 19, 2020
jreback (Contributor) commented on Mar 19, 2020

thanks @jorisvandenbossche

yeah, we had removed ftypes in 1.0, so this was a leftover, I think.
