PERF: Faster SparseArray.__getitem__ for boolean SparseArray mask (#23122) #44841
Direct indexing is very fast, but computing setdiff1d is very slow. A good solution for both cases is to use masks, which are much faster than setdiff1d and only a bit slower than direct indices. Here is a plot comparing the old (densifying) implementation with the new one (masks):
I understand that using a mask takes more memory, but I don't think that is a crucial point here, especially for the bool8 type. Let's discuss.
We can combine both approaches: use direct indices when the fill value is False, and use a mask otherwise.
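To make the combined approach concrete, here is a rough NumPy-only sketch, not the code from this PR. The function name `true_positions` and its arguments are hypothetical: `sp_index` stands for the positions of the explicitly stored values of the boolean sparse mask, `sp_values` for those stored values, and `fill_value` for the mask's fill value.

```python
import numpy as np


def true_positions(sp_index, sp_values, fill_value, length):
    """Return the positions at which a boolean sparse mask is True.

    Hypothetical helper for illustration only; the names and signature
    are assumptions, not part of the pandas API.
    """
    sp_index = np.asarray(sp_index, dtype=np.intp)
    sp_values = np.asarray(sp_values, dtype=bool)

    if not fill_value:
        # Fill value is False: only stored values can be True,
        # so the stored indices can be used directly.
        return sp_index[sp_values]

    # Fill value is True: every position is True except the stored
    # False values. A dense boolean mask avoids np.setdiff1d.
    mask = np.ones(length, dtype=bool)
    mask[sp_index[~sp_values]] = False
    return np.flatnonzero(mask)


# Mask [False, True, False, False, True] with fill_value=False
print(true_positions([1, 4], [True, True], False, 5))
# Mask [True, False, True, True, False] with fill_value=True
print(true_positions([1, 4], [False, False], True, 5))
```

The resulting positions could then be passed to something like `take` on the indexed array. The mask branch allocates `length` booleans, which is the memory cost mentioned above.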
One more question to discuss concerns the case where a True value in the boolean sparse array points at a fill value of the original array. I assume that in this case the fill value should be returned.
I got the benchmark results using the pytest-benchmark framework.
Code for benchmarking: