PERF: is_bool_indexer #41861

jbrockmendel · 2021-06-08T00:53:14Z

Besides avoiding an array allocation, this avoids the np.asarray call that was one of the main offenders in #41632
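For readers following along, here is a minimal pure-Python sketch of the two approaches being contrasted. The function names `is_bool_list_py` and `is_bool_via_asarray` are hypothetical and are not pandas API; the sketch also assumes that a "bool object" means a Python `bool` or a `numpy.bool_`, which may not match the Cython helper exactly.

```python
import numpy as np


def is_bool_list_py(obj: list) -> bool:
    """Check element by element, without building an array (hypothetical helper)."""
    for item in obj:
        # Assumption: both Python bools and numpy bools count as bool objects.
        if not isinstance(item, (bool, np.bool_)):
            return False
    return True


def is_bool_via_asarray(obj: list) -> bool:
    """Allocate an ndarray and let NumPy infer the dtype (hypothetical helper)."""
    return np.asarray(obj).dtype.kind == "b"


if __name__ == "__main__":
    print(is_bool_list_py([True, False, True]))
    print(is_bool_via_asarray([True, False, True]))
```

The first function never allocates an array and never asks NumPy to infer a dtype, which is the cost the description refers to. Note that the two sketches are not equivalent on every input: for an empty list the loop version returns True, while `np.asarray([])` infers a float dtype.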

WillAyd · 2021-06-08T05:54:51Z

pandas/_libs/lib.pyx

+
+    for item in obj:
+        if not util.is_bool_object(item):
+            return False


Is there any concern with the benchmarks being somewhat misleading due to short circuiting? At least in the Python space similar functions don't seem to benefit much from this

In [6]: %timeit all(isinstance(x, bool) for x in obj)
6.7 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit np.array(obj).dtype.kind == "b"
5.98 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
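To make the short-circuiting concern concrete, the sketch below counts how many elements an early-exit loop inspects. It is an illustration only: `count_items_checked` is a hypothetical helper, and the inputs are made up rather than taken from the PR's benchmarks.

```python
def count_items_checked(obj: list) -> int:
    """Return how many items an early-exit bool check looks at (hypothetical helper)."""
    checked = 0
    for item in obj:
        checked += 1
        if not isinstance(item, bool):
            # First non-bool ends the scan, so later items are never visited.
            break
    return checked


if __name__ == "__main__":
    all_bools = [True] * 100
    non_bool_first = ["a"] + [True] * 99
    print(count_items_checked(all_bools))
    print(count_items_checked(non_bool_first))
```

A benchmark whose input fails on the first element only measures the cost of one check, so it says little about the all-bool case, where every element has to be visited.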

jbrockmendel · 2021-06-08

> Is there any concern with the benchmarks being somewhat misleading due to short circuiting?

Only one of the 4 cases posted short-circuits. Am I misunderstanding the question?

> At least in the Python space similar functions don't seem to benefit much from this

I was surprised by this too. After talking with @seberg yesterday about how numpy does this inference, I found that doing it in cython was much more performant.

seberg · 2021-06-08

Wow, the difference is big...

NumPy should maybe try to be a bit faster, e.g. by assuming that all elements are usually the same type, but that would only make the gap a bit smaller. I guess true casting support, np.asarray(..., dtype=bool, casting="equiv"), would fill the gap, but I have no idea how likely that would be. Unlike NumPy, you already know what you want here (bools) and what you got (a 1-d list).

WillAyd · 2021-06-08

Ah, my bad on short circuiting; I misread it. Thanks for the comments.

jreback · 2021-06-08T20:45:41Z

thanks this looks good

jbrockmendel added 3 commits June 7, 2021 16:18

PERF: is_bool_list

bcd74fa

typo fixup

f8a6f2a

mypy fixup

808a82f

WillAyd reviewed Jun 8, 2021

WillAyd added the Performance (Memory or execution speed performance) label Jun 8, 2021

Merge branch 'master' into ref-infer

63948a0

WillAyd approved these changes Jun 8, 2021

jreback added this to the 1.3 milestone Jun 8, 2021

jreback merged commit 017ff7c into pandas-dev:master Jun 8, 2021

jbrockmendel deleted the ref-infer branch June 8, 2021 20:48

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

PERF: is_bool_indexer (pandas-dev#41861)

4a8b6e5
