PERF: parallelize libjoin calls #51364
Comments
Do we know that our current GIL-releasing usage actually improves performance? AFAIK releasing the GIL allows for the use of multiple threads, but I'm not sure that helps most of our calculations, which are single-threaded and CPU-bound?
I've wondered this myself.
Without opening a can of worms: if you wanted concurrency here, I think you could map the two chunks to different processes and communicate the results back to the parent process. Of course that makes the code more complicated, and there's a break-even point where it may not be worth it, but that's my best guess at how to tackle it.
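A minimal sketch of that idea, assuming a hypothetical `join_chunk` kernel (a stand-in, not pandas' actual libjoin code): split the left keys across worker processes, run the kernel on each chunk, and shift the per-chunk indices back into positions in the full array.

```python
# Hypothetical sketch: run independent "join" chunks in separate processes
# and combine the results in the parent. `join_chunk` is a stand-in for a
# libjoin-style kernel, not pandas internals.
import numpy as np
from concurrent.futures import ProcessPoolExecutor


def join_chunk(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    # Stand-in kernel: indices of `left` values that appear in `right`.
    return np.nonzero(np.isin(left, right))[0]


def parallel_join(left: np.ndarray, right: np.ndarray, n_workers: int = 2) -> np.ndarray:
    halves = np.array_split(left, n_workers)
    # Starting offset of each chunk within the full `left` array.
    offsets = np.cumsum([0] + [len(h) for h in halves[:-1]])
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        parts = list(ex.map(join_chunk, halves, [right] * n_workers))
    # Shift each chunk's local indices back to positions in the full array.
    return np.concatenate([p + off for p, off in zip(parts, offsets)])
```

As the comment notes, the pickling and IPC overhead of processes means there is a break-even input size below which this is slower than the serial call.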
https://github.com/jcrist/ptime is handy for checking if stuff is being hampered by the GIL.
I'm not aware of anywhere in pandas that's currently multi-threaded, but I haven't followed things closely in a bit. GIL-releasing is crucial for libraries that use pandas in parallel, like Dask.
Just to confirm: we're probably talking about threads here, not OS-level processes. Is there a larger discussion around parallelizing things within pandas itself, rather than deferring to libraries like Dask? It is indeed a can of worms.
Nice, thanks for the insight @TomAugspurger. Yeah, sorry, I meant to say parallelism, not concurrency.
Not really. I've been looking into what we can add to EAs to make it more appealing for dask/modin/cudf/etc. to implement EAs. That's what got me looking at this chunk of code.
Interesting conversation, though. Do we think the current GIL-releasing stuff we have in pandas definitely helps Dask downstream? Is that tested/benchmarked there? There are definitely some cases where we have a strange mix of Tempita/fused types that I think trace back to a lack of support for conditional GIL releasing for object types. If someone took it up as a passion project, we might be able to revisit that and simplify a good deal.
IIRC we got rid of most of the Tempita. Most of what's left were cases where there were two separately-dtyped arguments that weren't necessarily the same dtype. The conditional GIL releasing, I think, is mostly already using fused types; there will be a wave of additional cleanup once Cython 3 is released.
xref #43313 for I/O. We should really start thinking about a keyword to control parallelism in pandas. I believe Arrow operations are parallel by default. For Cython, I wouldn't mind if we used Cython's OpenMP capabilities, which seems to be as simple as using prange instead of range (though it would probably make my life a lot harder with packaging pandas). Personally, I think it's about time we added some sort of parallel capability to pandas (not the fancy out-of-core stuff that dask does, though).
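For illustration, the "prange vs range" change looks roughly like this (a sketch, not pandas code; it requires compiling the extension with OpenMP, e.g. `-fopenmp` in the compile and link args):

```cython
# Sketch: a nogil reduction parallelized with Cython's prange.
from cython.parallel import prange

def parallel_sum(double[:] a):
    cdef Py_ssize_t i
    cdef double total = 0.0
    # prange releases the GIL and splits iterations across OpenMP threads;
    # Cython recognizes the in-place += on `total` as a reduction.
    for i in prange(a.shape[0], nogil=True):
        total += a[i]
    return total
```

The loop body must be nogil-safe (no Python objects), which is the same constraint the existing GIL-releasing kernels already satisfy.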
+1. Aside from the prange you mentioned, I don't have a clear picture of what is available.
There is an effort to standardize the parallel user-facing APIs across NumPy, scikit-learn, and pandas (this is NASA-funded; @MarcoGorelli can hopefully point to the docs that exist), so certainly experiment, but we should tread carefully here. Generally +1 on internal parallelism where we can and where it's appropriate.
@lithomas1, here's your first request for help on this topic. Trying to add prange to libalgos.nancorr, I added "-fopenmp" to extra_compile_args and extra_link_args in setup.py, and am getting
Does installing the conda-forge compilers package help? I don't think conda-forge and brew play nicely with each other.
@jbrockmendel macOS confusingly aliases gcc to clang, so you may think you are running the former when you are actually running the latter. It looks like clang might need a different command-line argument to make that work.
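A sketch of what platform-conditional flags in setup.py might look like. This is an assumption, not a tested recipe: the paths assume a Homebrew `libomp` install on Apple Silicon, since Apple's clang ships without an OpenMP runtime and rejects a bare `-fopenmp`.

```python
# Sketch: platform-dependent OpenMP flags for setup.py. The libomp paths
# below are assumptions (Homebrew on Apple Silicon) and may differ locally.
import sys

if sys.platform == "darwin":
    # Apple clang has no bundled OpenMP runtime: pass the flag through the
    # preprocessor and link Homebrew's libomp explicitly.
    extra_compile_args = [
        "-Xpreprocessor",
        "-fopenmp",
        "-I/opt/homebrew/opt/libomp/include",
    ]
    extra_link_args = ["-L/opt/homebrew/opt/libomp/lib", "-lomp"]
else:
    # GCC and non-Apple clang accept -fopenmp directly.
    extra_compile_args = ["-fopenmp"]
    extra_link_args = ["-fopenmp"]
```

The conda-forge compilers package sidesteps this by providing a real GCC (or an OpenMP-enabled clang) instead of Apple's toolchain.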
Is there a chance that with numba this might Just Work?
Yup, here you go: https://thomasjpfan.github.io/parallelism-python-libraries-design/
The chunk1/chunk2 calls are each 1.47 s ± 40 ms and the concat is 1.05 s ± 179 ms. In a pyarrow or otherwise-chunked context the concat might not be needed.
For the non-object-dtype case, I'm pretty sure we could release the GIL in inner_join_indexer. Could we then run the chunk1/chunk2 calls in parallel? If so, could we split further? cc @WillAyd
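If the kernel does release the GIL, the chunk1/chunk2 idea can use plain threads rather than processes. A minimal sketch, using `np.intersect1d` as a stand-in for `inner_join_indexer` (it is not the pandas kernel, but like many NumPy operations it spends its time outside the GIL-holding interpreter loop):

```python
# Hypothetical sketch of the chunk1/chunk2 idea with threads. Assumes the
# underlying kernel releases the GIL; `np.intersect1d` is a stand-in for
# inner_join_indexer, not the actual pandas function.
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def threaded_inner_join(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    mid = len(left) // 2
    chunk1, chunk2 = left[:mid], left[mid:]
    with ThreadPoolExecutor(max_workers=2) as ex:
        # Both halves join against `right` concurrently; with the GIL
        # released in the kernel, the two threads can run on separate cores.
        f1 = ex.submit(np.intersect1d, chunk1, right)
        f2 = ex.submit(np.intersect1d, chunk2, right)
    # For sorted, duplicate-free keys the partial results can simply be
    # concatenated; per the timing comment above, a pyarrow/chunked
    # context might avoid this concat entirely.
    return np.concatenate([f1.result(), f2.result()])
```

Splitting further (four or more chunks) follows the same pattern, subject to the thread-spawn and concat overhead noted in the timings above.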