PERF: parallelize libjoin calls #51364

Open
jbrockmendel opened this issue Feb 13, 2023 · 16 comments
Labels
Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jbrockmendel
Member

jbrockmendel commented Feb 13, 2023

import numpy as np
import pandas as pd
import pandas._libs.join as libjoin

left = np.random.randint(0, 10**8, size=10**8)
right = np.random.randint(0, 10**8, size=10**8)

left.sort()
right.sort()

# The current usage
In [28]: %timeit result = libjoin.inner_join_indexer(left, right)
2.89 s ± 60 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# An alternative that _may_ be parallelizable
lloc = len(left) // 2
rloc = right.searchsorted(left[lloc])  # 734 ns ± 16.6 ns

chunk1 = libjoin.inner_join_indexer(left[:lloc], right[:rloc])
chunk2 = libjoin.inner_join_indexer(left[lloc:], right[rloc:])
result = tuple([np.r_[chunk1[i], chunk2[i]] for i in range(3)])
# totals 3.9 s ± 209 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The chunk1/chunk2 calls are each 1.47 s ± 40 ms and the concat is 1.05 s ± 179 ms. In a pyarrow or otherwise-chunked context the concat might not be needed.

For the non-object-dtype case, I'm pretty sure we could release the GIL in inner_join_indexer. Could we then do the chunk1/chunk2 calls in parallel? If so, could we split it further? cc @WillAyd.
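For illustration, a minimal sketch of the chunk-and-join-in-parallel idea, using `ThreadPoolExecutor` and `np.intersect1d(..., return_indices=True)` as a stand-in for `libjoin.inner_join_indexer` (hypothetical helper names; assumes sorted arrays with unique values, and any real speedup would require the join itself to release the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def join_chunk(left, right):
    # Stand-in for libjoin.inner_join_indexer on sorted, unique inputs:
    # returns (joined values, indexer into left, indexer into right).
    return np.intersect1d(left, right, return_indices=True)


def chunked_inner_join(left, right):
    # Split both inputs at a value boundary so each chunk can be
    # joined independently (both arrays are assumed sorted and unique).
    lloc = len(left) // 2
    rloc = right.searchsorted(left[lloc])

    with ThreadPoolExecutor(max_workers=2) as pool:
        fut1 = pool.submit(join_chunk, left[:lloc], right[:rloc])
        fut2 = pool.submit(join_chunk, left[lloc:], right[rloc:])
        v1, li1, ri1 = fut1.result()
        v2, li2, ri2 = fut2.result()

    # The second chunk's indexers are relative to the slices; shift them back.
    return (
        np.concatenate([v1, v2]),
        np.concatenate([li1, li2 + lloc]),
        np.concatenate([ri1, ri2 + rloc]),
    )
```

With the current GIL-holding implementation the two submits still run serially; the sketch only shows where the parallelism would slot in.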

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 13, 2023
@WillAyd
Member

WillAyd commented Feb 13, 2023

Do we know that our current GIL-releasing usage actually improves performance? AFAIK releasing the GIL allows for the use of multiple threads, but I'm not sure that helps most of our calculations, which are single-threaded and CPU-bound.

@jbrockmendel
Member Author

I've wondered this myself.

@WillAyd
Member

WillAyd commented Feb 13, 2023

Without opening a can of worms: if you wanted concurrency here, I think you could map the two chunks to different processes and communicate the result back to the parent process. Of course that makes the code more complicated, and there's a break-even point where it may not be worth it, but that is my best guess on how to tackle it.

@TomAugspurger
Contributor

https://github.com/jcrist/ptime is handy for checking if stuff is being hampered by the GIL.

but I'm not sure that helps most of our calculations which are single-threaded and CPU-bound?

I'm not aware of anywhere in pandas that's currently multi-threaded, but I haven't followed things closely in a bit. GIL-releasing is crucial for libraries that use pandas in parallel, like Dask.

if you wanted concurrency here I think you could map the two chunks to different proceses and communicate the result back to the parent process.

Just to confirm: we're probably talking about threads here, not OS-level processes.

Is there a larger discussion around parallelizing things within pandas itself, rather than deferring to libraries like Dask? It is indeed a can of worms.
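A rough home-grown version of the ptime-style check mentioned above (hypothetical helper; timing-based, so the ratio is only indicative and will vary by machine and load):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def thread_scaling(func, arg, n_threads=4):
    """Compare wall time of n_threads sequential calls vs. n_threads
    concurrent calls.  A ratio approaching n_threads suggests func
    releases the GIL; a ratio near 1 suggests it holds it."""
    start = time.perf_counter()
    for _ in range(n_threads):
        func(arg)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(func, [arg] * n_threads))
    concurrent = time.perf_counter() - start

    return sequential / concurrent
```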

@WillAyd
Member

WillAyd commented Feb 13, 2023

Nice, thanks for the insight @TomAugspurger. Yeah, sorry, I meant to say parallelism, not concurrency.

@jbrockmendel
Member Author

Is there a larger discussion around parallelizing things within pandas itself, rather than deferring to libraries like Dask? It is indeed a can of worms.

Not really. I've been looking into what we can add to EAs to make it more appealing for dask/modin/cudf/etc to implement EAs. That's what got me looking at this chunk of code.

@WillAyd
Member

WillAyd commented Feb 13, 2023

Interesting conversation though: do we think the current GIL-releasing stuff we have in pandas definitely helps Dask downstream? Is that tested / benchmarked there?

There are definitely some cases where we have a strange mix of Tempita / fused types that I think traces back to a lack of support for conditional GIL releasing for object types. If someone took it up as a passion project we might be able to revisit that and simplify a good deal.

@jbrockmendel
Member Author

IIRC we got rid of most of the Tempita. Most of what's left are cases where there were two separate dtyped arguments that weren't necessarily the same dtype, e.g. take_1d_{{in_dtype}}_{{out_dtype}}. If there's a nice way to move that to fused types I'd be all for it.

The conditional GIL, I think, is mostly already using fused types; there will be a wave of additional cleanup once Cython 3 is released.

@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Feb 14, 2023
@lithomas1
Member

lithomas1 commented Feb 14, 2023

xref #43313 for I/O.

We should really start thinking of a keyword to control parallelism in pandas. I believe Arrow operations are parallel by default.

For Cython, I wouldn't mind if we used Cython's OpenMP capabilities, which seem to amount to just using prange instead of range (would make my life a lot harder with packaging pandas, though). Personally, I think it's about time we added some sort of parallel capability to pandas (not the fancy out-of-core stuff etc. that dask does though).
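For reference, a sketch of what that change might look like in a libjoin-style loop (Cython; function name is hypothetical, and the extension would need to be compiled and linked with OpenMP flags, which is exactly the packaging concern):

```cython
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

def pairwise_sum(double[:] a, double[:] b, double[:] out):
    cdef Py_ssize_t i, n = a.shape[0]
    # prange replaces range; the loop body must be nogil-safe,
    # so no Python objects can be touched inside it.
    for i in prange(n, nogil=True):
        out[i] = a[i] + b[i]
```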

@jbrockmendel
Member Author

Personally, I think it's about time we added some sort of parallel capability to pandas (not the fancy out-of-core stuff etc. that dask does though).

+1. Aside from prange you mentioned, I don't have a clear picture of what is available.

@jreback
Contributor

jreback commented Feb 14, 2023

There is an effort to standardize the parallel user-facing APIs across numpy, scikit-learn, and pandas (this is NASA-funded; @MarcoGorelli can hopefully point to the docs that exist).

So certainly experiment, but we should tread carefully here.

Generally +1 on internal parallelism where we can and where it is appropriate.

@lithomas1 lithomas1 added Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 14, 2023
@jbrockmendel
Member Author

would make my life a lot harder with packaging pandas

@lithomas1 here's your first request for help on this topic: trying to add prange to libalgos.nancorr, I added "-fopenmp" to extra_compile_args and extra_link_args in setup.py and am getting clang: error: unsupported option '-fopenmp' at build time on a Mac. Google suggested brew install llvm libomp, but all I managed to do was mess up my dev environment. Suggestions?

@lithomas1
Member

lithomas1 commented Feb 15, 2023

Does installing the conda-forge compilers package help?

I don't think conda-forge and brew play nicely with each other.

@WillAyd
Member

WillAyd commented Feb 15, 2023

@jbrockmendel macOS confusingly aliases gcc to clang, so you may think you are running the former when you are actually running the latter. It looks like clang might need a different command-line argument to make that work:

https://stackoverflow.com/a/33400688/621736
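Following that answer, a hypothetical setup.py fragment: Apple clang rejects -fopenmp directly, but with `brew install libomp` the flag can reportedly be passed through the preprocessor and libomp linked explicitly (untested sketch, variable names are illustrative only):

```python
import sys

# Apple clang does not accept -fopenmp as a driver option;
# pass it via -Xpreprocessor and link libomp explicitly.
if sys.platform == "darwin":
    openmp_compile_args = ["-Xpreprocessor", "-fopenmp"]
    openmp_link_args = ["-lomp"]
else:
    # GCC and non-Apple clang accept -fopenmp directly.
    openmp_compile_args = ["-fopenmp"]
    openmp_link_args = ["-fopenmp"]
```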

@jbrockmendel
Member Author

Is there a chance that with numba this might Just Work?
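For what it's worth, numba's prange does sidestep the OpenMP build-flag wrangling entirely, since numba ships its own threading layer. A hypothetical example (function name is illustrative; falls back to pure Python when numba isn't installed):

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # numba not installed: make the decorator a no-op and prange = range,
    # so the same function runs (serially) in pure Python.
    njit = lambda **kwargs: (lambda func: func)
    prange = range


@njit(parallel=True)
def parallel_square_sum(a):
    # prange splits the loop across threads; numba recognizes the
    # accumulation into `total` as a reduction.
    total = 0.0
    for i in prange(a.shape[0]):
        total += a[i] * a[i]
    return total
```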

@MarcoGorelli
Member

MarcoGorelli commented Feb 20, 2023

There is an effort to standardize the parallel user-facing APIs across numpy, scikit-learn, and pandas (this is NASA-funded; @MarcoGorelli can hopefully point to the docs that exist).

yup, here you go: https://thomasjpfan.github.io/parallelism-python-libraries-design/
