Blogpost for the PyTorch blog #174
Conversation
Thanks Mario, great post.
blogpost/post.md (outdated)

> considerable speed-ups.
>
> This can of course be combined with `torch.compile` to be able to compile
> programs that rely on these other libraries.
Not quite true for SciPy, since it also encompasses compiled code without matching functionality in PyTorch.
blogpost/post.md (outdated)

> From Quansight, we would like to thank Meta for funding this project and all
> the previous work that led to it, like improving the NumPy compatibility
> within PyTorch, and developing the [Python Array API](https://data-apis.org/array-api/latest/).
This didn't involve any Meta funding actually. Quansight and Intel have been the largest funders, the others are listed on https://data-apis.org/. Meta did fund the work on adding PyTorch support to https://github.com/data-apis/array-api-compat, but that work and funding acknowledgement are covered in Thomas' blog post on the scikit-learn work.
Addressed the review. The main missing point is that I only start talking about QS in the penultimate section. I would like to mention QS somewhere at the top as well, but I really like the introductory paragraph and I don't know how to work it in. I've added a proposal in the last commit, but I don't fully like it. Could I get some feedback particularly on that?

In particular, could you guys give further feedback on the introductory paragraph and the last two sections? I'd like to make sure those are properly tight.
blogpost/post.md (outdated)

> The compiled function yields a 9x speed-up when running it on 1 core. Even
> better, since the compiled code also runs on multiple cores, we get a **57x speed-up**
> when running it on 32 cores. Note that vanilla NumPy always runs on
> just one core.
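For readers skimming this thread, the pattern being benchmarked looks roughly like this (a minimal sketch, assuming PyTorch ≥ 2.1 with its compiled-NumPy support; `pairwise_dist` is a hypothetical stand-in for the post's actual function):

```python
import numpy as np
import torch

def pairwise_dist(X, means):
    # Pure NumPy code: distance from each point to each center.
    return np.linalg.norm(X - means[:, None], axis=2)

# torch.compile traces the NumPy calls and lowers them to PyTorch ops;
# plain ndarrays go in, a plain ndarray comes out.
compiled_dist = torch.compile(pairwise_dist)

X = np.random.randn(10_000, 2)
means = np.random.randn(8, 2)
out = compiled_dist(X, means)  # same result as the eager NumPy call
```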
Have you compared the performance to other options for compiling NumPy? The `get_labels` example contains functions which are supported by `numba`, so that might be an interesting datapoint to see how the `torch.compile` speed-up compares to `numba.jit`.
Not really, as this would make our announcement post way too long. I expect other people to put up posts comparing this and other approaches (Julia / numba / torch.jit / mojo), but that's beyond the scope of this post IMO.
I plan to write a follow-up, based on #168 or https://github.com/Quansight-Labs/numpy_pytorch_interop/tree/main/e2e/smoke, comparing to numba.jit may fit there.
That'd be super cool!
Weirdly enough, throwing `@numba.njit` in place of `torch.compile` in this example runs into several rough edges of numba:

- `np.linalg.norm(..., axis=2)` does not compile; it chokes on `axis`. Replacing `np.linalg.norm` with

  ```python
  def norm(a, axis):
      s = (a.conj() * a).real
      return np.sqrt(s.sum(axis=axis))
  ```

  and `njit`-ting both `norm` and `get_labels` gives a slowdown of about 60%.
- Trying `@njit(parallel=True)` on `get_labels` crashes the compiler, again on the `axis` argument (`TypeError: Failed in nopython mode pipeline (step: Preprocessing for parfors) got an unexpected keyword argument 'axis'`).
- Trying `@njit(parallel=True)` on `norm` only and `@njit` on `get_labels` yields a slowdown w.r.t. NumPy of 'only' 20%.
Got to admit I never had much luck with numba in non-toy situations.
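For anyone trying to reproduce this, here is a self-contained version of the workaround above. The `get_labels` body is a hypothetical k-means-style label assignment, since the post's exact code isn't quoted in this thread, and whether each piece compiles depends on the numba version, per the rough edges listed above:

```python
import numpy as np
from numba import njit

@njit
def norm(a, axis):
    # Hand-rolled replacement: np.linalg.norm(..., axis=2) does not
    # compile under numba, but this explicit reduction does.
    s = (a.conj() * a).real
    return np.sqrt(s.sum(axis=axis))

@njit
def get_labels(X, means):
    # Hypothetical stand-in for the post's example: assign each point
    # to its nearest center.
    return np.argmin(norm(X - means[:, None], axis=2), axis=0)

X = np.random.randn(10_000, 2)
means = np.random.randn(8, 2)
labels = get_labels(X, means)
```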
blogpost/post.md (outdated)

> replacement for the PyTorch API, as it is **much slower** and, as a private API,
> **may change without notice**.
>
> ## Differences between NumPy and `torch.compile`d NumPy
May want to link to #5 (or move the list in that issue somewhere more appropriate).
Yes, that's on my TODO list for next week!
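For context on the quoted warning: the private layer in question is presumably `torch._numpy`, the compatibility module this repo's work was upstreamed into (an assumption here, since the quoted lines don't name it). Using it eagerly looks like this and is discouraged for exactly the quoted reasons:

```python
# Assumption: the private compat layer lives at torch._numpy (PyTorch >= 2.1).
# It executes eagerly through PyTorch tensors, is much slower than going
# through torch.compile, and, being private, may change without notice.
import torch._numpy as tnp

x = tnp.arange(5)
y = tnp.sin(x) ** 2  # eager execution via PyTorch under the hood
```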
Reads great! My main confusion is around the section about mixing NumPy arrays and torch tensors. It might be good to also stress a recommendation for mixing the two: does torch.compile gracefully handle the mix, and what is the recommendation (do not mix, mix freely, or something else)?
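A sketch of the pattern in question, assuming the semantics described in the post's draft: inside a compiled region, `tensor.numpy()` and `torch.from_numpy` are traced rather than performing real conversions, so mixing is mediated by explicit conversions at the boundaries:

```python
import numpy as np
import torch

@torch.compile
def mixed(t):
    # Inside the compiled region the conversions below are traced away:
    # no host copy, just a change of "view" for the tracer.
    x = t.numpy()               # tensor -> ndarray (traced)
    y = np.sin(x) ** 2          # NumPy code, lowered to PyTorch ops
    return torch.from_numpy(y)  # ndarray -> tensor (traced)

t = torch.randn(8, device="cpu")
out = mixed(t)
```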
LGTM!
Merging just to archive it. The last version of this post is at https://hackmd.io/@82vuWEfETza6QgoTVJYnew/SkGEpt_R2/edit