
Prefer construction via DLPack to costly element-by-element copy #120615


Closed
wants to merge 1 commit

Conversation

wjakob
Contributor

@wjakob wjakob commented Feb 26, 2024

Fixes #120614

Copy of pitch:

Consider the following statement

torch_array = torch.tensor(other_array)

where other_array is an array instance constructed by another array programming framework. If this other framework is principally designed around the DLPack array exchange protocol, something very bad happens.

Basically, PyTorch ignores the DLPack interface (other_array.__dlpack__) altogether. Furthermore, if other_array exposes the sequence protocol (__len__ and __getitem__), PyTorch will perform a brute-force element-wise copy, potentially requiring hundreds of millions of separate PCI Express transactions to copy individual floating point values from the GPU. To the user, it will seem that the application has crashed: nothing happens, and attempting to interrupt the Python kernel doesn't work, since all of this runs on the C++ side.
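To illustrate with a minimal sketch (SequenceOnlyArray is a made-up stand-in for such a framework's array type; with CPU-backed data this merely runs slowly rather than hanging):

```python
import torch

class SequenceOnlyArray:
    # Hypothetical array type exposing only the sequence protocol,
    # the way a third-party framework's array might.
    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]

arr = SequenceOnlyArray(range(5))
# torch.tensor() falls back to iterating the sequence and copying one
# element at a time; if each __getitem__ had to cross PCI Express, this
# would turn into millions of individual device-to-host transfers.
t = torch.tensor(arr)
print(t)  # tensor([0, 1, 2, 3, 4])
```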

Fortunately PyTorch supports the DLPack protocol to efficiently exchange tensors with other libraries. But it only does so when the user creates the arrays in a sort of awkward way:

from torch.utils.dlpack import from_dlpack
torch_array = from_dlpack(other_array)

It's easy to forget to do this, with extremely unpleasant results. It would be very easy for PyTorch to check whether other_array implements the DLPack protocol and then simply switch to this construction method. I will create a separate PR proposing a prototype of this idea.
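In Python terms, the proposed dispatch could look roughly like this (a sketch only; the actual change would live in PyTorch's C++ tensor-creation path, and tensor_preferring_dlpack is a hypothetical name):

```python
import torch
from torch.utils.dlpack import from_dlpack

def tensor_preferring_dlpack(obj, **kwargs):
    # Prefer the zero-copy DLPack exchange when the producer supports it.
    if hasattr(obj, "__dlpack__") and hasattr(obj, "__dlpack_device__"):
        return from_dlpack(obj)
    # Otherwise, fall back to the existing construction path.
    return torch.tensor(obj, **kwargs)
```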


pytorch-bot bot commented Feb 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120615

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 3b48f6a with merge base b381a43:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Feb 26, 2024

CLA Signed

The committers are authorized under a signed CLA.

@vadimkantorov
Contributor

vadimkantorov commented Feb 26, 2024

Does torch.as_tensor also support __dlpack__ or __array_interface__? related: #58036

@wjakob
Contributor Author

wjakob commented Feb 26, 2024

@vadimkantorov __array_interface__ is a CPU-only API, and I would like to keep this PR focused on the DLPack functionality.

torch.asarray has the same flaw and also does not check for DLPack capabilities of the input object. If the approach I took seems fine to you, I can add similar logic there.

@vadimkantorov
Contributor

vadimkantorov commented Feb 26, 2024

I'm not one of the maintainers/reviewers, so we'll have to see what they have to say :) I also stumbled on this similar issue at the time: torch.as_tensor not supporting __array_interface__/__cuda_array_interface__/__dlpack__/buffer protocol/memoryview inputs...

@wjakob
Contributor Author

wjakob commented Feb 26, 2024

Interestingly, there is some support for __cuda_array_interface__, but this is a legacy API with some rather significant flaws.

@ezyang
Contributor

ezyang commented Feb 27, 2024

For reference, who is implementing __dlpack__ in this case?

@ezyang
Contributor

ezyang commented Feb 27, 2024

@lezcano @rgommers do either of you have an opinion on the relative priority of this wrt __cuda_array_interface__?

This PR in principle seems fine, I think we may need to haggle about where exactly it goes in the priority chain.

Haven't reviewed the code but it needs a test.

@rgommers
Collaborator

This would be great to fix, thanks @wjakob.

torch.asarray has the same flaw and also does not check for DLPack capabilities of the input object. If the approach I took seems fine to you, I can add similar logic there.

+1 to adding this to torch.asarray as well.

This PR in principle seems fine, I think we may need to haggle about where exactly it goes in the priority chain.

In what is probably the most common case (no explicit stream argument, but only a single tensor over the default CUDA stream with regular ints/floats), __dlpack__ and __cuda_array_interface__ (CAI) should behave identically AFAIK. For more complex cases, DLPack learned some of the lessons from CAI and improved over its stream handling. I am unsure if that was fixed in a later CAI update though, or if this is what @wjakob is referring to with "significant flaws". From a browse of the docs (now at v3) there is at least no ROCm stream handling, which DLPack does have.

Another important difference is dtype handling. The dtypes supported by CAI are NumPy type strings, while DLPack uses the DLDataType enums defined in dlpack.h. Where there are differences, DLPack dtypes are more useful to PyTorch, I'd think. The extra NumPy ones (strings, datetimes, object, void) are not interesting, while DLPack supports bfloat16 and lower-precision int/float dtypes.
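The difference is easy to see from Python (a small sketch assuming a recent NumPy and PyTorch on a little-endian machine):

```python
import numpy as np
import torch

# CAI describes dtypes with NumPy type strings:
x = np.ones(4, dtype=np.float32)
print(x.__array_interface__["typestr"])  # '<f4'

# DLPack describes dtypes as (type code, bits, lanes) triples, which
# lets it express dtypes NumPy has no type string for, e.g. bfloat16:
t = torch.ones(4, dtype=torch.bfloat16)
u = torch.from_dlpack(t)  # round-trips losslessly through DLPack
print(u.dtype)            # torch.bfloat16
```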

DLPack is also more general w.r.t. other devices, and being actively evolved. E.g., it just gained a read-only flag, which may become relevant if PyTorch gains read-only/copy-on-write tensors for example. So I'd have a preference for trying DLPack before CAI.

The counter-argument I can think of may be stability. DLPack just tagged 1.0rc1, and 1.0 is ABI-breaking. There's a way to query the version and handle both 0.x and 1.0 appropriately, but PyTorch should then of course support that within 12-18 months or so after 1.0 comes out; otherwise we may end up in a situation where other libraries drop 0.x support while PyTorch doesn't yet support 1.x.
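A rough sketch of the consumer-side negotiation, following the __dlpack__(max_version=...) convention from the array API standard (not PyTorch's actual implementation):

```python
def consume_dlpack(x):
    # x is any producer object implementing the DLPack protocol.
    try:
        # A DLPack >= 1.0 aware producer may return a versioned capsule
        # once the consumer announces the highest version it understands.
        return x.__dlpack__(max_version=(1, 0))
    except TypeError:
        # Pre-1.0 producers don't accept max_version; retry without it
        # and fall back to the unversioned 0.x ABI.
        return x.__dlpack__()
```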

@wjakob
Contributor Author

wjakob commented Feb 27, 2024

@ezyang, @rgommers: Regarding the flaws of __cuda_array_interface__: besides the issue with streams and non-CUDA devices mentioned by @rgommers, let me quote the elephant in the room directly from the CuPy reference:

Warning: __cuda_array_interface__ specifies that the object lifetime must be managed by the user, so it is an undefined behavior if the exported object is destroyed while still in use by the consumer library.

(Yikes.) DLPack basically clarifies the ownership story and makes that part accessible to non-Python consumers/producers that don't have a way of Py_DECREF-ing an object. It may even be worth prioritizing __dlpack__ over __cuda_array_interface__ if a tensor object supports both.
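A small Python illustration of the ownership contract (a sketch; the bookkeeping is done by the deleter and manager context that DLPack's DLManagedTensor carries):

```python
import torch
from torch.utils.dlpack import from_dlpack

producer = torch.arange(3.0)
consumer = from_dlpack(producer)  # shares memory, no copy

# The DLPack exchange holds an owning reference to the producer's
# storage via the manager context, and the consumer invokes the
# deleter when it is done with the data:
del producer
print(consumer)  # still valid: tensor([0., 1., 2.])

# Under __cuda_array_interface__ semantics, the consumer would now be
# reading freed memory unless the user kept the producer alive manually.
```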

@rgommers
Collaborator

Ah yes, lifetime management is a worse issue than the ones I mentioned. Then I'd definitely put DLPack first.

@vadimkantorov
Contributor

Also, DLPack support in PyTorch maybe still has this bug: #43166

Supporting fully user-managed lifetimes (or passing an extra user-provided deleter) is also important for basic interop scenarios like #34646.

Also, if conflicts between several interfaces might arise, is it possible to somehow select the wanted protocol? e.g. by torch.as_tensor(myobj.__dlpack__) or torch.as_tensor(myobj.__array_interface__)?

@wjakob
Contributor Author

wjakob commented Feb 27, 2024

Also, DLPack support in PyTorch maybe still has this bug: #43166

@vadimkantorov Actually, I think I reported the same issue separately (I was unaware of this ticket) as #117273, which has been fixed in the meantime.

@ezyang
Contributor

ezyang commented Feb 28, 2024

Needs test

Contributor

@ezyang ezyang left a comment


just waiting for test

Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Apr 28, 2024
@github-actions github-actions bot closed this May 28, 2024
@ysiraichi
Collaborator

@wjakob Are you still working on this? If not, is it ok if I take over this PR?

@wjakob
Contributor Author

wjakob commented Oct 21, 2024

@ysiraichi please go ahead, I ran out of time to finish this.

pytorchmergebot pushed a commit that referenced this pull request Oct 26, 2024 (#138697)

Fixes #120614
Takes over #120615

In summary, this PR:
- Adds a `__dlpack__` attribute check in the tensor creation path (i.e. [`internal_new_from_data` @ tensor_new.cpp](https://github.com/pytorch/pytorch/blob/cdfe1bffd16bdd28adbe5518038f68e6ac45de8d/torch/csrc/utils/tensor_new.cpp#L266))
    - Creates the tensor by using the DLPack machinery, instead of an element-by-element copy
    - No changes since #120615
- Adds a test, making sure the DLPack machinery is used
    - Wraps a tensor in a fresh `TensorDLPackWrapper` class that implements only the DLPack methods
    - Creates a new tensor from an instance of `TensorDLPackWrapper`
Pull Request resolved: #138697
Approved by: https://github.com/ezyang

Co-authored-by: Wenzel Jakob <[email protected]>
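For illustration, a hypothetical sketch of the shape of such a DLPack-only wrapper (the actual `TensorDLPackWrapper` lives in the PR's test suite):

```python
import torch

class TensorDLPackWrapper:
    # Wraps a tensor but exposes *only* the DLPack protocol, so an
    # element-by-element fallback would fail rather than silently run.
    def __init__(self, tensor):
        self._tensor = tensor

    def __dlpack__(self, **kwargs):
        # Forward any protocol kwargs (e.g. stream) to the real tensor.
        return self._tensor.__dlpack__(**kwargs)

    def __dlpack_device__(self):
        return self._tensor.__dlpack_device__()

wrapped = TensorDLPackWrapper(torch.arange(4.0))
t = torch.from_dlpack(wrapped)  # must go through the DLPack machinery
assert torch.equal(t, torch.arange(4.0))
```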
Successfully merging this pull request may close these issues.

Leverage DLPack-based construction of PyTorch tensors to avoid costly element-by-element copy