
[feature request] torch.as_tensor to support any object that NumPy's asarray or array can consume (consume __array_interface__) #58036


Open
vadimkantorov opened this issue May 11, 2021 · 4 comments
Labels
module: numpy Related to numpy support, and also numpy compatibility of our operators triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@vadimkantorov
Contributor

vadimkantorov commented May 11, 2021

This would enlarge the set of inputs accepted by torch.as_tensor and would support PIL images / h5py arrays. I think this feature request fits well with the theme of standardizing support for protocols like `__array_interface__`, `__cuda_array_interface__`, and so on. It would also be good for torch.as_tensor to accept `__array_interface__` dicts directly: it may sometimes be convenient to store / manipulate these dictionaries directly and then pass them to torch.as_tensor. Currently this fails with `Could not infer dtype of dict` (which is also an unclear error message, by the way).
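To make the gap concrete, here is a minimal sketch (the `ArrayLike` class is hypothetical, just a stand-in for PIL/h5py-style objects) of something NumPy's asarray consumes via `__array_interface__` but torch.as_tensor currently rejects:

```python
import numpy as np

class ArrayLike:
    """Hypothetical wrapper that only exposes __array_interface__."""
    def __init__(self, data):
        self._arr = np.asarray(data)  # backing storage, kept alive by the wrapper

    @property
    def __array_interface__(self):
        return self._arr.__array_interface__

obj = ArrayLike([[1, 2], [3, 4]])
a = np.asarray(obj)       # NumPy consumes the interface dict: shape (2, 2)
# torch.as_tensor(obj)    # currently raises: Could not infer dtype of ArrayLike
```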

Currently, torch.as_tensor(pil_image) fails with `RuntimeError: Could not infer dtype of Image`, while the image can be converted with np.asarray.

As a side effect, this would also eliminate the need for torchvision's F.to_tensor(pil_image).

Related: #54138

cc @mruberry @rgommers @heitorschueroff

@vadimkantorov vadimkantorov changed the title [feature request] torch.as_tensor to support any object that NumPy's asarray or array can consume [feature request] torch.as_tensor to support any object that NumPy's asarray or array can consume (consume __array_interface__) May 11, 2021
@mruberry mruberry added module: numpy Related to numpy support, and also numpy compatibility of our operators triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 12, 2021
@mruberry
Collaborator

Thanks for this suggestion; I wonder if the Python Array API will also make a proposal here.

@rgommers
Collaborator

gh-54187 adds __array_interface__ support.

The array API standard asarray definition (the as_tensor equivalent) says to support DLPack, and also the buffer protocol as a nice convenience. There's still some discussion about that at data-apis/array-api#155. There's a tension between having well-defined functions with a clear purpose and a "just swallow anything that may possibly make sense" approach.
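As a concrete illustration of the buffer-protocol convenience under discussion, NumPy's asarray already accepts any buffer-protocol object; the commented torch line is a sketch and assumes a PyTorch release that ships torch.frombuffer:

```python
import numpy as np

buf = memoryview(bytearray([1, 2, 3, 4]))

# NumPy's asarray consumes buffer-protocol objects directly:
a = np.asarray(buf)   # dtype uint8, shape (4,)

# PyTorch's narrower, explicit entry point for the same idea (recent releases):
# t = torch.frombuffer(bytearray([1, 2, 3, 4]), dtype=torch.uint8)
```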

The use of numpy.asarray through other libraries has been pretty harmful; I'd much rather see other libraries do what PyTorch does and only accept Tensor or tensor-like objects (those could include the buffer protocol and __array_interface__/__cuda_array_interface__, though). And not sequences, generators, etc.

@vadimkantorov
Contributor Author

I think both are important and useful. In some parts of the code it is useful to be constraining and very explicit. In other parts it may be useful to swallow everything without having to think about whether the input is a PIL image, a NumPy array, or a DLPack capsule from CuPy.

If the generic part is not there, users have to roll their own type-name checks again and again, which is more brittle and worse than a tested library helper method for the same goal.
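The kind of ad-hoc converter this forces users to hand-roll might look like the following sketch (`to_array_like` is a hypothetical helper, not an existing or proposed API):

```python
import numpy as np

def to_array_like(obj):
    """Hypothetical hand-rolled converter of the kind users end up writing."""
    if hasattr(obj, "__array_interface__") or hasattr(obj, "__array__"):
        return np.asarray(obj)                      # NumPy-compatible objects
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return np.frombuffer(obj, dtype=np.uint8)   # buffer-protocol objects
    return np.asarray(obj)                          # fall through: lists, scalars, ...
```

A tested library entry point would replace all of these branches with a single call.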

@vadimkantorov
Contributor Author

vadimkantorov commented Jul 19, 2022

@cpuhrsch At pytorch/vision#6278 (comment), it seems impossible to have a torch.is_tensor check in polymorphic code that accepts both Tensor and TensorList. It would be cool to be able to solve this somehow. Could it be done without causing GPU->CPU synchronization by letting torch.as_tensor consume both Tensor and TensorList (doing a torch.stack internally in the latter case)?

Or maybe, at the least, torch.stack(x) should be a no-op (at most a copy of the input) when a tensor rather than a list is provided as input?
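A sketch of the polymorphic behavior being asked for, assuming PyTorch is installed (`as_stacked` is a hypothetical name, not an existing API):

```python
import torch

def as_stacked(x):
    """Return x unchanged if it is already a Tensor; stack it if it is a TensorList."""
    if isinstance(x, torch.Tensor):
        return x                    # no-op: avoids any copy or GPU->CPU sync
    return torch.stack(list(x))     # list/tuple of same-shaped tensors

t = as_stacked(torch.zeros(2, 3))                 # already a Tensor: shape (2, 3)
s = as_stacked([torch.zeros(3), torch.ones(3)])   # TensorList: stacked to (2, 3)
```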
