feat: add support for specifying a data type "kind" in astype #848

Open
wants to merge 11 commits into base: main
29 changes: 24 additions & 5 deletions src/array_api_stubs/_draft/data_type_functions.py
@@ -13,10 +13,15 @@


def astype(
- x: array, dtype: dtype, /, *, copy: bool = True, device: Optional[device] = None
+ x: array,
+ dtype_or_kind: Union[dtype, str],
+ /,
+ *,
+ copy: bool = True,
+ device: Optional[device] = None,
) -> array:
"""
- Copies an array to a specified data type irrespective of :ref:`type-promotion` rules.
+ Copies an array to a specified data type irrespective of :ref:`type-promotion` rules, or to a *kind* of data type.

.. note::
Casting floating-point ``NaN`` and ``infinity`` values to integral data types is not specified and is implementation-dependent.
@@ -40,8 +45,14 @@ def astype(
----------
x: array
array to cast.
- dtype: dtype
- desired data type.
+ dtype_or_kind: Union[dtype, str]
+ desired data type or kind of data type. Supported kinds are:
+ - ``'bool'``: boolean data types (e.g., ``bool``).
+ - ``'signed integer'``: signed integer data types (e.g., ``int8``, ``int16``, ``int32``, ``int64``).
+ - ``'unsigned integer'``: unsigned integer data types (e.g., ``uint8``, ``uint16``, ``uint32``, ``uint64``).
+ - ``'integral'``: integer data types. Shorthand for ``('signed integer', 'unsigned integer')``.
+ - ``'real floating'``: real-valued floating-point data types (e.g., ``float32``, ``float64``).
+ - ``'complex floating'``: complex floating-point data types (e.g., ``complex64``, ``complex128``).
copy: bool
specifies whether to copy an array when the specified ``dtype`` matches the data type of the input array ``x``. If ``True``, a newly allocated array must always be returned. If ``False`` and the specified ``dtype`` matches the data type of the input array, the input array must be returned; otherwise, a newly allocated array must be returned. Default: ``True``.
device: Optional[device]
@@ -50,7 +61,15 @@ def astype(
Returns
-------
out: array
- an array having the specified data type. The returned array must have the same shape as ``x``.
+ For ``dtype_or_kind`` a data type, an array having the specified data type.
+ For ``dtype_or_kind`` a kind of data type:
+ - If ``x.dtype`` is already of that kind, the data type is maintained.
+ - Otherwise, an attempt is made to convert to the specified kind, according to the type promotion rules (see :ref:`type-promotion`).
Member

Why "an attempt"? That seems ambiguous. We have to be clear about what must work. Which I think is:

  • float to complex
  • unsigned to signed integer

Anything else doesn't, I think. There's no point allowing `'bool'` either, since there is only one boolean dtype, so `dtype=xp.bool` will be cleaner.

For `'signed integer'` and `'real floating'` there are also no promotion rules to follow, so they can be left out - or do you see a use case?

Member Author

I've reduced this down to just 'complex floating' (use-case: mixed float/complex to complex) and 'signed integer' (use-case: mixed signed/unsigned to signed).

I think "an attempt" would still be accurate for an implementation of this? `xp.astype(some_int8_array, 'complex floating')` would attempt a conversion, whose success will depend on the implementation-specific type promotion rules, right?

Unless you think that this function should always error when the type promotion is not defined by the standard?

Member

> I think "an attempt" would still be accurate for an implementation of this?

I think you have the right idea in mind here; it's just a matter of the language we use to specify things. We specify which behavior has to be supported: `'complex floating'` has type promotion rules defined in the standard, so it's expected to always work for a compliant implementation. Then, if we expect other input types to raise, we specify that with "must raise ..." or "input type must be ...". In this case there's no reason to do that (implementors are free to support more types, it's just not standardized), so we say "input type should be ...".

Your "attempt to ..." seems to be the same as "should be ...", it's just language we want to write in a uniform way.
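
The "must" versus "should" distinction discussed above can be sketched as a small lookup. This is an illustration only, not part of the PR: dtypes are plain strings, `STANDARDIZED_INPUTS` and `cast_status` are hypothetical names, and the table assumes the two kinds the author had reduced the proposal to at this point in the thread. `uint64` is left out of the `'signed integer'` row on the assumption that the standard defines no promotion between `uint64` and the signed integer types.

```python
# Hypothetical table: input dtypes whose promotion toward each kind is
# defined by the standard, so a compliant library is expected to support them.
STANDARDIZED_INPUTS = {
    "complex floating": {"float32", "float64", "complex64", "complex128"},
    # uint64 omitted: mixing it with signed integers is assumed undefined.
    "signed integer": {
        "int8", "int16", "int32", "int64",
        "uint8", "uint16", "uint32",
    },
}


def cast_status(x_dtype: str, kind: str) -> str:
    """Classify a (dtype, kind) pair using the spec-language distinction."""
    if kind not in STANDARDIZED_INPUTS:
        raise ValueError(f"kind {kind!r} is not covered by this sketch")
    if x_dtype in STANDARDIZED_INPUTS[kind]:
        # "input type should be ...": behavior is standardized.
        return "must work"
    # Libraries are free to support this, but portable code cannot rely on it.
    return "implementation-defined"


print(cast_status("float32", "complex floating"))
print(cast_status("int8", "complex floating"))
```

Under these assumptions, `xp.astype(some_int8_array, 'complex floating')` falls in the second category: a library may accept it, but the wording would not require it to.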

Member Author

How about the wording now?

Contributor

@kgryte, Jan 23, 2025

Quick update: I took the liberty of updating the wording. I also made the call to broaden the list of data type kinds. I think there are reasonable arguments for providing any one of the numeric data type kinds (e.g., `int32` and `'real floating'` => `float64`, etc.), and it is possible to delineate a set of clearly defined rules for which data type should be returned. Leaving `bool` out seems somewhat arbitrary, especially when the semantics are clearly specified and all other kinds can be, IMO, reasonably provided (note: even including "numeric"; i.e., convert anything provided to me to numbers so I can compute the sum, etc.).


+ - Numeric kinds are interpreted as the lowest-precision standard data type of that kind for the purposes of type promotion. For example, ``astype(x, 'complex floating')`` will return an array with the data type ``complex64`` when ``x.dtype`` is ``float32``, since ``complex64`` is the result of promoting ``float32`` with the lowest-precision standard complex data type, ``complex64``.
+ - For kind ``integral``, the 'lowest-precision standard data type' is interpreted as ``int8``, not ``uint8``.
+
+ The returned array must have the same shape as ``x``.

Notes
-----
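
The "lowest-precision standard data type" rule in the Returns text above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's implementation: dtypes are plain strings, and `resolve_kind`, `promote`, `KINDS`, and `LOWEST` are hypothetical names. `promote` is a simplified stand-in for the standard's promotion table covering only same-kind, signed/unsigned, and real/complex pairs. Mixed integer/floating pairs, `bool`, and the `int32` with `'real floating'` case mentioned in the thread are deliberately out of scope and raise.

```python
import re

# Numeric kinds from the diff, with dtypes as plain strings (an assumption).
KINDS = {
    "signed integer": ("int8", "int16", "int32", "int64"),
    "unsigned integer": ("uint8", "uint16", "uint32", "uint64"),
    "real floating": ("float32", "float64"),
    "complex floating": ("complex64", "complex128"),
}
KINDS["integral"] = KINDS["signed integer"] + KINDS["unsigned integer"]

# Lowest-precision representative per kind; 'integral' maps to int8, not uint8.
LOWEST = {
    "signed integer": "int8",
    "unsigned integer": "uint8",
    "integral": "int8",
    "real floating": "float32",
    "complex floating": "complex64",
}


def _split(name):
    """Split e.g. 'uint16' into ('uint', 16)."""
    m = re.fullmatch(r"(uint|int|float|complex)(\d+)", name)
    if m is None:
        raise TypeError(f"no promotion rule for {name!r} in this sketch")
    return m.group(1), int(m.group(2))


def promote(a, b):
    """Simplified promotion for the pairs this sketch covers."""
    (ka, na), (kb, nb) = _split(a), _split(b)
    if ka == kb:
        return f"{ka}{max(na, nb)}"
    if {ka, kb} == {"int", "uint"}:
        signed, unsigned = (na, nb) if ka == "int" else (nb, na)
        if unsigned == 64:
            raise TypeError("uint64 with a signed integer is not defined")
        return f"int{max(signed, 2 * unsigned)}"
    if {ka, kb} == {"float", "complex"}:
        real, cplx = (na, nb) if ka == "float" else (nb, na)
        return f"complex{max(cplx, 2 * real)}"
    raise TypeError(f"promotion of {a!r} with {b!r} is not defined")


def resolve_kind(x_dtype, kind):
    """Return the dtype name that astype(x, kind) would target."""
    if kind not in LOWEST:
        raise ValueError(f"kind {kind!r} is not covered by this sketch")
    if x_dtype in KINDS[kind]:
        return x_dtype  # already of that kind: dtype is maintained
    result = promote(x_dtype, LOWEST[kind])
    if result not in KINDS[kind]:
        raise TypeError(f"{x_dtype!r} does not promote to kind {kind!r}")
    return result


print(resolve_kind("float32", "complex floating"))  # complex64
print(resolve_kind("uint8", "signed integer"))      # int16
```

One consequence the sketch makes visible: promoting toward the lowest-precision type does not always land in the requested kind (for example, `int32` with `'unsigned integer'` promotes to `int32`), so the sketch raises in that case rather than returning a dtype of the wrong kind.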