Add ufunc signatures to applicable functions #116
Comments
I've looked into #109 (comment)

import re
import numpy as np
from typing import List, Tuple

# Regular expressions adapted from NumPy's vectorize implementation:
# https://github.com/numpy/numpy/blob/main/numpy/lib/function_base.py#L2007
_DIMENSION_NAME = r'\w+'
_CORE_DIMENSION_LIST = '(?:{0:}(?:,{0:})*)?'.format(_DIMENSION_NAME)
_ARGUMENT = r'\({}\)'.format(_CORE_DIMENSION_LIST)
_ARGUMENT_LIST = '{0:}(?:,{0:})*'.format(_ARGUMENT)
_SIGNATURE = '^{0:}->{0:}$'.format(_ARGUMENT_LIST)

def parse_gufunc_signature(signature):
    """
    Parse string signatures for a generalized universal function.

    Arguments
    ---------
    signature : string
        Generalized universal function signature, e.g., ``(m,n),(n,p)->(m,p)``
        for ``np.matmul``.

    Returns
    -------
    Tuple of input and output core dimensions parsed from the signature, each
    of the form List[Tuple[str, ...]].
    """
    signature = re.sub(r'\s+', '', signature)
    if not re.match(_SIGNATURE, signature):
        raise ValueError(
            'not a valid gufunc signature: {}'.format(signature))
    return tuple(tuple(tuple(re.findall(_DIMENSION_NAME, arg))
                       for arg in re.findall(_ARGUMENT, arg_list))
                 for arg_list in signature.split('->'))

def update_dim_sizes(dim_sizes, shape, core_dims):
    """
    Incrementally check and update core dimension sizes for a single argument.

    Arguments
    ---------
    dim_sizes : Dict[str, int]
        Sizes of existing core dimensions. Will be updated in-place.
    shape : Tuple[int, ...]
        Shape of the argument to examine.
    core_dims : Tuple[str, ...]
        Core dimensions for this argument.
    """
    if not core_dims:
        return
    num_core_dims = len(core_dims)
    if len(shape) < num_core_dims:
        # Only the shape is available here, so report its length
        # rather than the ndim of an array.
        raise ValueError(
            '%d-dimensional argument does not have enough '
            'dimensions for all core dimensions %r'
            % (len(shape), core_dims))
    core_shape = shape[-num_core_dims:]
    for dim, size in zip(core_dims, core_shape):
        if dim in dim_sizes:
            if size != dim_sizes[dim]:
                raise ValueError(
                    'inconsistent size for core dimension %r: %r vs %r'
                    % (dim, size, dim_sizes[dim]))
        else:
            dim_sizes[dim] = size

def parse_input_dimensions(args_shapes, input_core_dims):
    """
    Parse broadcast and core dimensions for vectorize with a signature.

    Arguments
    ---------
    args_shapes : Tuple[Tuple[int, ...], ...]
        Tuple of input argument shapes to examine.
    input_core_dims : List[Tuple[str, ...]]
        List of core dimensions corresponding to each input.

    Returns
    -------
    broadcast_shape : Tuple[int, ...]
        Common shape to broadcast all non-core dimensions to.
    dim_sizes : Dict[str, int]
        Common sizes for named core dimensions.
    """
    broadcast_shapes = []
    dim_sizes = {}
    for shape, core_dims in zip(args_shapes, input_core_dims):
        update_dim_sizes(dim_sizes, shape, core_dims)
        ndim = len(shape) - len(core_dims)
        # Keep only the leading (non-core) dimensions for broadcasting.
        broadcast_shapes.append(tuple(shape[:ndim]))
    # np.broadcast_shapes (NumPy >= 1.20) replaces the private
    # np.lib.stride_tricks._broadcast_shape helper and needs no dummy arrays.
    broadcast_shape = np.broadcast_shapes(*broadcast_shapes)
    return broadcast_shape, dim_sizes

def calculate_shapes(broadcast_shape, dim_sizes, list_of_core_dims):
    """Helper for calculating broadcast shapes with core dimensions."""
    return tuple(broadcast_shape + tuple(dim_sizes[dim] for dim in core_dims)
                 for core_dims in list_of_core_dims)

class UfuncSignature:
    __slots__ = ("signature", "icore", "ocore")

    def __init__(self, signature):
        self.signature = re.sub(r'\s+', '', signature)
        self.icore, self.ocore = parse_gufunc_signature(signature)

    def __str__(self):
        return self.signature

    def output_for(self, *input_shapes: Tuple[int, ...]) -> Tuple[Tuple[int, ...], ...]:
        # Returns one output shape per output in the signature.
        brinp, core_dims = parse_input_dimensions(input_shapes, self.icore)
        return calculate_shapes(brinp, core_dims, self.ocore)
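
Since these helpers are adapted from the signature handling in `np.vectorize`, the expected output shape can be cross-checked against `np.vectorize` itself. The following is only a sketch: the input shapes are made up for illustration and `batched_matmul` is a hypothetical name, not something from this project.

```python
import numpy as np

# Hypothetical wrapper: np.vectorize loops the core function over the
# broadcast leading (batch) dimensions described by the signature.
batched_matmul = np.vectorize(lambda x, y: x @ y,
                              signature="(m,n),(n,p)->(m,p)")

a = np.zeros((4, 1, 10, 5))  # batch dims (4, 1), core dims m=10, n=5
b = np.zeros((3, 5, 7))      # batch dims (3,),   core dims n=5,  p=7

out = batched_matmul(a, b)
# Batch dims (4, 1) and (3,) broadcast to (4, 3); the core output is (10, 7).
print(out.shape)  # (4, 3, 10, 7)
```

For the same input shapes, `UfuncSignature("(m,n),(n,p)->(m,p)").output_for((4, 1, 10, 5), (3, 5, 7))` would be expected to report a single output shape of `(4, 3, 10, 7)`.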
A signature API could look like the following. Constructing it programmatically should not be hard either, since you only need the core dims and, optionally, their positions.

class Signature(MetaObject):
    formula: str
    """String representation for core dims pattern.

    Examples
    ========

    Interactions
    ------------
    * `(d),(d)->()` - dot product
    * `(m,n),(n,p)->(m,p)` - matrix multiplication.
      Note that `(m,n?),(n,p?)->(m?,p?)` is not supported

    Reference to intermediate Axis
    ------------------------------
    The `.k.` token skips `k` dims; `...` skips any number of dims and can only be used once in the formula

    * `(M,.k.),(J,.k.)->(J,.k.)` - take_along_axis with dim=-(k+1)
    * `(M,.1.),(J,.1.)->(J,.1.)` - take_along_axis with dim=-2, so 1 dim is skipped at the end

    The numpy implementation of take_along_axis requires the number of dims to be the same, but allows broadcasting

    * `(d)->()` - reduction over the -1 axis
    * `(d,...)->(...)` - reduction over the 0 axis
    * `(.1.,d,...)->(.1.,...)` - reduction over the 1 axis
    * `(.2.,d,...)->(.2.,...)` - reduction over the 2 axis
    * `(.2.,d,...,k,.1.)->(.2.,...,.1.)` - reduction over the 2 and -2 axes

    Static Shapes
    -------------
    Sometimes you know in advance the size of an input or output dimension

    * `(2)->()` - the last core dim is strictly 2
    * `(2,.2.)->()` - the `-3` core dim is strictly 2

    Broadcasting
    ------------
    By default Signature assumes no broadcasting.
    To make a signature broadcast, prepend it with `+` (shapes broadcast) or `=` (shapes must be strictly equal)

    * `+(d)->()` - reduction over the -1 axis, now it represents the Sum(-1) Operator signature
    * `=(d),(d)->()` - dot product that broadcasts to arbitrary dimensions
    * `+(),()->()` - elemwise that broadcasts to arbitrary dimensions
    * `=(),()->()` - elemwise that works on arbitrary dimensions but requires all to match
    * `=(),(),()->(3,)` - stack operation

    Sometimes Ops may not support broadcasting to more than, e.g., 1 or 2 dimensions.
    In this case the signature is specified like this

    * `+2(d),(d)->()` in case of regular broadcasting
    * `=2(d),(d)->()` in case of strict broadcasting
    """
    dtypes_formula: str
    """TBD
    """
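
To make the proposed `+` / `=` prefix and the optional dimension limit concrete, here is a minimal sketch of how such a formula might be split into its parts. Everything in it is an assumption for illustration: `ParsedFormula`, `split_formula` and the regular expression are hypothetical and not part of this project or of NumPy, and the `.k.` / `...` tokens are left unparsed in the core string.

```python
import re
from typing import NamedTuple, Optional


class ParsedFormula(NamedTuple):
    """Hypothetical container; not part of any existing API."""
    mode: Optional[str]       # '+' (broadcast), '=' (strict) or None
    max_batch: Optional[int]  # limit on broadcast dims, None if unlimited
    core: str                 # remaining core-dims pattern, left unparsed


# Optional '+' or '=' prefix, optional integer limit, then the core part.
_PREFIX = re.compile(r'^(?P<mode>[+=])?(?P<limit>\d+)?(?P<core>\(.*)$')


def split_formula(formula: str) -> ParsedFormula:
    formula = re.sub(r'\s+', '', formula)
    match = _PREFIX.match(formula)
    if match is None:
        raise ValueError('not a valid formula: {}'.format(formula))
    mode, limit, core = match.group('mode', 'limit', 'core')
    if mode is None and limit is not None:
        # Assumption: a limit only makes sense together with a prefix.
        raise ValueError('a dimension limit requires a "+" or "=" prefix')
    return ParsedFormula(mode, int(limit) if limit else None, core)


print(split_formula('+2(d),(d)->()'))
print(split_formula('(d)->()'))
```

Keeping the prefix separate from the core pattern means the core part could still be handed to a gufunc-style parser once the skip tokens are resolved.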
Closing in favor of #430
Description
As mentioned in #109 (comment)
ufunc signatures are very useful in implementing #109
ufunc-like operators should promise that they correctly broadcast over leading dimensions
https://numpy.org/doc/stable/reference/generated/numpy.ufunc.signature.html
Examples
Some examples that appear interesting.
batch dims are combined
Here
(m,n),(n,l)->(m,l)
(10, 5),(5, 7)->(10, 7)
(4),(1,3)
(*1,m,n),(*2,n,l)->(*1,m,*2,l)
Should we promise that, given such inputs, we broadcast the same way?
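
One possible reading of the example is that the batch dims of the two inputs are concatenated rather than broadcast against each other, which is how `np.dot` treats N-dimensional inputs, whereas `np.matmul` broadcasts them. The sketch below uses the shapes from the example to contrast the two. This reading is an assumption, not something the issue states.

```python
import numpy as np

a = np.zeros((4, 10, 5))    # batch dims (4,),   core dims m=10, n=5
b = np.zeros((1, 3, 5, 7))  # batch dims (1, 3), core dims n=5,  l=7

# np.dot concatenates the leading dims of both operands:
# result shape is a.shape[:-1] + b.shape[:-2] + b.shape[-1:].
dot_shape = np.dot(a, b).shape
print(dot_shape)  # (4, 10, 1, 3, 7), i.e. (*1, m, *2, l)

# np.matmul broadcasts batch dims instead, and (4,) vs (1, 3) do not broadcast.
try:
    np.matmul(a, b)
    matmul_broadcasts = True
except ValueError:
    matmul_broadcasts = False
print(matmul_broadcasts)  # False
```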