Add ufunc signatures to applicable functions #116

ferrine · 2022-12-13T09:45:57Z

Description

As mentioned in #109 (comment), ufunc signatures are very useful for implementing #109.

Ufunc-like operators should promise that they broadcast correctly over leading dimensions:
https://numpy.org/doc/stable/reference/generated/numpy.ufunc.signature.html

Examples

Some examples that appear interesting.

batch dims are combined

a = np.random.randn(4, 10,5)
b = np.random.randn(1, 3, 5, 7)
np.dot(a, b).shape
(4, 10, 1, 3, 7)

Here

signature is (m,n),(n,l)->(m,l)
core dims are (10, 5),(5, 7)->(10, 7)
batch dims that broadcast (4),(1,3)
broadcast rule is (*1,m,n),(*2,n,l)->(*1,m,*2,l)
Is there a formal reference for how to calculate broadcasting?

Should we promise that, given such inputs, we broadcast the same way?
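
For context, here is a small sketch of why the question matters. As far as I can tell, `np.dot` concatenates the batch dims of both operands, whereas a gufunc-style function such as `np.matmul` broadcasts them against each other. The first pair of shapes below is chosen for illustration and is not from the example above.

```python
import numpy as np

a = np.zeros((4, 10, 5))
b = np.zeros((3, 1, 5, 7))

# np.dot concatenates the leading dims of both operands (outer-product style).
dot_shape = np.dot(a, b).shape        # (4, 10, 3, 1, 7)

# np.matmul treats leading dims as batch dims and broadcasts them:
# (4,) and (3, 1) broadcast to (3, 4).
matmul_shape = np.matmul(a, b).shape  # (3, 4, 10, 7)

# With the shapes from the example above, the batch dims (4,) and (1, 3)
# do not broadcast, so matmul raises instead of combining them.
try:
    np.matmul(np.zeros((4, 10, 5)), np.zeros((1, 3, 5, 7)))
    matmul_failed = False
except ValueError:
    matmul_failed = True
```

So the `np.dot` rule `(*1,m,n),(*2,n,l)->(*1,m,*2,l)` is not the same thing as gufunc broadcasting, and a promise would have to say which of the two is meant.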


ferrine · 2022-12-13T11:13:05Z

I've looked into #109 (comment) and arrived at this:

import re
import numpy as np
from typing import Dict, List, Tuple
# https://github.com/numpy/numpy/blob/main/numpy/lib/function_base.py#L2007
_DIMENSION_NAME = r'\w+'
_CORE_DIMENSION_LIST = '(?:{0:}(?:,{0:})*)?'.format(_DIMENSION_NAME)
_ARGUMENT = r'\({}\)'.format(_CORE_DIMENSION_LIST)
_ARGUMENT_LIST = '{0:}(?:,{0:})*'.format(_ARGUMENT)
_SIGNATURE = '^{0:}->{0:}$'.format(_ARGUMENT_LIST)


def parse_gufunc_signature(signature):
    """
    Parse string signatures for a generalized universal function.
    Arguments
    ---------
    signature : string
        Generalized universal function signature, e.g., ``(m,n),(n,p)->(m,p)``
        for ``np.matmul``.
    Returns
    -------
    Tuple of input and output core dimensions parsed from the signature, each
    of the form List[Tuple[str, ...]].
    """
    signature = re.sub(r'\s+', '', signature)

    if not re.match(_SIGNATURE, signature):
        raise ValueError(
            'not a valid gufunc signature: {}'.format(signature))
    return tuple(tuple(tuple(re.findall(_DIMENSION_NAME, arg))
                  for arg in re.findall(_ARGUMENT, arg_list))
                 for arg_list in signature.split('->'))


def update_dim_sizes(dim_sizes, shape, core_dims):
    """
    Incrementally check and update core dimension sizes for a single argument.
    Arguments
    ---------
    dim_sizes : Dict[str, int]
        Sizes of existing core dimensions. Will be updated in-place.
    shape : Tuple[int, ...]
        Shape of the argument to examine.
    core_dims : Tuple[str, ...]
        Core dimensions for this argument.
    """
    if not core_dims:
        return

    num_core_dims = len(core_dims)
    if len(shape) < num_core_dims:
        raise ValueError(
            '%d-dimensional argument does not have enough '
            'dimensions for all core dimensions %r'
            % (len(shape), core_dims))

    core_shape = shape[-num_core_dims:]
    for dim, size in zip(core_dims, core_shape):
        if dim in dim_sizes:
            if size != dim_sizes[dim]:
                raise ValueError(
                    'inconsistent size for core dimension %r: %r vs %r'
                    % (dim, size, dim_sizes[dim]))
        else:
            dim_sizes[dim] = size


def parse_input_dimensions(args_shapes, input_core_dims):
    """
    Parse broadcast and core dimensions for vectorize with a signature.
    Arguments
    ---------
    args_shapes : Tuple[Tuple[int, ...], ...]
        Shapes of the input arguments to examine.
    input_core_dims : List[Tuple[str, ...]]
        List of core dimensions corresponding to each input.
    Returns
    -------
    broadcast_shape : Tuple[int, ...]
        Common shape to broadcast all non-core dimensions to.
    dim_sizes : Dict[str, int]
        Common sizes for named core dimensions.
    """
    batch_shapes = []
    dim_sizes = {}
    for shape, core_dims in zip(args_shapes, input_core_dims):
        update_dim_sizes(dim_sizes, shape, core_dims)
        ndim = len(shape) - len(core_dims)
        # Keep only the leading (non-core) dimensions of each argument.
        batch_shapes.append(tuple(shape[:ndim]))
    # np.broadcast_shapes is public API (NumPy >= 1.20) and works on shapes
    # directly, so no dummy arrays or private helpers are needed.
    broadcast_shape = np.broadcast_shapes(*batch_shapes)
    return broadcast_shape, dim_sizes


def calculate_shapes(broadcast_shape, dim_sizes, list_of_core_dims):
    """Helper for calculating broadcast shapes with core dimensions."""
    return tuple(broadcast_shape + tuple(dim_sizes[dim] for dim in core_dims)
            for core_dims in list_of_core_dims)

class UfuncSignature:
    __slots__ = ("signature", "icore", "ocore")
    def __init__(self, signature):
        self.signature = re.sub(r'\s+', '', signature)
        self.icore, self.ocore = parse_gufunc_signature(signature)

    def __str__(self):
        return self.signature
    
    def output_for(self, *input_shapes: Tuple[int, ...]) -> Tuple[Tuple[int, ...], ...]:
        brinp, core_dims = parse_input_dimensions(input_shapes, self.icore)
        return calculate_shapes(brinp, core_dims, self.ocore)
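
The helpers above are adapted from NumPy's `np.vectorize` machinery (see the source link in the code), so the same signature-driven shape logic can be checked against `np.vectorize` itself. A minimal sketch, where `core_matmul` and the input shapes are made up for illustration:

```python
import numpy as np


def core_matmul(x, y):
    # Operates on the core dims only: (m, n) @ (n, l) -> (m, l).
    return x @ y


# np.vectorize loops over the broadcast batch dims and calls core_matmul
# on each (m, n), (n, l) slice.
batched_matmul = np.vectorize(core_matmul, signature="(m,n),(n,l)->(m,l)")

out = batched_matmul(np.ones((4, 1, 10, 5)), np.ones((3, 5, 7)))
out_shape = out.shape  # batch dims (4, 1) and (3,) broadcast to (4, 3)

# Inconsistent core dims (n = 5 vs n = 6) are rejected.
try:
    batched_matmul(np.ones((10, 5)), np.ones((6, 7)))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

`UfuncSignature("(m,n),(n,l)->(m,l)").output_for((4, 1, 10, 5), (3, 5, 7))` is meant to predict the same output shape without running the computation.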

ferrine · 2022-12-16T15:19:31Z

A possible signature API could look like this. Constructing it programmatically should not be hard either, since you only need the core dims and, optionally, their positions.

class Signature(MetaObject):
    formula: str
    """String representation for core dims pattern.
    
    Examples
    ========
    
    Interactions
    ------------
    * `(d),(d)->()` - dot product
    * `(m,n),(n,p)->(m,p)` - matrix multiplication, 
        Note that `(m,n?),(n,p?)->(m?,p?)` is not supported
    
    Reference to intermediate Axis
    ------------------------------
    The `.k.` token skips `k` dims. The `...` token skips any number of dims and can only be used once in the formula.

    * `(M,.k.),(J,.k.)->(J,.k.)` - take_along_axis with dim=-(k+1)
    * `(M,.1.),(J,.1.)->(J,.1.)` - take_along_axis with dim=-2, so 1 dim is skipped at the end
        The NumPy implementation of take_along_axis requires the number of dims to be the same, but allows broadcasting
    * `(d)->()` reduction over the -1 axis
    * `(d,...)->(...)` reduction over the 0 axis
    * `(.1.,d,...)->(.1.,...)` reduction over the 1 axis
    * `(.2.,d,...)->(.2.,...)` reduction over the 2 axis
    * `(.2.,d,...,k,.1.)->(.2.,...,.1.)` reduction over the 2 and -2 axis
    
    Static Shapes
    -------------
    Sometimes you know in advance the size of input or output dimension
    
    * `(2)->()` - the last core dim is strictly 2
    * `(2,.2.)->()` - the `-3` core dim is strictly 2

    Broadcasting
    ------------
    By default, Signature assumes no broadcasting.
    To make a signature broadcast, prepend it with `+` (shapes broadcast) or `=` (shapes must be strictly equal)

    * `+(d)->()` - reduction over the -1 axis, now it represents Sum(-1) Operator signature
    * `=(d),(d)->()` - dot product that broadcasts to arbitrary dimensions
    * `+(),()->()` elemwise that broadcasts to arbitrary dimensions
    * `=(),()->()` elemwise that works on arbitrary dimensions but requires all to match
    * `=(),(),()->(3,)` - stack operation
    
    Sometimes Ops may not support broadcasting to more than a fixed number of dimensions, e.g. 1 or 2.
    In this case the signature is specified like this
    
    * `+2(d),(d)->()` in case of regular broadcasting
    * `=2(d),(d)->()` in case of strict broadcasting
    """
    dtypes_formula: str
    """TBD
    """
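
To make the broadcasting prefix concrete, here is a rough sketch of how the leading `+`/`=` marker and the optional dimension limit could be split off before the core formula is parsed. The names `ParsedFormula` and `split_broadcast_prefix` are hypothetical and not part of any existing API; rejecting a limit without a mode is also my assumption.

```python
import re
from typing import NamedTuple, Optional

# Optional mode ("+" or "="), optional limit (digits), then the core formula.
_PREFIX = re.compile(r"^(?P<mode>[+=])?(?P<limit>\d+)?(?P<core>\(.*)$")


class ParsedFormula(NamedTuple):
    broadcast: Optional[str]       # "+", "=", or None for no broadcasting
    max_batch_dims: Optional[int]  # None means no limit was given
    core: str                      # remaining core-dims pattern


def split_broadcast_prefix(formula: str) -> ParsedFormula:
    formula = re.sub(r"\s+", "", formula)
    match = _PREFIX.match(formula)
    # Assumption: a limit without a mode (e.g. "2(d)->()") is invalid.
    if match is None or (match["limit"] and not match["mode"]):
        raise ValueError("not a valid formula: {}".format(formula))
    limit = int(match["limit"]) if match["limit"] else None
    return ParsedFormula(match["mode"], limit, match["core"])


limited = split_broadcast_prefix("+2(d),(d)->()")
strict = split_broadcast_prefix("=(),(),()->(3,)")
plain = split_broadcast_prefix("(d)->()")
```

Parsing the core part itself (`.k.`, `...`, static sizes) would need a grammar beyond the NumPy gufunc regex and is left out here.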

ricardoV94 · 2024-07-10T09:25:29Z

Closing in favor of #430

ferrine added request discussion NumPy compatibility Op implementation refactor labels Dec 13, 2022

ricardoV94 closed this as completed Jul 10, 2024
