Skip to content

ENH: disallow passing array-like to array_namespace #42

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 11, 2023

Conversation

tupui
Copy link
Contributor

@tupui tupui commented May 2, 2023

Following discussions in #39

@asmeurer
Copy link
Member

asmeurer commented May 2, 2023

Thanks. It looks like the initial reason for me doing this isn't even necessary anymore. I don't call array_namespace in wrapped functions. The only exception is in the internal _asarray function when namespace=None, which I don't think actually gets called anywhere (maybe we should just remove that and make namespace a required keyword argument).

@tupui
Copy link
Contributor Author

tupui commented May 4, 2023

The only exception is in the internal _asarray function when namespace=None, which I don't think actually gets called anywhere (maybe we should just remove that and make namespace a required keyword argument).

Oh, I am just seeing this function. _asarray looks exactly like what I have on the SciPy PR (and yes on SciPy I am now not allowing anything else than an Array API compatible array.) Shouldn't it be a public function? Then I could use this (and sklearn also has its own version.) Then on SciPy's side I can just make a small wrapper if I want to add some extra validations, flags, etc.

@tupui tupui marked this pull request as ready for review May 4, 2023 11:00
@asmeurer
Copy link
Member

asmeurer commented May 4, 2023

Oh I hadn't noticed that your function is basically the same thing. Yeah I think we can add it publicly here if that helps.

If we do that, I wonder if we should rename it. If we didn't, we would have different asarray functions that do different things depending on what namespace they are called from:

  • array_api_compat.asarray: uses namespace=namespace as the namespace, otherwise defaults to calling array_namespace and fails with non-array input if no namespace is specified
  • array_api_compat.numpy.asarray: always uses numpy, even for non-array inputs or arrays from other libraries. Doesn't have a namespace keyword.
  • array_api_compat.cupy.asarray: ditto
  • array_api_compat.torch.asarray: ditto

So maybe it would be less confusing to call it array_api_compat.asarray_namespace or something.

I'm also unsure about the namespace=None semantics of "return x if x is already an array (from any library), otherwise raise an error". I see we both implemented the same logic for that, but I'm unsure now when exactly you'd need that. It isn't actually used anywhere in array_api_compat.

@tupui
Copy link
Contributor Author

tupui commented May 5, 2023

Oh I hadn't noticed that your function is basically the same thing. Yeah I think we can add it publicly here if that helps.

Would be great! I think we have 2 ways moving forward: either we wait to see how people use this library and what sort of wrapper/helper they need, or make a round table now to discuss of our needs and propose the wrappers.

I have a preference for both as it would simplify/slim future maintenance work. But since all of that is private, we can also wait and do our cooking in the background.

If we do that, I wonder if we should rename it. If we didn't, we would have different asarray functions that do different things depending on what namespace they are called from:

I would not think this is an issue. It's quite clear if you conserve the namespace in the function call. Of course if you start to import everything with *, etc you might end up overwriting asarray and not know which one is there.

So maybe it would be less confusing to call it array_api_compat.asarray_namespace or something.

For the SciPy PR I have this but it does a combination of array_namespace and asarray. So it returns the namespace and the array.

I'm also unsure about the namespace=None semantics of "return x if x is already an array (from any library), otherwise raise an error". I see we both implemented the same logic for that, but I'm unsure now when exactly you'd need that. It isn't actually used anywhere in array_api_compat.

This can be useful if you have a simple function which just deal with the array. In this case you don't need to know the namespace (no need to call before array_namespace.) I used this in a few places in the SciPy PR.

@asmeurer
Copy link
Member

asmeurer commented May 5, 2023

Would be great! I think we have 2 ways moving forward: either we wait to see how people use this library and what sort of wrapper/helper they need, or make a round table now to discuss of our needs and propose the wrappers.

We should add anything that's generally useful to multiple libraries (as long as it keeps the compat library itself simple, i.e., pure Python code only). The goal of the compat library is to make adoption as easy as possible, so these little helper tools are definitely in scope for it.

That means that if a need comes up we can add things, but if we see something ahead of time that could be useful we can add it right away.

I would not think this is an issue. It's quite clear if you conserve the namespace in the function call. Of course if you start to import everything with *, etc you might end up overwriting asarray and not know which one is there.

Not just import * but even just from array_api_compat import asarray. It would be easy to mistake that asarray with np.asarray or xp.asarray. But their behaviors are slightly different. It's especially worrisome if we have a default for namespace=None meaning you can call asarray(x), which would work differently if it's from array_api_compat import asarray or from numpy import asarray.

We could maybe avoid this if we make namespace a required keyword argument. That's more typing, but you'd never have asarray(x) that looks like the usual asarray, but rather asarray(x, namespace=...). Alternatively, name the function something slightly different like asarray_namespace(x).

This can be useful if you have a simple function which just deal with the array. In this case you don't need to know the namespace (no need to call before array_namespace.) I used this in a few places in the SciPy PR.

OK that seems fair enough, as long as we document the behavior.

It is important, though, to make sure you do use array_namespace whenever you have more than one array input, even if you don't actually use the resulting namespace, because that properly raises an exception if you mix multiple array libraries.

@tupui
Copy link
Contributor Author

tupui commented May 8, 2023

Is there anything else to do in this PR?

@tupui
Copy link
Contributor Author

tupui commented May 11, 2023

About asarray and namespaces for these functions. Ralf made a good suggestion on my other PR for SciPy. We could name this: as_xparray. This way, it's clear and the name is quite nice I think. What do you think? I can make a PR to make this public if you give your go.

@asmeurer
Copy link
Member

What should we do with this PR? I know you ended up adding an as_xparray function in your scipy PR. Is it something that you feel should live here instead, based on our discussions, or is it too scipy-specific?

@tupui
Copy link
Contributor Author

tupui commented Jun 19, 2023

mmm we are having 2 discussions here. This PR was just about disallowing array-like to array_namespace.

For the as_xparray part, there is still a discussion about that on the SciPy PR so I am not sure we can say for sure now.

@lucascolley
Copy link
Member

lucascolley commented Dec 11, 2023

This PR was just about disallowing array-like to array_namespace.

It seems to have been decided in scipy/scipy#19118 (comment) that this is not the desired behaviour (at least for SciPy). There was confusion between the SciPy RFC which suggested retaining existing behaviour for lists and tuples etc., and this comment which supported following scikit-learn in disallowing array-likes (are scikit-learn still doing this? the BC breaks will be huge, no?).

So this PR can be closed I think. nevermind, this PR was concerned just with disallowing lists/tuples of compatible arrays, since Python scalars are currently unsupported. I think it makes sense to keep disallowing Python scalars, which forces consumer libraries to coerce array-likes of Python scalars to a default array type before using this array_namespace (if they want to support them).

So I think this PR should be merged, since passing lists or tuples of compatible arrays to functions expecting a single array is weird.


For the as_xparray part, there is still a discussion about that on the SciPy PR so I am not sure we can say for sure now.

as_xparray is currently in SciPy as the following.

def _check_finite(array, xp):
    """Check for NaNs or Infs."""
    msg = "array must not contain infs or NaNs"
    try:
        if not xp.all(xp.isfinite(array)):
            raise ValueError(msg)
    except TypeError:
        raise ValueError(msg)

def as_xparray(
    array, dtype=None, order=None, copy=None, *, xp=None, check_finite=False
):
    """SciPy-specific replacement for `np.asarray` with `order` and `check_finite`.

    Memory layout parameter `order` is not exposed in the Array API standard.
    `order` is only enforced if the input array implementation
    is NumPy based, otherwise `order` is just silently ignored.

    `check_finite` is also not a keyword in the array API standard; included
    here for convenience rather than that having to be a separate function
    call inside SciPy functions.
    """
    if xp is None:
        xp = array_namespace(array)
    if xp.__name__ in {"numpy", "scipy._lib.array_api_compat.array_api_compat.numpy"}:
        # Use NumPy API to support order
        if copy is True:
            array = np.array(array, order=order, dtype=dtype)
        else:
            array = np.asarray(array, order=order, dtype=dtype)

        # At this point array is a NumPy ndarray. We convert it to an array
        # container that is consistent with the input's namespace.
        array = xp.asarray(array)
    else:
        try:
            array = xp.asarray(array, dtype=dtype, copy=copy)
        except TypeError:
            coerced_xp = array_namespace(xp.asarray(3))
            array = coerced_xp.asarray(array, dtype=dtype, copy=copy)

    if check_finite:
        _check_finite(array, xp)

    return array

Is it something that you feel should live here instead, based on our discussions, or is it too scipy-specific?

Your call @asmeurer

@asmeurer asmeurer merged commit 874c2ff into data-apis:main Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants