Store function metadata in a machine readable format #49

asmeurer · 2020-10-07T23:21:08Z

It would be useful for the test suite to have the function metadata stored in a machine readable format. Currently I am parsing the function signatures from the spec files using some regular expressions, and I will probably end up parsing some other information such as types as well. This works fine for now, but it would be cleaner if this data were stored in a machine readable format, say in JSON, and the relevant parts of the spec documents generated from that automatically.

To be sure, not everything in the spec needs to be in JSON, just the parts that will need to be extracted for other things as well, such as the test suite. There should still be a lot of plain English descriptions of behavior.

This is likely too much work for version 1 given that we already have things inline in the Markdown, but it's something to consider for future iterations.
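
For illustration, the regex-based extraction described above might look roughly like the sketch below. It assumes each function appears in the spec Markdown as a heading containing its signature; the heading format, the `SIGNATURE_RE` pattern, and the `extract_signatures` helper are all hypothetical, not the actual test-suite code. The output is the kind of JSON record that could eventually be the source of truth instead of the parsed result.

```python
import json
import re

# Assumed heading format: '### add(x1, x2, /)'. The real spec files may differ.
SIGNATURE_RE = re.compile(
    r"^#+\s*`?(?P<name>\w+)\((?P<params>[^)]*)\)`?\s*$", re.MULTILINE
)


def extract_signatures(markdown: str) -> list[dict]:
    """Return one record per function heading found in the Markdown text."""
    records = []
    for match in SIGNATURE_RE.finditer(markdown):
        # Split the raw parameter string on commas and drop empty pieces.
        params = [p.strip() for p in match.group("params").split(",") if p.strip()]
        records.append({"name": match.group("name"), "parameters": params})
    return records


# Made-up sample text in the assumed format.
SAMPLE = """\
### add(x1, x2, /)

Calculates the sum for each element.

### sqrt(x, /)

Calculates the square root.
"""

records = extract_signatures(SAMPLE)
print(json.dumps(records, indent=2))
```

If the JSON were authored directly, the direction would simply be reversed: the Markdown headings would be generated from the records rather than the records scraped from the headings.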

saulshanabrook · 2020-10-08T16:59:43Z

That makes sense to me. I wanted to highlight one of the existing JSON formats I am using for python-record-api.

Minimal example generated from this file: https://github.com/data-apis/python-record-api/blob/master/data/api/sample-usage.json

It is specified/documented as pydantic models, which make it easy to serialize Python objects to JSON and deserialize them back: https://github.com/data-apis/python-record-api/blob/006faf0bba9cd4cb55fbacc13d2bbda365f5bf0b/record_api/apis.py#L69

For the "leaf nodes" of actual types I also built some pydantic models for different kinds of types: https://github.com/data-apis/python-record-api/blob/006faf0bba9cd4cb55fbacc13d2bbda365f5bf0b/record_api/type_analysis.py#L74. Normal Python instances can just be saved with their type names, and there is special handling for different generic types (like lists, tuples, etc.) and literal types (strings).

asmeurer · 2020-10-08T19:57:22Z

I don't want to get bogged down in a metaconversation on the "right" way to specify types for array functions. Any specification is fine, as long as it is machine readable. We could consider the JSON as an internal document and not part of the actual spec (i.e., the schema could change between minor spec versions). Some sorts of things that I could imagine wanting to parse here for the tests are:

- The function name and signature. I'm happy for this to just be something like add(x1, x2, /), though if we want to split out the parameters that's fine too.
- The top-level type of each argument (array, floating-point scalar, boolean, etc.)
- For those that are arrays:
  - valid dtypes
  - valid shapes
  - broadcastability requirements with other input parameters
  - valid domain of inputs (if not all input values are allowed, e.g., sqrt behavior in the spec is only defined for nonnegative inputs)
- Same for the return type
- Example inputs and outputs
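
As a rough sketch of what such a record could hold, the items above map onto a small schema like the one below. Everything here is an assumption for illustration: the `ParameterSpec` and `FunctionSpec` classes, their field names, the `"array"` default, and the dtype and domain strings are hypothetical and not part of the spec. The signature string is split using Python's own parser, so `add(x1, x2, /)` can stay a single string in the data.

```python
import ast
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ParameterSpec:
    name: str
    kind: str  # top-level type, e.g. "array" or "bool" (hypothetical labels)
    positional_only: bool = False
    dtypes: list = field(default_factory=list)  # hypothetical dtype category names
    domain: str = ""  # free-form for now, e.g. "x >= 0"


@dataclass
class FunctionSpec:
    name: str
    signature: str
    parameters: list = field(default_factory=list)


def parse_signature(signature: str) -> FunctionSpec:
    """Split a signature such as 'add(x1, x2, /)' into parameter records.

    Only positional parameters are handled in this sketch, and every
    parameter is assumed to be an array until the record says otherwise.
    """
    func = ast.parse(f"def {signature}: pass").body[0]
    params = [
        ParameterSpec(a.arg, "array", positional_only=True)
        for a in func.args.posonlyargs
    ]
    params += [ParameterSpec(a.arg, "array") for a in func.args.args]
    return FunctionSpec(func.name, signature, params)


spec = parse_signature("sqrt(x, /)")
spec.parameters[0].dtypes = ["floating-point"]
spec.parameters[0].domain = "x >= 0"
print(json.dumps(asdict(spec), indent=2))
```

Keeping the domain as a plain string sidesteps the question of how to formalize it; a test generator could start by special-casing the handful of strings that occur.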

If you already have some thoughts on the right way to specify these sorts of things, that's great, and we should use it. But I don't want to wait on a meta decision on how to specify types. My main motivation here is to make it so I can generate as much of the test suite automatically from the spec as possible, so that it's easier to keep them in sync.

asmeurer · 2020-10-08T19:59:19Z

It's also fine if we can't represent some corner cases, at least to begin with. For example, we might not be able to represent valid shapes for something like matmul (it isn't in the spec yet but I think it might be added), but it's fine if I have to hard-code that as long as the shape information works for the majority of other functions.

leofang · 2021-09-08T14:04:00Z

I am revisiting this issue as I have encountered a similar need. In parallel with the need for updating docstrings (#180), we also need this metadata to populate, say, the TOC of a doc page. Currently in CuPy I am using .. automodule:: to let Sphinx parse all functions under the array_api namespace. This works, but it's not ideal, as I can't control the order in which the functions appear (Sphinx sorts them alphabetically). If the metadata were provided, I could group them on demand based on the nature of the API (creation, statistics, linalg, etc.).
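
A minimal sketch of that grouping, assuming each metadata record carries a `category` field (the field name, the sample records, and both helper functions are hypothetical). It emits one reST section per category with a Sphinx `.. autofunction::` entry per function, in the order the metadata lists them rather than alphabetically.

```python
from collections import defaultdict

# Hypothetical metadata records; the "category" field is an assumption.
FUNCTIONS = [
    {"name": "zeros", "category": "creation"},
    {"name": "mean", "category": "statistics"},
    {"name": "arange", "category": "creation"},
    {"name": "std", "category": "statistics"},
]


def group_by_category(functions):
    """Group function names by category, keeping the order given in the metadata."""
    groups = defaultdict(list)
    for record in functions:
        groups[record["category"]].append(record["name"])
    return dict(groups)


def render_toc(groups):
    """Emit one reST section per category with an autofunction entry per function."""
    lines = []
    for category, names in groups.items():
        title = category.capitalize()
        lines += [title, "-" * len(title), ""]
        lines += [f".. autofunction:: {name}" for name in names]
        lines.append("")
    return "\n".join(lines)


toc = render_toc(group_by_category(FUNCTIONS))
print(toc)
```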

asmeurer · 2021-09-08T19:18:57Z

I should mention that in the test suite I am parsing parts of the spec and populating some function stubs: https://github.com/data-apis/array-api-tests/tree/master/array_api_tests/function_stubs. Feel free to reuse these for your implementation, or use them to extract a manual list of functions. The dictionaries at the top of test_type_promotion.py may also be useful if you plan to restrict input dtypes like the NumPy implementation does. To be clear, implementations do not need to be minimal like this: we did so for the NumPy one because it is a reference implementation, but dtype restrictions are not required by the spec.

asmeurer · 2022-11-29T21:15:29Z

Things that it would be useful to have structured data for:

- Input and output range (see for example asin)
- Special cases
- Input dtypes
- Output dtype (e.g., "promoted" or "boolean")
- Input/output shapes (especially for linear algebra functions)
- Output data for functions that return named tuples
- For each of the above, whether it is required or only suggested

We already have effectively structured data for the signatures and type annotations.

Like I said, there should also be room for plain-text notes, as there will always be things that don't fit into the existing schemes, and we also want the ability to add things like motivations and implementation notes.
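
One way to picture such a record, purely as a sketch: each constraint carries a kind, a value, and a level saying whether it is required or only suggested, with a free-form notes list alongside. The layout, field names, and helper functions below are hypothetical, and the levels shown are placeholders that do not reflect what the spec actually mandates. The asin ranges are the usual mathematical ones, [-1, 1] for the input and [-π/2, π/2] for the output.

```python
import json
import math

# Hypothetical record layout; field names and levels are illustrative only.
ASIN = {
    "name": "asin",
    "constraints": [
        {"kind": "input_range", "value": [-1.0, 1.0], "level": "required"},
        {
            "kind": "output_range",
            "value": [-math.pi / 2, math.pi / 2],
            "level": "required",
        },
        {"kind": "output_dtype", "value": "placeholder", "level": "suggested"},
    ],
    "notes": ["Free-form text for anything the structured fields cannot express."],
}


def constraints_at_level(record, level):
    """Return the constraints marked with the given requirement level."""
    return [c for c in record["constraints"] if c["level"] == level]


def in_input_range(record, x):
    """Check a scalar against the record's input range, if it declares one."""
    for constraint in record["constraints"]:
        if constraint["kind"] == "input_range":
            low, high = constraint["value"]
            return low <= x <= high
    return True  # no declared range means every input is allowed


print(json.dumps(constraints_at_level(ASIN, "required"), indent=2))
print(in_input_range(ASIN, 0.5), in_input_range(ASIN, 2.0))
```

A test generator could then treat "required" entries as hard assertions and "suggested" ones as warnings.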

asmeurer added a commit that referenced this issue Oct 15, 2020

Use a "special values" header before the list of special values for e…

ec0cbbc

…lementwise functions This will make these easier to parse from the test suite, barring something like #49.

asmeurer mentioned this issue Oct 15, 2020

Use a "special values" header before the list of special values for elementwise functions #52

Merged

asmeurer mentioned this issue May 12, 2021

A mechanism for propagating docstring updates? #180

Closed

kgryte added the "Tools" label (issue or pull request pertaining to tooling for authoring and managing this specification) Oct 4, 2021

kgryte added this to the v2022 milestone Oct 4, 2021

kgryte mentioned this issue Dec 9, 2021

RFC: 2022 Standardization Priorities #343

Closed

asmeurer mentioned this issue Nov 29, 2022

Migrate documentation of special cases to use math directives #519

Open

rgommers removed this from the v2022 milestone Dec 14, 2022
