Meta-schemas for vocabularies seem to have incomplete lists of used vocabularies #964

Closed
epoberezkin opened this issue Aug 7, 2020 · 15 comments
@epoberezkin
Member

epoberezkin commented Aug 7, 2020

Maybe it was reported already - sorry if I didn't find it.

For example, https://json-schema.org/draft/2019-09/meta/core uses applicator and other vocabularies but includes only core.

https://json-schema.org/draft/2019-09/meta/applicator uses core and others, but includes only applicator.

etc.

Am I missing something?

@ghost ghost added the triage label Aug 7, 2020
@Relequestual
Member

Relequestual commented Aug 8, 2020

Hey @epoberezkin, yes you're missing something here =]

The "$vocabulary" keyword is used in meta-schemas to identify the vocabularies available for use in schemas described by that meta-schema.

https://tools.ietf.org/html/draft-handrews-json-schema-02#section-8.1.2

$vocabulary does not define the vocabularies used in the same schema as itself. The dialect (collection of vocabularies) is identified by $schema. The associated meta-schema includes $vocabulary to identify the vocabularies used and whether they are required.

$vocabulary isn't a keyword for general schema use, but only for use in meta-schemas.

If you want a JSON Schema to use a different dialect, or you want to create a new dialect, first create a meta-schema that identifies the vocabularies used, then put the $id of that new meta-schema in the $schema of the schema that wants to use it.
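
As a rough illustration of those two steps, here is a minimal sketch using Python dicts in place of JSON documents. The example.com URI and the choice of vocabularies are assumptions made up for this example; only the 2019-09 vocabulary and meta-schema URIs are real.

```python
# Step 1: a hypothetical custom meta-schema that declares a small dialect.
# The "$id" under example.com is invented for illustration.
CUSTOM_META = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://example.com/meta/core-and-applicator",
    "$vocabulary": {
        # True means an implementation must understand the vocabulary.
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://json-schema.org/draft/2019-09/vocab/applicator": True,
    },
    "$recursiveAnchor": True,
    "allOf": [
        {"$ref": "https://json-schema.org/draft/2019-09/meta/core"},
        {"$ref": "https://json-schema.org/draft/2019-09/meta/applicator"},
    ],
}

# Step 2: a schema opts into that dialect by pointing "$schema"
# at the meta-schema's "$id". It carries no "$vocabulary" of its own.
MY_SCHEMA = {
    "$schema": CUSTOM_META["$id"],
    "properties": {"name": {}},
}
```

Note that `MY_SCHEMA` never mentions `$vocabulary`; the dialect is found only by following its `$schema`.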

Shout if you need any further clarification.

I find Appendix D very helpful in understanding how this fits together.
I DO need to re-work some wording in the spec now we have a term for a collection of vocabularies ("feature set" for the keywords, "dialect" for the whole thing).

@Relequestual
Member

To clarify, the core meta-schema you mention uses the full JSON Schema dialect, but is itself JUST the meta-schema for the core vocabulary.
All meta-schemas must use the core vocabulary (it defines $id, $schema, $vocabulary).
Look at the general purpose meta-schema: http://json-schema.org/draft/2019-09/schema
Notice that it doesn't redefine what's already defined in the vocabularies it declares as in use.

@handrews
Contributor

handrews commented Aug 8, 2020

The intuitive way to think about this is that schema keywords describe the instance.

The only exceptions are $schema (which links the meta-schema, and therefore describes the resource that contains it) and $id, $anchor, and $dynamicAnchor which create identifiers for the resource that contains them.

Everything else, including $vocabulary, describes the instance to which the schema resource is applied. Since $vocabulary only has meaning when the instance is a schema, it is only useful in meta-schemas. You can have it in a schema applied to a non-schema instance, it just doesn't do anything in that case.

@epoberezkin
Member Author

$vocabulary does not define the vocabularies used in the same schema as itself. The dialect (collection of vocabularies) is identified by $schema. The associated meta-schema includes $vocabulary to identify the vocabularies used and whether they are required.

That makes sense, that is what I initially thought based on that sentence you quoted, but there is another place in the spec that made me think that the schema can override vocabularies defined in its meta-schema:

If "$vocabulary" is absent, an implementation MAY determine behavior based on the meta-schema if it is recognized from the URI value of the referring schema's "$schema" keyword.

This implies that the $vocabulary can be present in the schema to define the vocabularies it is using. Or does it refer to the meta-schema of the meta-schema?

Further, if this is not correct, then I do not understand why would core, applicator etc. metaschemas have $vocabulary at all? Firstly, they are not expected to be used as meta-schemas on their own - maybe core can be used on its own, but others require core. Secondly, if there is a scenario when they would be used as meta-schemas, they should at least include core vocabulary, and, maybe, applicator and some other vocabularies as well, as it is unlikely you can construct a schema without core and applicator...

I am still missing something here...

@handrews
Contributor

handrews commented Aug 8, 2020

This implies that the $vocabulary can be present in the schema to define the vocabularies it is using. Or does it refer to the meta-schema of the meta-schema?

It should probably read: If "$vocabulary" is absent from the meta-schema. If S is the schema, and M is the meta-schema, you start by examining S. You follow $schema to M. If M has $vocabulary then that is what determines the vocabulary of S. If M does not have $vocabulary, but the URI of M (the value of S's $schema) is "recognizable", then the implementation can infer the vocabulary from the recognizable URI.

Translation: If someone uses an old pre-$vocabulary meta-schema that your implementation recognizes, you can assume it still means what you thought it meant. Also, this preserves the old behavior that if an implementation recognizes a custom meta-schema URI, it can process whatever extensions it knows that meta-schema indicates. I don't know if anyone ever implemented this, but the spec allowed it.
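
The lookup described above can be sketched roughly as follows. This is only an illustration of the S-to-M steps, not code from any implementation; the function name, the `meta_schemas` registry and the `recognized_uris` set are all hypothetical.

```python
def resolve_dialect(schema, meta_schemas, recognized_uris):
    """Decide how the vocabularies of schema S are determined.

    meta_schemas:    hypothetical registry, meta-schema URI -> document (M).
    recognized_uris: meta-schema URIs the implementation knows by heart,
                     e.g. older pre-$vocabulary meta-schemas.
    """
    meta_uri = schema.get("$schema")        # start at S, follow $schema to M
    meta = meta_schemas.get(meta_uri)
    if meta is not None and "$vocabulary" in meta:
        # M declares its vocabularies: that is what governs S.
        return ("declared", dict(meta["$vocabulary"]))
    if meta_uri in recognized_uris:
        # No $vocabulary in M, but the URI is recognized: behavior MAY be inferred.
        return ("inferred", meta_uri)
    return ("unknown", None)
```

A caller would pass the schema being processed; a `$vocabulary` keyword inside S itself is never consulted.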

Further, if this is not correct, then I do not understand why would core, applicator etc. metaschemas have $vocabulary at all? Firstly, they are not expected to be used as meta-schemas on their own - maybe core can be used on its own, but others require core.

I mean, you could use them that way, and it seemed better to go ahead and put the appropriate value in than to leave it out. Leaving it out when the spec makes a big deal out of $vocabulary in meta-schemas seemed like a bad idea. This way if for some reason you want to use them on their own, you can. I agree that it is unlikely, but consistency seems best here.

Secondly, if there is a scenario when they would be used as meta-schemas, they should at least include core vocabulary

I think we wrote it such that core is always assumed to be present, even if you don't list it, but you SHOULD list it? I'd have to go dig through to find that. But you have to assume core to even follow $schema and process $id and $vocabulary so de-facto core is always in use.

Secondly, if there is a scenario when they would be used as meta-schemas, they should at least include core vocabulary, and, maybe, applicator and some other vocabularies as well, as it is unlikely you can construct a schema without core and applicator..

A meta-schema should only declare the vocabularies that it describes. The validation meta-schema only describes the validation assertions, so that's the only vocabulary (other than core, which as noted is a special case) that it declares.

Is it useful on its own like that? Not very. I can't imagine using it on its own.

Is it correct on its own like that? Yes. It is correct and consistent which seemed more important than doing something inconsistent for single-vocabulary meta-schemas just because they're unlikely to be used on their own in the real world.

@Relequestual Relequestual self-assigned this Aug 10, 2020
@karenetheridge
Member

The $vocabulary keyword seems to be being used in two different ways here, and I'm not sure if this is fully spelled out by the specification.

  • https://json-schema.org/draft/2019-09/schema has the $vocabulary keyword to specify which vocabularies are enabled when this document is used as the metaschema for a schema.
  • If individual meta/* schemas are to be used as metaschemas themselves, the use of $vocabulary is valid here to define what vocabularies are enabled when this document is to be used as the metaschema. In this use, @epoberezkin's concern is correct that each of these meta/* documents should be listing every vocabulary that they use (which is core, applicator and validation). (see footnote 1.)
  • however, the way the $vocabulary keyword seems to be being used at present in the meta/* documents is as another type of identifier: that is, they are saying "this document defines a vocabulary, and this is the identifier for that vocabulary that may be used in the $vocabulary keyword in metaschemas". That usage is not defined by the spec.

footnote 1: I don't think it's valid to use an individual meta/* document as a metaschema on its own though -- it doesn't spell out the full set of vocabularies for a schema to be useful. e.g. a schema with just applicator keywords can't do much (via properties, prefixItems) except check that certain positions exist in the instance data. So this usecase can't be the intention of the $vocabulary usage here.

So, the issue I'm having is: when parsing a schema (either as a regular schema for the purposes of evaluating instance data, or as a metaschema for determining which vocabularies it supports), what do we do when encountering a $vocabulary keyword? It's clear what the intent is when at the top level metaschema document itself (https://json-schema.org/draft/2019-09/schema) - we are enabling or disabling particular vocabularies for all schemas that use this document as the metaschema. But what do we do when encountering the $vocabulary keyword in any $referenced document, such as meta/applicator? Is this only intended for human consumption and the parser implementation should ignore it? If so, wouldn't it be better to move this data into $comment so it is clear it is only intended for human consumption?

@jdesrosiers
Member

I found the duality of the $vocabulary keyword awkward and confusing as well. I found that my implementation has no use for it if it's not a top level meta-schema, so I just ignore it in those cases and move on. I agree that it's only useful as documentation and it would be less confusing if $vocabulary wasn't co-opted for this documentation.

@handrews
Contributor

handrews commented Sep 1, 2020

$vocabulary wasn't co-opted for anything. It does the same thing in both places; the fact that some of those places are unlikely to be used is irrelevant. It harms nothing to have them there, and it's consistent and promotes best practices for meta-schemas. There's nothing else going on here: $vocabulary means what it always means, and those are the correct values for those meta-schemas.

@karenetheridge
Member

$vocabulary means what it always means

This isn't terribly helpful, I'm afraid. I'm still not sure what to do re my question above:

when parsing a schema, what do we (the implementation) do when encountering a $vocabulary keyword? It's clear what the intent is when at the top level metaschema document itself... But what do we do when encountering the $vocabulary keyword in any $referenced document, such as meta/applicator?

@Relequestual
Member

$vocabulary means what it always means

This isn't terribly helpful, I'm afraid. I'm still not sure what to do re my question above:

when parsing a schema, what do we (the implementation) do when encountering a $vocabulary keyword? It's clear what the intent is when at the top level metaschema document itself... But what do we do when encountering the $vocabulary keyword in any $referenced document, such as meta/applicator?

Nothing. It's only useful in the "top-level meta-schema".
It's used in a single-vocabulary meta-schema to show what another meta-schema using that vocabulary should also require, AS WELL AS being logically correct (it specifies which vocabularies must be understood in order to process a schema whose dialect uses that document as its meta-schema).

It initially seemed like a duality to me, but in actual fact, it isn't.

If you think about it further, you can actually use this data to automagically construct yourself a new dialect based on a set of vocabulary meta-schemas, I think. But that's more an aside.
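
A very rough sketch of that aside, under the assumption that each vocabulary meta-schema carries an `$id` and a `$vocabulary` listing only its own vocabulary. The function name `build_dialect` and the example.com URI are hypothetical, and the two input documents are abbreviated stand-ins rather than the full published meta-schemas.

```python
def build_dialect(dialect_id, vocab_meta_schemas):
    """Assemble a dialect meta-schema from single-vocabulary meta-schemas."""
    vocabulary = {}
    refs = []
    for meta in vocab_meta_schemas:
        # Collect what each vocabulary meta-schema declares about itself...
        vocabulary.update(meta.get("$vocabulary", {}))
        # ...and reference it so its keyword definitions apply.
        refs.append({"$ref": meta["$id"]})
    return {
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        "$id": dialect_id,
        "$vocabulary": vocabulary,
        "$recursiveAnchor": True,
        "allOf": refs,
    }


# Abbreviated stand-ins for two of the 2019-09 vocabulary meta-schemas.
CORE_META = {
    "$id": "https://json-schema.org/draft/2019-09/meta/core",
    "$vocabulary": {"https://json-schema.org/draft/2019-09/vocab/core": True},
}
APPLICATOR_META = {
    "$id": "https://json-schema.org/draft/2019-09/meta/applicator",
    "$vocabulary": {"https://json-schema.org/draft/2019-09/vocab/applicator": True},
}

dialect = build_dialect("https://example.com/meta/generated",
                        [CORE_META, APPLICATOR_META])
```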


I found the duality of the $vocabulary keyword awkward and confusing as well. I found that my implementation has no use for it if it's not a top-level meta-schema, so I just ignore it in those cases and move on. I agree that it's only useful as documentation and it would be less confusing if $vocabulary wasn't co-opted for this documentation.

This is exactly correct. $vocabulary should be considered only in meta-schemas which are "referenced" by use of $schema. The individual vocabulary meta-schemas are referenced by the applicator $ref, and so their $vocabulary should not be considered, because they are no longer a meta-schema root in that context.
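
To make the rule concrete, here is a small sketch (not taken from any implementation). `ROOT_META` and its example.com URI are hypothetical, `APPLICATOR_META` is an abbreviated stand-in for the published document, and `dialect_vocabularies` is an invented helper name.

```python
APPLICATOR = "https://json-schema.org/draft/2019-09/vocab/applicator"
CORE = "https://json-schema.org/draft/2019-09/vocab/core"

# Abbreviated stand-in: declares only its own vocabulary, as required.
APPLICATOR_META = {
    "$id": "https://json-schema.org/draft/2019-09/meta/applicator",
    "$vocabulary": {APPLICATOR: True},
}

# Hypothetical meta-schema root, reached from a schema via "$schema".
ROOT_META = {
    "$id": "https://example.com/meta/root",
    "$vocabulary": {CORE: True, APPLICATOR: False},
    "allOf": [{"$ref": APPLICATOR_META["$id"]}],
}


def dialect_vocabularies(meta_schema_root):
    """Read "$vocabulary" from the meta-schema root only.

    Documents pulled in through "allOf"/"$ref" are deliberately not
    visited, so their "$vocabulary" values never affect the result.
    """
    return dict(meta_schema_root.get("$vocabulary", {}))


vocabs = dialect_vocabularies(ROOT_META)
```

Here the root's own setting for the applicator vocabulary is what counts; the `True` inside the referenced document is ignored.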

@handrews
Contributor

handrews commented Nov 17, 2020

@karenetheridge that reply was responding to @jdesrosiers about $vocabulary being "co-opted for documentation."

A meta-schema should declare the vocabularies that it describes, and no more. So the applicator single-vocabulary meta-schema declares that schemas using it directly rely only on applicator (and core) semantics. Somewhere there is an issue with a use case for this: basically, using only applicators and an additional annotation vocabulary for, I think, UI generation. It was a bit contrived but had some relation to an actual project, and is the reason that the applicator vocabulary is in the core spec as a separate vocabulary.

So to address

footnote 1: I don't think it's valid to use an individual meta/* document as a metaschema on its own though -- it doesn't spell out the full set of vocabularies for a schema to be useful. e.g. a schema with just applicator keywords can't do much (via properties, prefixItems) except check that certain positions exist in the instance data. So this usecase can't be the intention of the $vocabulary usage here.

You don't think so, but you don't define the entire set of use cases ;-P Others have come up with use cases where this is at least plausible. I will admit, not compelling- I'm with you there! But plausible.

More importantly, people will extend the regular schema dialect and allOf it. They may want to change which vocabularies are optional vs required, and add more vocabularies. But of course the regular meta-schema will not have those vocabularies, nor should it; that's the whole point of extensibility. That is where ignoring referenced $vocabulary keywords is critically important. It would probably be possible to come up with some sort of combining rule, but that is far more complex and no one had a use case. By contrast, single-vocabulary meta-schemas declaring only their own vocabulary is, if anything, less complex, because you don't have to figure out what "the full" set of vocabularies is supposed to be. At minimum it is not more complex; it's just not very useful.
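
For illustration only, an extended dialect along those lines might look like the sketch below, written as a Python dict. The example.com URIs and the "ui" vocabulary are made up; the 2019-09 URIs are real, and the standard meta-schema is assumed to list the format vocabulary as optional.

```python
BASE = "https://json-schema.org/draft/2019-09"

EXTENDED_META = {
    "$schema": f"{BASE}/schema",
    "$id": "https://example.com/meta/extended",      # hypothetical
    "$vocabulary": {
        f"{BASE}/vocab/core": True,
        f"{BASE}/vocab/applicator": True,
        f"{BASE}/vocab/validation": True,
        f"{BASE}/vocab/meta-data": True,
        f"{BASE}/vocab/format": True,                # changed: now required
        f"{BASE}/vocab/content": True,
        "https://example.com/vocab/ui": False,       # added: hypothetical, optional
    },
    "$recursiveAnchor": True,
    "allOf": [
        # Reuse the regular meta-schema; its own "$vocabulary" is ignored
        # here because it is reached by "$ref", not by "$schema".
        {"$ref": f"{BASE}/schema"},
        {"$ref": "https://example.com/meta/ui"},     # hypothetical
    ],
}
```

Because the referenced `$vocabulary` is ignored, no combining rule is needed: the extension's own `$vocabulary` is the complete statement of the dialect.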

@handrews
Contributor

handrews commented May 6, 2021

@epoberezkin was your question sufficiently answered (or have you sorted it out elsewhere and no longer need info here)?

@karenetheridge @jdesrosiers @Relequestual the discussion drifted a ways off track from just answering the original question. If there is anything else that needs further discussion, can we file that separately and close this?

@epoberezkin
Member Author

thank you!

@karenetheridge
Member

I'm going to reserve comment on usage of $vocabulary until I actually successfully implement it, as I'll be better able to articulate the problems that I see with it (or someone will tell me that I'm understanding it wrong, and we'll massage the spec wording to be more clear).

@handrews
Contributor

@karenetheridge that sounds like you're not specifically worried about the question here, so it's probably best if you just file new issues if and when you find stuff with vocabularies during implementation. I'm going to close this particular issue since the original question is resolved- the current usage is well-defined.

The questions around whether the current usage is long-term desirable are better addressed in #1098, which directly addresses the fact that a lot of people are confused by the current wording.

5 participants