Define Validation spec vocabularies #697
Comments
From a practical standpoint, what I do, to be honest, is that formats with an existing library supporting them get supported, and ones that don't... don't :/ until someone writes one. E.g., IIRC there isn't a decent RFC 3339 parser for Which isn't to say something like this wouldn't be useful -- probably it would, just for categorization purposes.
I don't think splitting up the format vocabulary does anything besides making things more complicated. The reality is, as @Julian noted, that implementations will look for existing libraries and use them if available. Even then there will be deviations from the spec, especially with the more complicated formats, not to mention those implementations that just wing it with regular expressions. I think the current wording is fine. Probably not a single implementation supports all formats, but in my opinion that's acceptable.
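The "supported where a library exists" approach described in the two comments above could look roughly like the following. This is only a sketch: the registry, the `register` decorator, and `check_format` are hypothetical names, and treating unregistered formats as passing is an assumption about one common implementation choice, not something the thread specifies.

```python
import ipaddress
from datetime import date
from typing import Callable, Dict

# Hypothetical registry: format name -> checker returning True when the string conforms.
FORMAT_CHECKERS: Dict[str, Callable[[str], bool]] = {}


def register(name: str):
    """Register a checker for one format name (illustrative helper)."""
    def decorator(func: Callable[[str], bool]) -> Callable[[str], bool]:
        FORMAT_CHECKERS[name] = func
        return func
    return decorator


@register("ipv4")
def is_ipv4(value: str) -> bool:
    # Backed by the standard library, so this format "gets supported".
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


@register("date")
def is_date(value: str) -> bool:
    # Note: depending on the Python version, fromisoformat may accept inputs
    # that RFC 3339 full-date does not, which is the kind of deviation
    # from the spec mentioned above.
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_format(name: str, value: object) -> bool:
    """Return False only when a registered checker rejects a string value."""
    if not isinstance(value, str):
        return True  # formats here only constrain strings
    checker = FORMAT_CHECKERS.get(name)
    if checker is None:
        return True  # assumption: formats without a checker are not enforced
    return checker(value)
```

With this shape, "supporting a format" just means registering one more checker, which is why partial support is the norm.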
@johandorland to clarify, by "current wording" you mean the wording that is in the spec about optional implementation already? And that would go with my "minimal arrangement" proposal of four vocabularies corresponding to the four sections (currently sections 6-9 on master)?
Also paging @jgonzalezdr @philsturgeon @dlax @awwright @Relequestual and basically everyone 😁
@Julian do you think that we should add wording for |
I do like the idea of, Implementations can implement most of these small groups and then optionally a few extra formats, and that's just a case for them putting a thing on their README. Over time users will PR support for more formats into their thing, and all is well. |
Vocabularies were created to define sets of keywords, but you're now proposing that a vocabulary be used to define additional values for an existing keyword. (Sorry, I'm just trying to think it through from the point of view of my implementation.) (Thinking out loud here. Maybe it'll help someone else...) I have a handler for each keyword. With vocabularies added in, it becomes a handler for each keyword/vocabulary combination. I would have to register multiple (Okay, done musing.) So it looks like you're not defining new values for an existing keyword; you're redefining the keyword altogether. And when multiple vocabularies that define a single keyword are used in a meta-schema, it merely expands the acceptable set of values for that keyword. Then, the value determines which vocabulary's keyword definition to use. Finally, if two vocabularies define the same keyword and value, this results in an ambiguity and must be considered undefined behavior.
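The reading described in this comment (each vocabulary contributes values for a shared keyword, the value selects the vocabulary, and a value defined twice is ambiguous) could be sketched as below. The data shape, the URIs, and the function name are all hypothetical, and this only illustrates that reading; it is not behavior defined by the spec.

```python
from typing import Dict, Set

# Hypothetical shape: each vocabulary maps a keyword to the values it defines.
Vocabulary = Dict[str, Set[str]]


def merge_keyword_values(vocabularies: Dict[str, Vocabulary], keyword: str) -> Dict[str, str]:
    """Map each accepted value of `keyword` to the single vocabulary URI defining it.

    Raises ValueError when two vocabularies define the same value, which is
    the ambiguous case the comment calls undefined behavior.
    """
    owner: Dict[str, str] = {}
    for uri, vocab in vocabularies.items():
        for value in vocab.get(keyword, set()):
            if value in owner:
                raise ValueError(
                    f"{keyword!r} value {value!r} is defined by both {owner[value]} and {uri}"
                )
            owner[value] = uri
    return owner


# Illustrative vocabularies with made-up URIs.
vocabs = {
    "https://example.com/vocab/format": {"format": {"date", "email"}},
    "https://example.com/vocab/my-formats": {"format": {"isbn"}},
}
owners = merge_keyword_values(vocabs, "format")
```

Here the merged mapping is the expanded set of acceptable values, and looking up a value tells a handler which vocabulary's definition applies.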
@handrews Yes, that is what I meant by current wording. However, I wouldn't be opposed to weakening the language further. I'm still kind of new to all this RFC speak, so maybe I'm interpreting it a bit differently than others. As to the "minimal arrangement" proposal, that seems fine by me. When you introduce something fundamental like
@johandorland don't forget output 😄
No, that's not the intention, and this is covered explicitly in section 6.5 Extending JSON Schema:
The "expanded set of acceptable values" language was added specifically to give us a way to manage custom Whether we publish one The |
@handrews I think the difference is only apparent in the implementation. I would have to have two separate keyword classes, both declaring they handle "format" but for different vocabularies. Semantically, though, yes, it's the same keyword.
I think this is the big part, and what I was after. When you have two vocabularies defining the same keyword with different values, I think the behavior has to be undefined. There may be cases where it's obvious which vocabulary applies (like one expects an array while the other expects a string), but I don't know if we can "spec" that beyond "implementations MAY use their own logic to determine which vocabulary to apply."
Do you mean with different value types here? That's the example you give, and I agree that that behavior is undefined. I don't even want to encourage implementations to attempt to make sense of it, really.
Regarding format validation, my opinion is to keep it simple. It's fine to separate format into its own vocabulary. I would just name it "format" (no need to qualify it as "basic" or "standard", as it will actually be identified by its URI, so there will be no confusion with any potential "extension" format vocabularies). I see no need to split the formats into different vocabularies. Right now there are just a few different formats defined, and all of them are pretty common and useful for almost all JSON Schema users, so I really think that all of them should be supported. As a user, my interpretation of "should" is strictly the one defined in RFC 2119, so I would expect an implementation to support all formats, but I would also accept an implementation that did not support any of them (I would also expect that to be clearly indicated in the implementation's docs).
That's not really how it plays out in practice. The degree of "support" provided when That said, for this draft I will go with one vocabulary for the |
OK, I think I'm just going to go with the shortest sensible names. As @jgonzalezdr noted, the fact that there is a whole URI and not just the file name means that they will not be ambiguous:
The hyper-schema vocabulary ( |
Now that the `$vocabulary` PR is nearing approval, we need to figure out how many vocabularies are present in the Validation spec, and what to call them.

Minimal arrangement

I'm guessing something like:

- `validation`: The validation assertions in section 6
- `basic-format`: The `format` keyword and its standardized values in section 7
- `embedded-content`: The `content*` keywords in section 8
- `basic-metadata`: The annotations in section 9 (formerly section 10)

I'm not super-attached to those names, but I do think this nicely encapsulates both the different purposes of some groups of keywords, and the optional nature of `format` and `content*`. Optional keyword groups were always particularly confusing for users and implementors.

The reason for `basic-format` is that it's extensible, so I assume other people will probably have vocabularies using `format` in the name. But I could be convinced to go with `standard-format`, `format`, or the plural of any of those (`standard-formats`, etc.).

Multiple format vocabularies?

While `format` currently says that if you implement the keyword at all, you SHOULD implement all of the formats, this has become increasingly burdensome as the set of standard formats has expanded.

We could break them up and provide vocabularies that each only declare the semantics for a subset of the standard formats. An example division could be:

- `date-time-formats`
- `internet-formats` (hostnames, IP addresses, email, URIs, IRIs)
- `json-pointer-formats` (JSON Pointer and Relative JSON Pointer)
- `regex-format`

The `internet-formats` could be split up more, although I do wonder at what point it's more trouble than it's worth.

@Julian @johandorland @gregsdennis as implementors what would you find useful here?
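To make the split-format idea concrete, here is a rough sketch of how an implementation might decide which formats are in play for a given meta-schema. Several things are assumptions rather than anything settled in this issue: the `https://example.com/vocab/` base URI, the exact format names placed in each group, and the object form of `$vocabulary` that maps a vocabulary URI to a boolean "required" flag.

```python
from typing import Dict, Set

# Hypothetical base URI; the issue does not settle on real vocabulary URIs.
BASE = "https://example.com/vocab/"

# Assumed grouping, following the example division in the issue body.
FORMAT_GROUPS: Dict[str, Set[str]] = {
    "date-time-formats": {"date-time", "date", "time"},
    "internet-formats": {"hostname", "ipv4", "ipv6", "email", "uri", "iri"},
    "json-pointer-formats": {"json-pointer", "relative-json-pointer"},
    "regex-format": {"regex"},
}


def usable_formats(meta_schema: dict, implemented: Set[str]) -> Set[str]:
    """Collect the formats declared by vocabularies this implementation knows.

    Assumes `$vocabulary` maps vocabulary URIs to a boolean "required" flag.
    A required vocabulary that is not implemented is an error; an optional
    one that is not implemented is skipped.
    """
    formats: Set[str] = set()
    for uri, required in meta_schema.get("$vocabulary", {}).items():
        if uri not in implemented:
            if required:
                raise ValueError(f"required vocabulary not supported: {uri}")
            continue
        group = uri.rsplit("/", 1)[-1]  # last path segment names the group
        formats |= FORMAT_GROUPS.get(group, set())
    return formats


meta = {
    "$vocabulary": {
        BASE + "validation": True,
        BASE + "date-time-formats": True,
        BASE + "internet-formats": False,
    }
}
mine = {BASE + "validation", BASE + "date-time-formats"}
print(sorted(usable_formats(meta, mine)))  # ['date', 'date-time', 'time']
```

The trade-off raised above shows up directly: finer-grained groups let an implementation state precisely what it supports, at the cost of more URIs for schema authors to declare.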