State that implementations SHOULD accept a schema retrieval IRI / initial base IRI #1299
Comments
For an example of this implemented in the wild, see the playground at https://json-everything.net/json-schema. There is an option that allows the user to specify a default base URI. |
Agreed. I have been toying with this thought. I have a "rebase as root" option which allows you to take a JSON island and rebase it as if it were an isolated document root; and I also have a "don't rebase this but treat this island as a base schema embedded in a non-schema document" (which is essentially the "OpenAPI" case). But I don't have a "use this iri as the base for a schema, instead of the (potentially local filesystem-based) iri you actually came from" option. |
I definitely think that we need to include language that states this is only a fallback for when |
@gregsdennis good point. Should be obvious, but I am well aware that hardly anyone actually reads RFC 3986 no matter how much I xref it 😛 |
[EDIT: The requirements I wrote here were wrong, better ones forthcoming in a new comment.] |
OK, thinking this through again and hopefully correctly this time, I think the requirements are:
Note that with this approach, embedded schema resources and schema documents work the same way in terms of retrieval IRIs. Technically, this would mean that we don't need to specify that part of the behavior as part of the The point I'm less clear on is whether the retrieval IRI needs to be associated with the schema resource root. This would allow it to be used in I would say that it is a bad practice to use such an IRI in
|
This comment was marked as off-topic.
Nothing in this issue changes any functionality or has anything at all to do with how RFC 3986 is applied. §9.1.1. Initial Base IRI describes that an initial base IRI (an IRI against which a relative IRI-reference It does not have any normative language indicating an implementation requirement for accepting such an external initial base IRI. There is an implicit requirement in the description of how such a thing can be used. All this issue says is that we should make the existing implicit requirement explicit and normative. @jdesrosiers has informed me that he considers such a change to contribute to bloat in the spec, and that §9.1.1 repeats too much from RFC 3986, but has agreed that his concerns about bloat can be addressed separately from this issue. I happen to agree with him that §9.1.1 can be reduced substantially, but that is not the topic for this particular issue. All of our implementation requirements need clear and explicit normative language. Nothing in this proposal changes anything about how RFC 3986 is applied. Therefore, I am marking all comments that assume that this issue proposes such changes to be off-topic. I have updated the original comment so that it is more clear what is within the scope of this issue. |
Thinking further on this, can anyone come up with a reason that this needs to be a SHOULD and can't be a MUST? What are valid reasons for not being able to accept an initial base IRI (retrieval or otherwise)? An implementation that cannot accept an initial base IRI will, at least in general, be unable to process schemas that use non-fragment-only relative references. Figuring out whether relative references can be resolved without a base IRI can only really be done within a single schema document. I'm inclined to do the following:
|
Technically, the requirement to use a base URI/IRI, even if it must be provided out-of-band, is already specified in RFC 3986/3987, and is incorporated by normative reference; so I think MUST or SHOULD would technically over-constrain/redundantly constrain the specification. However, some implementation guidance would be warranted here (with lowercase "should", and also pointing out that if you don't provide this option, some schemas cannot be parsed, as you point out). |
No, RFC 3986 tells you how the base URI is to be calculated. It does not impose any requirements on how an implementation of anything does or does not have to accept a base URI. As currently written, it is entirely compliant with both RFC 3986 and JSON Schema to not accept a base URI/IRI at all. JSON Schema is the only specification that controls what JSON Schema implementations are required to offer. If we don't require this, it's not required. There is no reason to make it a lowercase "should." An uppercase SHOULD is correct as it facilitates interoperability by ensuring that changing from one implementation to another won't result in a loss of base URI-setting functionality. It's debatable as to whether it ought to be a SHOULD or a MUST, and the question there is what is the downside of making it a MUST? What are the conditions under which it is advantageous to not accept a base URI? If there are significant situations where it is advantageous, that makes this a SHOULD. If there are not, then it is better as a MUST, again to ensure that different implementations offer a consistent feature set. |
And yes, the consistency issue is real. I filed this after surveying all implementations listed on our implementations page and realizing that implementations are horribly inconsistent in this area, and often not even well-documented. It's clear that there are real, practical problems caused by the under-specification of this behavior. |
Isn't this implicit? A base URI is required to resolve a URI Reference; and this base URI might be a document URI that's only known out-of-band; therefore, if a validator doesn't accept a default base URI, some URI References will be unresolvable, or will use the application-specific base URI. If we want to avoid using the application-specific URI, then that would be a good excuse to use a SHOULD, I think. |
The point is that people have implemented the spec without the ability to set an external base URI (e.g. a retrieval URI or an application-defined base URI), which leads to failure to resolve relative references, which leads to user confusion when they can't get their schemas to work. All of which is 100% compatible with RFC 3986 and JSON Schema as written. This makes the implementation less usable. The simple solution is to allow specifying such a base URI. It is an objectively verifiable fact that implementers do not consistently understand this, and therefore do not offer the ability consistently. That is an interoperability problem if you are relying on this totally reasonable RFC 3986-compliant process. The simple solution to this interoperability problem is normative language. There is no normative language in either RFC 3986 or JSON Schema to guide implementations. So let's put it in there. Why is this such a problem? Please explain to me the actual, measurable downsides. |
I just filed #1322 about dropping the "initial base URI" concept in favor of standard RFC-3986 terms. I think that should be resolved before this can be decided since this is based on the "initial base URI" concept. |
@jdesrosiers for this issue the only thing that matters is whether a JSON Schema implementation can accept a base URI to use or not. I am still waiting for anyone to explain to me why it is advantageous to maintain the status quo that has resulted in implementations not accepting such a base URI, and apparently not being aware that it's a thing. Or why it would be harmful to throw in a SHOULD for this. Folks, please give me literally anything to work with here. "RFC 3986 implies this" is not enough, as that does not clarify implementation requirements for JSON Schema. RFC 3986 defines a process, not implementation requirements. |
I disagree and I explain why in #1322. It makes a difference whether we suggest implementations take an initial base URI vs a retrieval URI vs a default base URI.
I addressed this in #1299 (comment). You insisted that it was "off-topic". We are not maintaining the status quo. This issue was addressed in the UJS documentation a year ago. A year is very new on these timescales. I think we just need to give it enough time to have an effect. Most implementations were written before that was released.
Yes, RFC-3986 puts constraints on what you can implement, but doesn't tell you what you should implement. I just don't think it's necessary for the spec to prescribe or even suggest that implementations provide this feature. I agree with your assertion that the issue is largely implementers "not being aware that it's a thing". That problem is now being addressed in UJS. I think that's enough. |
Only normative requirements are testable. UJS is important as education and guidance (of which I agree we have too much in the spec), but from a spec compliance perspective, it's irrelevant. Let's assume the test suite was expanded to explicitly test the input interface of JSON Schema implementations. It doesn't matter how (I'm aware that there would be many difficulties), let's just assume it works. I want a test case that tests whether a JSON Schema implementation can accept a base IRI and use it to resolve a relative I want that test case to either be in the required set, or the "should" set, depending on whether we think the requirement is absolute or whether it can be disregarded under some circumstances. @Julian, would the mention in Understanding JSON Schema that @jdesrosiers notes be sufficient for you to agree to such a required or "should" case? Would the current language in §9.1.1 of the core spec be sufficient? My expectation would be that no, it would not. There is nothing in the spec that states such a requirement. Between RFC 3986 and JSON Schema Core, it's clear how a retrieval URI, default URI, etc. is intended to be used if it can be used. But there is nothing that requires an implementation to be able to use it. We can't consider something a testable requirement when it's not even a requirement at all. UJS does not create requirements, and §9.1.1 doesn't state one either. |
(Hi hi. Will review the relevant section, but it may take me till Monday to do so as I'm flying the next few days.) |
Of course a UJS page doesn't imply any requirement on implementations. It just helps people understand the concepts so they can make better choices about what they support in their implementation. You can still have tests in the test suite for these features to help understand what implementations support, but they would be optional. Implementations that don't support those features don't need to pass those tests to be in compliance. |
The whole point of this issue is to clarify this as a SHOULD or MUST requirement. You are welcome to argue that the requirement ought to be MAY or unstated, but despite repeated requests you have not explained what problems would be caused by a SHOULD or MUST requirement. Your statement that such a test case would be optional proves my point that this needs normative language. If you think that SHOULD is too high of a requirement, please explain what problems a SHOULD requirement would create. Likewise for MUST. I've asked this several times without a response. I already understand that your opinion is that the status quo is fine, but I have given examples of why I think this is an important use case to support, and therefore worth at least a SHOULD. What problems would a SHOULD requirement cause? Who would be harmed, and how? As an example, if we were to state a MUST requirement to support What is the harm caused by a requirement that implementations SHOULD accept a base URI? |
To re-state the use case: There are many use cases for a non-RFC3986-compliant base URI. RFC 3986 tells us how to determine the base URI correctly, but sometimes what is technically incorrect is more appropriate. A set of schemas that can be hosted at different locations (because it is part of a system that is expected to run behind firewalls and therefore not have a globally accessible URL) will have relative $ids. During development and testing, it would make more sense to load them from a filesystem, which means the retrieval URIs would be file:// URIs. But instead we want https:// URIs because that's what will be used in production. So we override RFC 3986 and supply the base URI for the test environment we need. |
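To make that use case concrete, here is a minimal sketch using only Python's standard `urllib.parse`. The file path, the `schemas.example.com` host, and the relative reference are all made up for illustration; nothing here is taken from any particular JSON Schema implementation.

```python
from urllib.parse import urljoin

# Hypothetical locations: where the schema was actually loaded from during
# testing, and the base we want it to behave as if it had in production.
retrieval_uri = "file:///home/dev/project/schemas/root.json"
supplied_base = "https://schemas.example.com/v1/root.json"

# A relative reference such as a relative $id or $ref found in the schema.
relative_ref = "types/address.json"

# Plain RFC 3986 resolution against the retrieval URI gives a file:// URI.
resolved_from_retrieval = urljoin(retrieval_uri, relative_ref)

# Resolution against the application-supplied base gives the production URI.
resolved_from_supplied = urljoin(supplied_base, relative_ref)

print(resolved_from_retrieval)
print(resolved_from_supplied)
```

The resolution algorithm is identical in both cases; the only thing that differs is which base the application is able to hand to the implementation.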
To tie the loop on
I'll share my opinion just on the "where does it seem to me it should go" question, not on what I understand the root proposal (and/or disagreement) to be, which, to explicitly ensure I follow, is about whether our specification should specifically recommend implementers include a place to specify a base URI to use globally when encountering a schema which doesn't indicate one and needs it. My brief understanding of @jdesrosiers' objection to this is, I think, that Jason prefers we instead recommend implementations take a retrieval URI per-schema, and is less bothered by suggesting all implementations also support a global default base URI to use when seeing a schema with no retrieval URI and which uses relative references. So if that's all a correct basic understanding of the proposal (or even if it isn't), I think the ping to me was just to see if I agree about where tests would go for such a thing? In the current test suite layout where everything goes in
I thought you disagreed we should have such tests, but maybe I misunderstood your opinion before. I agree certainly with this line though, if we want implementations to support this and it isn't required, then yeah, there. If we're talking post json-schema-org/JSON-Schema-Test-Suite#590 where now we separate things by whether the spec really recommends them or says nothing (which isn't merged yet because I only had feedback from Greg who +1'ed, Henry who sounded -0ish, and Jason who was +0ish, so I'd like to see some other folks who use the suite comment first): In the current language which doesn't address this yeah I'd think this should go in If I got all that right, to me it seems there's a core disagreement about whether we want this recommendation or not (and instead want one recommending implementations take retrieval URIs, and just light guidance on whether implementations support this too). I think my understanding of Jason's objection is he just doesn't think this is necessary, and prefers implementations instead focus on APIs that ensure every schema has a retrieval URI. His implementation, if I'm not mistaken, indeed doesn't support specifying a global base URI, it requires specifying retrieval URIs alongside every schema, so this change would mean hyperjump going against the SHOULD basically. Hopefully some of that is correct / helpful? |
Thanks, @Julian , that is helpful.
This is incorrect. This issue has nothing to do with a global base URI (meaning an RFC 3986 §5.1.4 Default Base URI for JSON Schema implementations). I don't understand why this keeps becoming the topic of discussion, but perhaps it's due to the mis-use of the term "default base IRI" in §9.1.1 ¶1? I have filed draft PR #1324 to show what I consider to be the correct wording there, which is the starting point for this proposal.
§9.1 Loading a Schema, including §9.1.1
Currently, §9.1.1 duplicates a lot of RFC 3986 (which per #1322 we agree should be fixed), implying that relative references in a schema that does not have an absolute All this proposal does is strengthen the implied ability to take a base IRI of some sort to a SHOULD. It doesn't change or strengthen anything about the §5.1.4 "default base IRI" option. It does not say anything about how the ability to take a base IRI is to be accomplished (quoting from the initial comment):
According to Hyperjump JSC's documentation regarding schema identification:
Therefore, Hyperjump JSC is already in compliance with this proposal. Which is why I do not understand the vehement objections. There is zero impact on Hyperjump (or, for that matter, python-jsonschema AFAIK). This is why I keep asking who would be harmed by this SHOULD, and how? I cannot figure out the downside. As for the upside:
Right, that's what I would expect, and if we were to merge test suite PR 590 and add a test case for this, I'd expect it to go into As @jdesrosiers has made clear by stating that he does not run the optional tests, and as can be observed by looking over how a variety of implementations do and do not use the test suite, a requirement that can only be tested in When I asked about reference resolution tooling requirements in the AsyncAPI discussion (which, of course, incorporates JSON Schema referencing), the response was:
This is an interoperability concern and fits the recommendations of RFC 2119 §6 regarding the use of MUST and SHOULD. The only way to guarantee consistent behavior as requested is with a MUST or SHOULD normative requirement. If it's a SHOULD, there ought to be guidance on when it is safe to disregard. Such a requirement could then be reflected in either the required test suite or the (future) "should" suite. At least, if one ignores that this is not necessarily testable through the validation output (although some aspects are, and are already covered). |
My preference is to close this in favor of the referencing discussion. It can be re-filed if needed later. I will close this if there are no objections in the next few days. |
While we discuss initial base IRIs, at no point do we make it clear that it's advantageous to allow an application to supply a retrieval IRI or other IRI for use as a base IRI in the absence of an absolute $id. Implementations vary with respect to how they handle this (or don't).
This strikes me as a SHOULD requirement rather than a MUST. It's advantageous to allow, but JSON Schema is usable without it, and the exact mechanism is not something we should specify. It's also conceivable that a specialized implementation would have a reason to skip this, either to minimize code or because it knows that external IRIs of this sort will not be useful in its intended execution context.
CLARIFICATION: "Initial base URI" is not the same thing as RFC 3986 §5.1.4's "Default base URI". An initial base URI can come from any of the sources described in RFC 3986 §5.1.2 - 5.1.4. This issue doesn't have anything to do with setting a broader §5.1.4 default base URI. Several comments related to that concept have been marked off-topic.
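As a rough illustration of the behavior being asked for, the sketch below shows one way an implementation could accept an externally supplied base IRI and use it only when the schema lacks an absolute $id. The function name `resolve_base` and its `supplied_base` parameter are hypothetical and do not correspond to any existing implementation's API; the issue deliberately leaves the exact mechanism unspecified.

```python
from urllib.parse import urljoin, urlsplit


def resolve_base(schema, supplied_base=None):
    """Return the base IRI to use for a schema resource root.

    `supplied_base` is a hypothetical parameter standing in for a retrieval
    IRI or other application-supplied IRI.
    """
    schema_id = schema.get("$id")
    if schema_id is None:
        # No $id at all: the supplied base (possibly None) is all we have.
        return supplied_base
    if urlsplit(schema_id).scheme:
        # Absolute $id: it establishes the base on its own.
        return schema_id
    if supplied_base is None:
        # Relative $id and nothing to resolve it against.
        raise ValueError("relative $id cannot be resolved without a base IRI")
    # Relative $id: resolve it against the supplied base per RFC 3986.
    return urljoin(supplied_base, schema_id)
```

An implementation with no way to pass something like `supplied_base` always ends up in the `ValueError` branch for schemas with relative $ids, which is the interoperability gap described above.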