Commit 4f9e8be

handrews authored and Relequestual committed
Clarify various things about canonical URIs
Fixes issue json-schema-org#937, clarifying a number of other things along the way. While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance.

After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:

* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples to hopefully be more precise. I separated the discussion of the empty fragment in $id from the main paragraph of its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).
1 parent db65da8 commit 4f9e8be

1 file changed (+79 -65 lines changed)
jsonschema-core.xml

@@ -315,8 +315,8 @@
 of five categories:
 <list style="hanging">
 <t hangText="identifiers:">
-control schema identification through setting the schema's
-canonical URI and/or changing how the base URI is determined
+control schema identification through setting a URI
+for the schema and/or changing how the base URI is determined
 </t>
 <t hangText="assertions:">
 produce a boolean result when applied to an instance
@@ -426,7 +426,9 @@
 <t>
 A JSON Schema resource is a schema which is
 <xref target="RFC6596">canonically</xref> identified by an
-<xref target="RFC3986">absolute URI</xref>.
+<xref target="RFC3986">absolute URI</xref>. Schema resources MAY
+also be identified by URIs including fragments. Any such URIs
+are considered to be non-canonical.
 </t>
 <t>
 The root schema is the schema that comprises the entire JSON document
@@ -730,9 +732,9 @@
 be able to support those keywords or vocabularies that contain them.
 </t>
 </section>
-<section title="Identifiers" anchor="identifiers">
+<section title="Identifiers">
 <t>
-Identifiers set the canonical URI of a schema, or affect how such URIs are
+Identifiers define URIs for a schema, or affect how such URIs are
 resolved in <xref target="references">references</xref>, or both.
 The Core vocabulary defined in this document defines several
 identifying keywords, most notably "$id".
@@ -1340,26 +1342,31 @@
 <t>
 If present, the value for this keyword MUST be a string, and MUST represent a
 valid <xref target="RFC3986">URI-reference</xref>. This URI-reference
-SHOULD be normalized, and MUST resolve to an
-<xref target="RFC3986">absolute-URI</xref> (without a fragment). Therefore,
-"$id" MUST NOT contain a non-empty fragment, and SHOULD NOT contain an
-empty fragment.
+SHOULD be normalized, and MUST be semantically equivalent to an
+<xref target="RFC3986">absolute-URI</xref> (without a fragment).
 </t>
 <t>
-Since an empty fragment in the context of the application/schema+json media
-type refers to the same resource as the base URI without a fragment,
-an implementation MAY normalize a URI ending with an empty fragment by removing
-the fragment. However, schema authors SHOULD NOT rely on this behavior
-across implementations.
+The application/schema+json media type defines that an absolute-URI
+identifying a resource and the same URI with an empty fragment
+appended (which identifies the resource's root schema object) are
+semantically equivalent. Since this semantic equivalence is not part
+of the <xref target="RFC3986">RFC 3986 normalization process</xref>,
+implementors and schema authors cannot rely on generic URI libraries
+understanding the equivalence.
+</t>
+<t>
+Therefore, "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT
+contain an empty fragment. The absolute-URI form MUST be considered
+the canonical URI, regardless of the presence or absence of an empty fragment.
 <cref>
-This is primarily allowed because older meta-schemas have an empty
-fragment in their $id (or previously, id). A future draft may outright
-forbid even empty fragments in "$id".
+An empty fragment is currently allowed because older meta-schemas have
+an empty fragment in their $id (or previously, id).
+A future draft may outright forbid even empty fragments in "$id".
 </cref>
 </t>
 <t>
-This URI also serves as the base URI for relative URI-references in keywords
-within the schema resource, in accordance with
+The absolute-URI also serves as the base URI for relative URI-references
+in keywords within the schema resource, in accordance with
 <xref target="RFC3986">RFC 3986 section 5.1.1</xref> regarding base URIs
 embedded in content.
 </t>
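
Reader's note on the "$id" hunk above: the new wording says generic URI libraries cannot be relied on to understand the empty-fragment equivalence. The following is a minimal Python sketch of what that means in practice, not part of the commit. The helper name `canonical_id` is hypothetical, and the sketch assumes the "$id" value has already been resolved to a full URI.

```python
from urllib.parse import urldefrag


def canonical_id(id_value: str) -> str:
    """Return the absolute-URI form of an already-resolved "$id" value.

    Hypothetical helper: an empty fragment is dropped, because the
    application/schema+json media type treats the two forms as equivalent.
    A non-empty fragment is rejected, mirroring the MUST NOT in the text.
    """
    uri, fragment = urldefrag(id_value)
    if fragment:
        raise ValueError('"$id" MUST NOT contain a non-empty fragment')
    return uri


with_empty_fragment = "https://example.com/schema#"
without_fragment = "https://example.com/schema"

# A plain comparison, as a generic library would do, sees two different URIs.
generic_equal = with_empty_fragment == without_fragment

# Applying the media-type-specific rule makes them compare equal.
schema_equal = canonical_id(with_empty_fragment) == canonical_id(without_fragment)
```

Here `generic_equal` is false and `schema_equal` is true, which is why an implementation has to apply the equivalence itself rather than expect RFC 3986 normalization to do it.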
@@ -1623,7 +1630,7 @@
 media type.
 </t>
 <t>
-Unless the "$id" keyword described in the next section is present in the
+Unless the "$id" keyword described in an earlier section is present in the
 root schema, this base URI SHOULD be considered the canonical URI of the
 schema document's root schema resource.
 </t>
@@ -1750,7 +1757,7 @@
 Since JSON Pointer URI fragments are constructed based on the structure
 of the schema document, an embedded schema resource and its subschemas
 can be identified by JSON Pointer fragments relative to either its own
-canonical URI, or relative to the containing resource's URI.
+canonical URI, or relative to a containing resource's URI.
 </t>
 <t>
 Conceptually, a set of linked schema resources should behave
@@ -1782,13 +1789,18 @@
 }
 ]]>
 </artwork>
-<postamble>
-The URI "https://example.com/foo#/items/additionalProperties"
-points to the schema of the "additionalProperties" keyword in
-the embedded resource. The canonical URI of that schema, however,
-is "https://example.com/bar#/additionalProperties".
-</postamble>
 </figure>
+<t>
+The URI "https://example.com/foo#/items" points to the "items" schema,
+which is an embedded resource. The canonical URI of that schema
+resource, however, is "https://example.com/bar".
+</t>
+<t>
+For the "additionalProperties" schema within that embedded resource,
+the URI "https://example.com/foo#/items/additionalProperties" points
+to the correct object, but that object's URI relative to its resource's
+canonical URI is "https://example.com/bar#/additionalProperties".
+</t>
 <figure>
 <preamble>
 Now consider the following two schema resources linked by reference
@@ -1810,29 +1822,31 @@
 ]]>
 </artwork>
 <postamble>
-Here we see that the canonical URI for that "additionalProperties"
-subschema is still valid, while the non-canonical URI with the fragment
-beginning with "#/items/$ref" now resolves to nothing.
+Here we see that the URI for the "additionalProperties" schema object
+that is relative to its resource's canonical URI is still valid,
+while the URI relative to the "items" schema object's URI no longer
+resolves to anything.
 </postamble>
 </figure>
 <t>
 Note also that "https://example.com/foo#/items" is valid in both
 arrangements, but resolves to a different value. This URI ends up
-functioning similarly to a retrieval URI for a resource. While valid,
-examining the resolved value and either using the "$id" (if the value
-is a subschema), or resolving the reference and using the "$id" of the
-reference target, is preferable.
+functioning similarly to a retrieval URI for a resource. While this URI
+is valid, it is more robust to use the "$id" of the embedded or referenced
+resource unless it is specifically desired to identify the object containing
+the "$ref" in the second (non-embedded) arrangement.
 </t>
 <t>
-An implementation MAY choose not to support addressing schema resources
-(and their subschemas) by non-canonical URIs.
-As such, it is RECOMMENDED that schema authors only use canonical URIs,
-as using non-canonical URIs may reduce schema interoperability.
+An implementation MAY choose not to support addressing schema resource
+contents by URIs using a base other than the resource's canonical URI,
+plus a JSON Pointer fragment relative to that base. Therefore, schema
+authors SHOULD NOT rely on such URIs, as using them may reduce interoperability.
 <cref>
 This is to avoid requiring implementations to keep track of a whole
 stack of possible base URIs and JSON Pointer fragments for each,
 given that all but one will be fragile if the schema resources
-are reorganized. Some have argued that this is easy so there is
+are reorganized. Some
+have argued that this is easy so there is
 no point in forbidding it, while others have argued that it complicates
 schema identification and should be forbidden. Feedback on this
 topic is encouraged.
@@ -1844,9 +1858,9 @@
 </cref>
 </t>
 <t>
-Further examples of such non-canonical URIs, as well as the appropriate
-canonical URIs to use instead, are provided in appendix
-<xref target="idExamples" format="counter"></xref>.
+Further examples of such non-canonical URI construction, as well as
+the appropriate canonical URI-based fragments to use instead,
+are provided in appendix <xref target="idExamples" format="counter"></xref>.
 </t>
 </section>
 </section>
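
Reader's note on the two hunks above: the fragility of a JSON Pointer fragment built on an enclosing resource's URI can be tried out with a small Python sketch. It is not part of the commit. The schema documents are not shown in this diff, so `EMBEDDED`, `FOO` and `BAR` below are assumptions reconstructed from the URIs named in the text, and `index_resources`, `resolve_pointer` and `lookup` are hypothetical helpers that assume every "$id" is already an absolute-URI.

```python
from urllib.parse import unquote, urldefrag

# First arrangement (assumed): "bar" is embedded in "foo".
EMBEDDED = {
    "$id": "https://example.com/foo",
    "items": {"$id": "https://example.com/bar", "additionalProperties": {}},
}
# Second arrangement (assumed): two resources linked by reference.
FOO = {"$id": "https://example.com/foo", "items": {"$ref": "bar"}}
BAR = {"$id": "https://example.com/bar", "additionalProperties": {}}


def index_resources(schema, registry=None):
    """Map each "$id" to its schema object (naively walks all dicts and lists)."""
    if registry is None:
        registry = {}
    if isinstance(schema, dict):
        if "$id" in schema:
            registry[schema["$id"]] = schema
        for value in schema.values():
            index_resources(value, registry)
    elif isinstance(schema, list):
        for item in schema:
            index_resources(item, registry)
    return registry


def resolve_pointer(document, pointer):
    """Evaluate a JSON Pointer (RFC 6901) against a parsed JSON document."""
    target = document
    tokens = pointer.split("/")[1:] if pointer else []
    for token in tokens:
        token = token.replace("~1", "/").replace("~0", "~")
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target


def lookup(registry, uri):
    """Resolve a URI whose fragment is a JSON Pointer relative to the base URI."""
    base, fragment = urldefrag(uri)
    return resolve_pointer(registry[base], unquote(fragment))


embedded_registry = index_resources(EMBEDDED)
linked_registry = index_resources(BAR, index_resources(FOO))
```

With these registries, `lookup(..., "https://example.com/bar#/additionalProperties")` succeeds in both arrangements, while `lookup(linked_registry, "https://example.com/foo#/items/additionalProperties")` raises `KeyError`, because the "items" object now holds only a "$ref".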
@@ -2709,8 +2723,8 @@
 <section title="Keyword Absolute Location">
 <t>
 The absolute, dereferenced location of the validating keyword. The value MUST
-be expressed as a full URI using the canonical URI of the relevant
-schema object, and it MUST NOT include by-reference applicators
+be expressed as a full URI using the canonical URI of the relevant schema resource
+with a JSON Pointer fragment, and it MUST NOT include by-reference applicators
 such as "$ref" or "$dynamicRef" as non-terminal path components.
 It MAY end in such keywords if the error or annotation is for that
 keyword, such as an unresolvable reference.
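
Reader's note on the hunk above: "the canonical URI of the relevant schema resource with a JSON Pointer fragment" can be built mechanically. The following Python sketch is one way to do it under stated assumptions and is not part of the commit: `absolute_keyword_location` is a hypothetical helper, and the caller is assumed to already know the resource's canonical URI and the unescaped pointer tokens inside that resource.

```python
from urllib.parse import quote

# Characters RFC 3986 allows unescaped in a fragment, besides unreserved ones.
# "/" is left out on purpose: it only ever appears as the token separator here.
FRAGMENT_SAFE = "!$&'()*+,;=:@?"


def absolute_keyword_location(canonical_uri: str, tokens: list[str]) -> str:
    """Join a canonical resource URI and JSON Pointer tokens into one URI."""
    parts = []
    for token in tokens:
        # JSON Pointer escaping (RFC 6901): "~" first, then "/".
        escaped = token.replace("~", "~0").replace("/", "~1")
        # Percent-encode anything not allowed in a URI fragment.
        parts.append("/" + quote(escaped, safe=FRAGMENT_SAFE))
    return f"{canonical_uri}#{''.join(parts)}"


location = absolute_keyword_location(
    "https://example.com/schemas/common", ["$defs", "count", "minimum"]
)
```

The resulting `location` is `https://example.com/schemas/common#/$defs/count/minimum`, the same URI that appears in the following hunk header.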
@@ -3319,76 +3333,76 @@ https://example.com/schemas/common#/$defs/count/minimum
 <list style="hanging">
 <t hangText="# (document root)">
 <list style="hanging">
-<t hangText="canonical absolute-URI (and also base URI)">
+<t hangText="canonical (and base) URI">
 https://example.com/root.json
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/root.json#
 </t>
 </list>
 </t>
 <t hangText="#/$defs/A">
 <list>
 <t hangText="base URI">https://example.com/root.json</t>
-<t hangText="canonical URI with plain fragment">
+<t hangText="canonical resource URI plus plain fragment">
 https://example.com/root.json#foo
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/root.json#/$defs/A
 </t>
 </list>
 </t>
 <t hangText="#/$defs/B">
 <list style="hanging">
-<t hangText="base URI">https://example.com/other.json</t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical (and base) URI">https://example.com/other.json</t>
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/other.json#
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/B
 </t>
 </list>
 </t>
 <t hangText="#/$defs/B/$defs/X">
 <list style="hanging">
 <t hangText="base URI">https://example.com/other.json</t>
-<t hangText="canonical URI with plain fragment">
+<t hangText="canonical resource URI plus plain fragment">
 https://example.com/other.json#bar
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/other.json#/$defs/X
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/B/$defs/X
 </t>
 </list>
 </t>
 <t hangText="#/$defs/B/$defs/Y">
 <list style="hanging">
-<t hangText="base URI">https://example.com/t/inner.json</t>
-<t hangText="canonical URI with plain fragment">
+<t hangText="canonical (and base) URI">https://example.com/t/inner.json</t>
+<t hangText="canonical URI plus plain fragment">
 https://example.com/t/inner.json#bar
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical URI plus pointer fragment">
 https://example.com/t/inner.json#
 </t>
-<t hangText="non-canonical URI with fragment relative to other.json">
+<t hangText="base URI of enclosing (other.json) resource plus fragment">
 https://example.com/other.json#/$defs/Y
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/B/$defs/Y
 </t>
 </list>
 </t>
 <t hangText="#/$defs/C">
 <list style="hanging">
-<t hangText="base URI">
+<t hangText="canonical (and base) URI">
 urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f
 </t>
-<t hangText="canonical URI plus pointer fragment">
+<t hangText="canonical URI plus pointer fragment">
 urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/C
 </t>
 </list>
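
Reader's note on the appendix hunk above: the relabelled entries can be reproduced by walking a schema and keeping a stack of enclosing resources. The Python sketch below is not part of the commit. The appendix schema itself is not shown in this diff, so `ROOT` is an assumption reconstructed to be consistent with the URIs listed; `walk` is a hypothetical helper that only descends into "$defs", assumes the root has an absolute "$id", and does not escape special characters in "$defs" names.

```python
from urllib.parse import urljoin

ROOT = {
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": {"$anchor": "foo"},
        "B": {
            "$id": "other.json",
            "$defs": {
                "X": {"$anchor": "bar"},
                "Y": {"$id": "t/inner.json", "$anchor": "bar"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def walk(schema, stack=(), location="", out=None):
    """Collect the URIs that identify each schema object.

    `stack` holds (resource URI, pointer within that resource) pairs,
    innermost resource first. Only the first pair is based on the
    canonical URI of the resource that owns the object.
    """
    out = {} if out is None else out
    stack = list(stack)
    if "$id" in schema:
        parent_base = stack[0][0] if stack else ""
        stack.insert(0, (urljoin(parent_base, schema["$id"]), ""))
    canonical_base = stack[0][0]
    entry = {
        "base": canonical_base,
        "canonical_pointer_uri": f"{stack[0][0]}#{stack[0][1]}",
        "enclosing_pointer_uris": [f"{uri}#{ptr}" for uri, ptr in stack[1:]],
    }
    if "$anchor" in schema:
        entry["plain_fragment_uri"] = f"{canonical_base}#{schema['$anchor']}"
    out["#" + location] = entry
    for name, subschema in schema.get("$defs", {}).items():
        child_stack = [(uri, f"{ptr}/$defs/{name}") for uri, ptr in stack]
        walk(subschema, child_stack, f"{location}/$defs/{name}", out)
    return out


uris = walk(ROOT)
```

For example, `uris["#/$defs/B/$defs/Y"]` carries `https://example.com/t/inner.json#` as the canonical-URI-based pointer form, and the `other.json` and `root.json` forms as the two enclosing-resource variants, matching the entries in the list.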
@@ -3432,16 +3446,16 @@ https://example.com/schemas/common#/$defs/count/minimum
 <t>
 This transformation can be safely and reversibly done as long as
 all static references (e.g. "$ref") use URI-references that resolve
-to canonical URIs, and all schema resources have an absolute-URI
-as the "$id" in their root schema.
+to URIs using the canonical resource URI as the base, and all schema
+resources have an absolute-URI as the "$id" in their root schema.
 </t>
 <t>
 With these conditions met, each external resource can be copied
 under "$defs", without breaking any references among the resources'
 schema objects, and without changing any aspect of validation or
 annotation results. The names of the schemas under "$defs" do
 not affect behavior, assuming they are each unique, as they
-do not appear in canonical URIs for the embedded resources.
+do not appear in the canonical URIs for the embedded resources.
 </t>
 </section>
 <section title="Reference removal is not always safe">
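
Reader's note on the bundling hunk above: the copy-under-"$defs" transformation it describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the commit and not a complete bundler: `bundle` is a hypothetical helper, the generated "$defs" names are arbitrary, and references are left untouched on the assumption that they already resolve to URIs based on each resource's canonical URI.

```python
import copy
from urllib.parse import urlsplit


def bundle(root: dict, externals: list[dict]) -> dict:
    """Return a copy of `root` with each external resource embedded in "$defs"."""
    bundled = copy.deepcopy(root)
    defs = bundled.setdefault("$defs", {})
    for index, resource in enumerate(externals):
        resource_id = resource.get("$id", "")
        # The precondition from the text: an absolute-URI "$id" in the root schema.
        if not urlsplit(resource_id).scheme or "#" in resource_id:
            raise ValueError("each bundled resource needs an absolute-URI $id")
        # Any unique name works, since it never appears in a canonical URI.
        name = f"bundled{index}"
        while name in defs:
            name += "_"
        defs[name] = copy.deepcopy(resource)
    return bundled


root = {
    "$id": "https://example.com/list",
    "items": {"$ref": "https://example.com/item"},
}
item = {"$id": "https://example.com/item", "type": "string"}
compound = bundle(root, [item])
```

The "$ref" in `compound` is byte-for-byte the same as in `root`; only the place where the target resource lives has changed, which is what makes the transformation reversible.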
