Commit e3a87d4

Merge pull request #1553 from json-schema-org/gregsdennis/validated-format
make format validate by default
2 parents 96680e3 + 8e33d21 commit e3a87d4

2 files changed

+126
-68
lines changed

adr/2024-11-2-assertion-format.md

+78
@@ -0,0 +1,78 @@
1+
# [short title of solved problem and solution]
2+
3+
* Status: proposed
4+
<!-- will update below to only those who participated in the vote -->
5+
* Deciders: @gregsdennis @jdesrosiers @julian @jviotti @mwadams @karenetheridge @relequestual
6+
* Date: 2024-11-02
7+
* Technical Story: https://github.com/json-schema-org/json-schema-spec/issues/1520
8+
* Voting issue: https://github.com/json-schema-org/TSC/issues/19
9+
For - @gregsdennis @jdesrosiers @jviotti @mwadams @karenetheridge
10+
Neutral - @relequestual
11+
Against - @julian
12+
13+
## Context and Problem Statement
14+
15+
There's a long and sticky history around format.
16+
17+
1. Going back all the way to Draft 01, format has never required validation.
18+
2. Whether to support format validation has always been the decision of the implementation.
19+
3. The extent to which formats are validated has also been the decision of the implementation.
20+
21+
The result of all of this is that implementation support for validation has been spotty at best. Despite the JSON Schema specs referencing very concretely defined formats (by referencing other specs), implementations that do support validation don't all support each format equally. This has been the primary driving force behind keeping format as an opt-in validation.
22+
23+
With 2019-09, we decided that it was time to give the option of format validation to the schema author. They could enable validation by using a meta-schema which listed the Format Vocabulary with a true value, which meant, "format validation is required to process this schema."
24+
25+
In 2020-12, we further refined this by offering two separate vocabularies, one that treats the keyword as an annotation and one that treats it as an assertion. The argument was that the behavior of a keyword shouldn't change based on whether the vocabulary was required or not.
26+
27+
However, the fact remains that our users consistently report (via questions in Slack, GitHub, and Stack Overflow) that they expect format to validate. (The most recent case I can think of was only last week, in .NET's effort to build a short-term solution for schema generation from types.)
28+
29+
Due to this consistency in user expectations, we have decided to:
30+
31+
1. make format an assertion keyword, and
32+
2. strictly enforce it by moving the appropriate tests into the required section of the Test Suite and building them more completely.
33+
34+
## Decision Drivers
35+
36+
* User expectation
37+
* Current behavior
38+
* Historical context
39+
* Disparity of current implementation support vs the proposed requirements
40+
41+
## Considered Options
42+
43+
### `format` remains an annotation keyword by default
44+
45+
This is the current state. The primary benefit is that we don't need to make a breaking change.
46+
47+
The primary downside is that the current system of (1) configuring the tool or (2) including the `format-assertion` vocab[^1] is confusing for many and doesn't align with user expectations.
48+
49+
[^1]: The `format-assertion` vocabulary will no longer be an option since we have demoted vocabularies to a proposal for the stable release. This leaves tool configuration as the only option to enable `format` validation.
50+
51+
### `format` becomes an assertion keyword by default
52+
53+
We change the spec to require `format` validation. Furthermore:
54+
55+
* Implementations SHOULD support `format` with the defined values
56+
* Implementations MAY support others, but only by explicit config
57+
* Implementations MUST refuse to process a schema that contains an unsupported format
58+
59+
## Decision Outcome
60+
61+
The TSC has decided via vote (see voting issue above) that we should change `format` to act as an assertion by default, in line with option (2).
62+
63+
### Positive Consequences <!-- optional -->
64+
65+
* Aligns with user expectations.
66+
* Users are still able to have purely annotative behavior through use of something like `x-format`.
67+
* Increased consistency for `format` validation across implementations.
68+
69+
### Negative Consequences <!-- optional -->
70+
71+
* This is a breaking change, which means that we will likely have to re-educate the users who correctly treat it as an annotation.
72+
* Older schemas which do not specify a version (`$schema`) may change their validation outcome.
73+
* The burden on implementations will be greater since format validation was previously optional.
74+
75+
## Links <!-- optional -->
76+
77+
* [Link type] [Link to ADR] <!-- example: Refined by [ADR-0005](0005-example.md) -->
78+
*<!-- numbers of links can vary -->
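
As a rough illustration of the three rules listed under the chosen option (support the defined values, allow others only through explicit configuration, refuse anything else), a minimal Python sketch follows. The names `DEFINED_FORMATS`, `UnsupportedFormatError`, `iter_formats`, and `check_formats` are hypothetical and do not come from any JSON Schema implementation; the set of formats is a small assumed subset, and the schema walk is deliberately naive.

```python
from typing import Any, Iterable, Iterator

# Assumption: a small subset of the format values defined by the specification.
DEFINED_FORMATS = frozenset({"date-time", "date", "time", "duration", "email"})


class UnsupportedFormatError(ValueError):
    """Raised when a schema uses a format value that cannot be validated."""


def iter_formats(schema: Any) -> Iterator[str]:
    """Yield every string-valued `format` found by a naive recursive walk.

    Limitation: this does not distinguish subschemas from plain data, so a
    `format` key nested inside `const`, `enum`, or `default` is also reported.
    """
    if isinstance(schema, dict):
        fmt = schema.get("format")
        if isinstance(fmt, str):
            yield fmt
        for value in schema.values():
            yield from iter_formats(value)
    elif isinstance(schema, list):
        for item in schema:
            yield from iter_formats(item)


def check_formats(schema: Any, custom_formats: Iterable[str] = ()) -> None:
    """Refuse a schema that contains a format value that is not supported.

    Custom formats are disabled by default and must be passed in explicitly.
    """
    supported = DEFINED_FORMATS | frozenset(custom_formats)
    for fmt in iter_formats(schema):
        if fmt not in supported:
            raise UnsupportedFormatError(f"unsupported format: {fmt!r}")
```

Under these assumptions, `check_formats({"format": "ulid"})` raises, while `check_formats({"format": "ulid"}, custom_formats={"ulid"})` returns normally because the caller opted in.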

specs/jsonschema-validation.md

+48-68
@@ -293,74 +293,59 @@ Structural validation alone may be insufficient to allow an application to
293293
correctly utilize certain values. The `format` annotation keyword is defined to
294294
allow schema authors to convey semantic information for a fixed subset of values
295295
which are accurately described by authoritative resources, be they RFCs or other
296-
external specifications.
296+
external specifications. Format values defined externally to this document
297+
SHOULD also be based on such authoritative resources in order to foster
298+
interoperability.
297299

298-
The value of this keyword is called a format attribute. It MUST be a string. A
299-
format attribute can generally only validate a given set of instance types. If
300-
the type of the instance to validate is not in this set, validation for this
301-
format attribute and instance SHOULD succeed. All format attributes defined in
302-
this section apply to strings, but a format attribute can be specified to apply
303-
to any instance types defined in the data model defined in the [core JSON
304-
Schema.](#json-schema)[^1]
300+
The value of this keyword MUST be a string. While this keyword can validate any
301+
type, each distinct value will generally only validate a given set of instance
302+
types. If the type of the instance to validate is not in this set, validation
303+
for this keyword SHOULD succeed. All format values defined in this section apply
304+
to strings, but a format value can be specified to apply to any instance types
305+
defined in the data model defined in the [core JSON Schema](#json-schema) specification[^1].
305306

306307
[^1]: Note that the `type` keyword in this specification defines an "integer"
307308
type which is not part of the data model. Therefore a format attribute can be
308309
limited to numbers, but not specifically to integers. However, a numeric format
309-
can be used alongside the `type` keyword with a value of "integer", or could be
310-
explicitly defined to always pass if the number is not an integer, which
310+
can be used alongside the `type` keyword with a value of "integer", or it could
311+
be explicitly defined to always pass if the number is not an integer, which
311312
produces essentially the same behavior as only applying to integers.
312313

313-
Implementing support for `format` as an annotation is REQUIRED (if the
314-
implementation supports annotation collection).
315-
316-
Implementing support for `format` as an assertion is OPTIONAL. Implementations
317-
which choose to support assertion behavior:
318-
319-
- MUST still collect the keyword's value as an annotation (if the implementation
320-
supports annotation collection),
321-
- MUST provide a configuration option to enable assertion behavior, defaulting
322-
to annotation-only behavior
323-
- SHOULD provide an implementation-specific best effort validation for each
324-
format attribute defined below;[^3]
325-
- MAY choose to implement validation of any or all format attributes as a no-op
326-
by always producing a validation result of true;[^4]
327-
- SHOULD use a common parsing library for each format, or a well-known regular
328-
expression;
314+
Implementations SHOULD provide assertion behavior for the format values defined
315+
by this document[^2] and MUST refuse to process any schema which contains a
316+
format value it doesn't support.
317+
318+
[^2]: Assertion behavior is called out very explicitly because it is a departure
319+
from previous iterations of this specification. Previously, `format` was an
320+
annotation-only keyword by default and implementations that supported assertion
321+
were required to offer some configuration that allowed users to explicitly
322+
enable assertion. Assertion is now a requirement in order to meet user
323+
expectations. See [json-schema-org/json-schema-spec #1520](https://github.com/json-schema-org/json-schema-spec/issues/1520) for more.
324+
325+
In addition to the assertion behavior, this keyword also produces its value as
326+
an annotation.
327+
328+
Implementations:
329+
330+
- SHOULD provide validation for each format attribute defined in this
331+
document;
332+
- MAY support format values not defined in this document, but such support MUST
333+
be configurable and disabled by default;
334+
- SHOULD use a common parsing library or a well-known regular expression for
335+
each format;
329336
- SHOULD clearly document how and to what degree each format attribute is
330337
validated.
331338

332-
[^3]: The expectation is that for simple formats such as date-time, syntactic
333-
validation will be thorough. For a complex format such as email addresses, which
334-
are the amalgamation of various standards and numerous adjustments over time,
335-
with obscure and/or obsolete rules that may or may not be restricted by other
336-
applications making use of the value, a minimal validation is sufficient. For
337-
example, an instance string that does not contain an "@" is clearly not a valid
338-
email address, and an "email" or "hostname" containing characters outside of
339-
7-bit ASCII is likewise clearly invalid.
340-
341-
[^4]: This matches the current reality of implementations, which provide widely
342-
varying levels of validation, including no validation at all, for some or all
343-
format attributes. It is also designed to encourage relying only on the
344-
annotation behavior and performing semantic validation in the application, which
345-
is the recommended best practice.
346-
347-
The requirement for minimal validation of format attributes is
348-
intentionally vague and permissive, due to the complexity involved in many of
349-
the attributes. Note in particular that the requirement is limited to syntactic
350-
checking; it is not to be expected that an implementation would send an email,
351-
attempt to connect to a URL, or otherwise check the existence of an entity
352-
identified by a format instance.
353-
354-
#### Custom format attributes
355-
356-
Implementations MAY support custom format attributes. Save for agreement between
357-
parties, schema authors SHALL NOT expect a peer implementation to support such
358-
custom format attributes.
339+
The requirement for validation of format values in general is limited to
340+
syntactic checking; implementations SHOULD NOT attempt to send an email, connect
341+
to a URL, or otherwise check the existence of an entity identified by a format
342+
instance.
359343

360-
An implementation MUST NOT fail to collect unknown formats as annotations.
344+
#### Custom format values
361345

362-
When configured for assertion behavior for `format`, implementations MUST fail
363-
upon encountering unknown formats.
346+
Implementations MAY support custom format values. Save for agreement between
347+
parties, schema authors SHALL NOT expect a peer implementation to support such
348+
custom format values.
364349

365350
### Defined Formats
366351

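
The evaluation rules in the hunk above (succeed when the instance type is outside the format's set, assert otherwise, and also produce the value as an annotation) might be sketched as follows. Everything here is an assumption made for illustration: `FormatResult`, `CHECKERS`, `evaluate_format`, and the crude `_looks_like_email` placeholder are hypothetical names, and the placeholder is not the RFC 5321 "Mailbox" rule. Dropping the annotation on failure is likewise a simplification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class FormatResult:
    valid: bool
    annotation: Optional[str]  # the format value, when validation passes


def _looks_like_email(value: str) -> bool:
    # Placeholder check only: non-empty text on both sides of the first "@".
    local, sep, domain = value.partition("@")
    return bool(local and sep and domain)


# Hypothetical registry: format value -> (applicable Python type, checker).
CHECKERS: Dict[str, Tuple[type, Callable[[Any], bool]]] = {
    "email": (str, _looks_like_email),
}


def evaluate_format(format_value: str, instance: Any) -> FormatResult:
    if format_value not in CHECKERS:
        # Unsupported values are refused rather than silently ignored.
        raise ValueError(f"unsupported format: {format_value!r}")
    applies_to, check = CHECKERS[format_value]
    if not isinstance(instance, applies_to):
        # Instance type is outside the format's set: validation succeeds.
        return FormatResult(True, format_value)
    valid = check(instance)  # syntactic check only, no network access
    return FormatResult(valid, format_value if valid else None)
```

For example, `evaluate_format("email", 42)` passes because the instance is not a string, while `evaluate_format("email", "nope")` fails the placeholder check.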
@@ -372,22 +357,17 @@ Date and time format names are derived from [RFC 3339, section 5.6](#rfc3339).
372357
The duration format is from the ISO 8601 ABNF as given in Appendix A of RFC
373358
3339.
374359

375-
Implementations supporting formats SHOULD implement support for the following
376-
attributes:
377-
378-
- *date-time:* A string instance is valid against this attribute if it is a
360+
- *date-time*: A string instance is valid against this attribute if it is a
379361
valid representation according to the "date-time" ABNF rule (referenced above)
380-
- *date:* A string instance is valid against this attribute if it is a valid
362+
- *date*: A string instance is valid against this attribute if it is a valid
381363
representation according to the "full-date" ABNF rule (referenced above)
382-
- *time:* A string instance is valid against this attribute if it is a valid
364+
- *time*: A string instance is valid against this attribute if it is a valid
383365
representation according to the "full-time" ABNF rule (referenced above)
384-
- *duration:* A string instance is valid against this attribute if it is a valid
366+
- *duration*: A string instance is valid against this attribute if it is a valid
385367
representation according to the "duration" ABNF rule (referenced above)
386368

387369
Implementations MAY support additional attributes using the other format names
388-
defined anywhere in that RFC. If "full-date" or "full-time" are implemented, the
389-
corresponding short form ("date" or "time" respectively) MUST be implemented,
390-
and MUST behave identically. Implementations SHOULD NOT define extension
370+
defined anywhere in that RFC. Implementations SHOULD NOT define extension
391371
attributes with any name matching an RFC 3339 format unless it validates
392372
according to the rules of that format.[^5]
393373

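
For the *date* format in the hunk above, one possible syntactic check against the RFC 3339 "full-date" rule is sketched below. The function name `is_full_date` is hypothetical. The sketch relies on Python's `datetime.date` for calendar validity, which is an implementation shortcut and not something the specification prescribes; as a consequence it rejects year `0000`, which the ABNF grammar itself would permit.

```python
import re
from datetime import date

# full-date = date-fullyear "-" date-month "-" date-mday (ASCII digits only)
_FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_full_date(value: str) -> bool:
    """Return True if `value` looks like an RFC 3339 full-date."""
    match = _FULL_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        # Rejects impossible dates such as month 13 or 30 February.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

With this sketch, `is_full_date("2024-11-02")` is accepted, while `"2024-2-3"` (missing zero padding) and `"2024-02-30"` (no such day) are rejected.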
@@ -401,7 +381,7 @@ likely either be promoted to fully specified attributes or dropped.
401381

402382
These attributes apply to string instances.
403383

404-
A string instance is valid against these attributes if it is a valid Internet
384+
A string instance is valid against these format values if it is a valid Internet
405385
email address as follows:
406386

407387
- *email:* As defined by the "Mailbox" ABNF rule in [RFC 5321, section
@@ -489,7 +469,7 @@ A regular expression, which SHOULD be valid according to the
489469
[ECMA-262](#ecma262) regular expression dialect.
490470

491471
Implementations that validate formats MUST accept at least the subset of
492-
ECMA-262 defined in {{regexinterop}}), and SHOULD accept all valid ECMA-262
472+
ECMA-262 defined in {{regexinterop}}, and SHOULD accept all valid ECMA-262
493473
expressions.
494474

495475
## Keywords for the Contents of String-Encoded Data {#content}

0 commit comments

Comments
 (0)