Status
All of section 5 of the draft is supported, apart from the limitations mentioned below. This includes:
- union types (in type as well as in disallow),
- full dependencies,
- "multiple extends" (ie, an array of schemas),
- tuple/non-tuple validation for arrays,
- $ref (but limited, see below) with loop detection,
- formats (but limited, see below),
- enums,
- etc etc.
All in all, quite a complete implementation.
ALL of section 6 is for now unsupported. Mainly because I don't understand much about it... I prefer to leave it untouched rather than implementing crap :p
Also, section 5 has three missing keywords:
- default,
- $schema,
- id.
The default keyword is likely never to be supported since this API is about validation only (unless, that is, I decide on doing JSON Patch along with JSON Schema). As to the two other ones, they are important schema metadata which are in the plans. Just not yet, though.
The draft says $ref can be any URI. Built-in support is limited to HTTP-only URLs (yes, this means no HTTPS) and JSON Pointer, or a combination of both. For instance:
- #/some/path (a JSON Pointer applying to the currently active schema);
- http://some.location/path/to/schema (a schema accessible on the Internet);
- http://some.location/path/to/schema#/path/into/schema (same as above, but specifying a pointer within the downloaded schema).
JSON Pointer-only references will also work correctly for downloaded schemas (the refs will then be relative to that schema, not the schema you registered initially).
See below for a more complete discussion about URIs and JSON Pointer.
UPDATE: in version 0.2, you will be able to register URI handlers for any scheme of your choice.
The color and style format specifications aim to validate, respectively, a CSS 2.1 color and a CSS 2.1 style. Unfortunately, the best candidate I have found for parsing CSS, jStyleParser, is not available on Maven...
So, right now, the implementation is a mix of regexes and whatever information I could gather on CSS -- not much.
In a schema, minLength/maxLength and minItems/maxItems enforce, respectively, the minimum/maximum length of a string instance and the minimum/maximum number of items of an array instance. The implementation won't accept any values for these which are greater than Integer.MAX_VALUE, that is... 2^31 - 1. You don't have JSON documents that big, do you?
Numeric instances (integers and numbers) are another story, see below.
(for some definition of "implicit")
If a property of an object instance being validated matches exactly a field defined in properties, then this property will be validated against the corresponding schema, so far so good.
However, nothing says that this property should match only this schema. In fact, in this case, the implementation also goes through patternProperties to see if the property happens to match a regex in there too (and see below about regexes). If and only if the property matches neither of them is additionalProperties considered (provided that it is not false, of course).
As an example, consider this schema:
{
"type": "object",
"properties": {
"p1": { "type": "string" }
},
"patternProperties": {
"p": { "minLength": 10 },
"1": { "format": "host-name" }
}
}
Now, if the instance to validate contains a property named p1:
- it will of course have to match the schema defined by the corresponding entry in properties;
- but it also matches regexes p and 1 (again, see below), so it will have in fact to match all three schemas: the one defined in properties and the two schemas in patternProperties.
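The selection rule described above can be sketched as follows. This is only an illustration, not the library's code: the class and method names are hypothetical, and java.util.regex stands in for the ECMA 262 engine (its find() method gives the "match anywhere" behaviour discussed further down).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class PropertySchemaSelector {
    // Returns labels for every schema the given property must validate against.
    static List<String> schemasFor(String property,
                                   Collection<String> propertyNames,
                                   List<String> patternRegexes) {
        List<String> selected = new ArrayList<>();
        // Exact match against a key of "properties"
        if (propertyNames.contains(property)) {
            selected.add("properties/" + property);
        }
        // Every regex of "patternProperties" that matches anywhere in the name
        for (String regex : patternRegexes) {
            if (Pattern.compile(regex).matcher(property).find()) {
                selected.add("patternProperties/" + regex);
            }
        }
        // additionalProperties is considered only if nothing else matched
        if (selected.isEmpty()) {
            selected.add("additionalProperties");
        }
        return selected;
    }
}
```

With the schema above, the property p1 is selected for all three schemas, whereas a property such as zzz would fall through to additionalProperties.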
Curiously, the draft doesn't say that, for instance, if exclusiveMinimum is present, then minimum MUST also be present. Neither does it say that the number in divisibleBy must not be 0. However, if you have a look at the schema, you see this:
// divisibleBy definition:
"divisibleBy" : {
"type" : "number",
"minimum" : 0,
"exclusiveMinimum" : true,
"default" : 1
},
// dependencies:
"dependencies" : {
"exclusiveMinimum" : "minimum",
"exclusiveMaximum" : "maximum"
},
Which means what it means. Those are therefore enforced at the syntax checking level.
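As a rough sketch of what such syntax checks amount to (the class name, the Map-based schema representation and the error messages are all assumptions made for this example, not the library's API):

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical syntax checker, for illustration only.
final class NumericKeywordSyntax {
    // Returns the list of syntax errors found; empty if the schema passes.
    static List<String> check(Map<String, Object> schema) {
        List<String> errors = new ArrayList<>();
        // "dependencies" in the draft's schema: exclusiveMinimum requires minimum
        if (schema.containsKey("exclusiveMinimum") && !schema.containsKey("minimum")) {
            errors.add("exclusiveMinimum requires minimum");
        }
        // ... and exclusiveMaximum requires maximum
        if (schema.containsKey("exclusiveMaximum") && !schema.containsKey("maximum")) {
            errors.add("exclusiveMaximum requires maximum");
        }
        // minimum 0 with exclusiveMinimum true: divisibleBy must be > 0
        Object divisor = schema.get("divisibleBy");
        if (divisor instanceof BigDecimal && ((BigDecimal) divisor).signum() <= 0) {
            errors.add("divisibleBy must be strictly greater than 0");
        }
        return errors;
    }
}
```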
This applies to integer and number JSON nodes, and therefore to the minimum, maximum and divisibleBy keywords. And especially to the latter.
What happens here is that the JSON spec itself doesn't specify a range limit for numeric instances, and neither does the JSON Schema draft. These three keywords therefore theoretically apply to arbitrarily large numbers and/or numbers with an arbitrarily large precision. This is why all three validators above use Java's BigDecimal for validation. (However, please note that JavaScript limits itself to 64-bit IEEE 754 floating point numbers. JSON may have JavaScript in its acronym, but that doesn't mean it is limited to JavaScript. Consider MongoDB, for example.)
On the TODO list is a way to make the validation process faster at least for numbers falling within Java's long primitive type (this is likely to represent a good number of use cases, so it is worth it). For decimal validation however, rounding has to be taken into account... And rounding precision means rounding errors, which means inaccuracies, which means wreaking havoc to the divisibleBy check in particular. I don't like inaccuracy, so, for decimal numbers, BigDecimal it is and it will likely remain so for the foreseeable future.
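To make the rounding problem concrete, here is a small comparison of a divisibleBy check done with double and with BigDecimal. The class and method names are made up for the example; only the BigDecimal and % operations are standard Java.

```java
import java.math.BigDecimal;

// Hypothetical demo class, for illustration only.
final class DivisibleByDemo {
    // Exact check: the remainder is computed on decimal values, no rounding.
    static boolean divisibleExact(String instance, String divisor) {
        BigDecimal remainder = new BigDecimal(instance).remainder(new BigDecimal(divisor));
        // compareTo, not equals: equals would also compare the scale
        return remainder.compareTo(BigDecimal.ZERO) == 0;
    }

    // Naive check on binary floating point: subject to rounding errors.
    static boolean divisibleDouble(double instance, double divisor) {
        return instance % divisor == 0.0;
    }
}
```

For an instance of 0.3 and a divisibleBy of 0.1, the exact version reports that the instance is divisible, whereas the double version does not, since neither 0.3 nor 0.1 can be represented exactly in binary.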
Please note: version 0.1.1 is limited to long and double -- an oversight which I just discovered.
The draft is quite clear that regexes should conform to ECMA 262. This rules out java.util.regex entirely (for instance, possessive quantifiers, like in a++, are legal in Java, but are not supported by ECMA 262). The only Java library (that I know of) in existence which is able to process ECMA 262 regexes is Rhino and its Javascript engine. This project uses it for that reason.
Also, even though the draft only implies it (and as the Javadoc points out in several places), please note that the definition of matching is the real one, not the "Java one": a regex can match anywhere in the input! So, remember this when writing your schemas -- if you want your regex to match the whole input, you must anchor it. This is valid for the pattern keyword, but also for keys in patternProperties. A JSON Schema implementation which doesn't act this way simply does not obey the draft!
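The difference between the two definitions of matching can be shown with java.util.regex itself. Note that this is only a stand-in chosen for the example (the project uses Rhino for ECMA 262 regexes), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class RegexMatchDemo {
    // "Real" matching: the regex may match anywhere in the input.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // "Java" matching: the regex must match the whole input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }
}
```

Here, a+ matches anywhere in baaad but does not match it as a whole. With the first definition, the one the draft uses, you need to write ^a+$ to reject baaad.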
These are two of the format specifications defined by the draft (resp. host-name and email). The fine points:
- nothing in the RFCs defining hostnames says they should be fully qualified;
- nothing in the RFCs defining mail addresses says they should have a domain component.
This implementation strictly obeys the RFCs. The draft doesn't force hostnames or emails to be fully qualified/have a domain part. Therefore, as far as all relevant RFCs are concerned, foobar is a perfectly valid hostname -- and email address.
As a workaround, you can use a combination of the appropriate format with a minimalist pattern specification. For instance:
// Host names: at least one dot in it
{
"format": "host-name",
"pattern": "\\."
}
// Email address: an @, followed by one or more characters other than a dot, followed by a dot
{
"format": "email",
"pattern": "@[^.]+\\."
}
This is enough, since the format validators themselves will ensure that the inputs are well formed.
The utc-millisec format specification is said to be the number of milliseconds since epoch (that is, Jan 1st 1970 at 00:00 GMT). This is, in essence, a signed 32-bit integer times 1000. The implementation makes the choice to consider a numeric instance bound to this format specification invalid if:
- it is negative, or
- its value divided by 1000 is greater than 2^31 - 1.
This may, or may not be, a problem for you, YMMV. But if you actually plan to use such a formatted value in one of your programs, I think it is useful to enforce these.
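The two conditions above could be checked along these lines. This is a sketch under assumptions: the names are hypothetical, and "divided by 1000" is read as an exact division, which makes the second condition equivalent to comparing the value with (2^31 - 1) * 1000.

```java
import java.math.BigDecimal;

// Hypothetical checker, for illustration only.
final class UtcMillisecCheck {
    // (2^31 - 1) * 1000: the largest value accepted under the reading above
    private static final BigDecimal MAX_MILLIS =
        new BigDecimal(Integer.MAX_VALUE).multiply(new BigDecimal(1000));

    static boolean isValid(BigDecimal millis) {
        // Reject negative values
        if (millis.signum() < 0) {
            return false;
        }
        // millis / 1000 > 2^31 - 1 is the same as millis > (2^31 - 1) * 1000
        return millis.compareTo(MAX_MILLIS) <= 0;
    }
}
```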
JSON Pointer aims at describing a unique way to address specific paths within a JSON document. It has been very carefully written to use fragments in a smart way. The next JSON Schema draft will require JSON Pointer support. For instance, this URI:
http://json-schema.org/draft-03/schema#
is to be separated into two parts: the non-fragment part (http://json-schema.org/draft-03/schema) and the fragment part (#). And # is the JSON Pointer pointing to the root of the document. If the URI is:
http://some.other.host/xxxx/theschema#/schema1
then it means that the JSON schema to use is the JSON document at path #/schema1 starting from the JSON document located at http://some.other.host/xxxx/theschema.
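Splitting such a URI into its two parts can be done with java.net.URI alone. The sketch below is not how the library does it (the class and method names are hypothetical); it only shows the separation described above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class RefSplitter {
    // The non-fragment part: where the JSON document is located.
    static URI locator(URI ref) {
        try {
            // Rebuild the URI from its scheme and scheme-specific part, without fragment
            return new URI(ref.getScheme(), ref.getSchemeSpecificPart(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The fragment part, as a JSON Pointer; "#" alone addresses the document root.
    static String pointer(URI ref) {
        String fragment = ref.getFragment();
        return "#" + (fragment == null ? "" : fragment);
    }
}
```

For http://some.other.host/xxxx/theschema#/schema1, the locator is http://some.other.host/xxxx/theschema and the pointer is #/schema1.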
The best part: it can actually address all of a JSON document. For instance, the following property names are perfectly valid in a JSON document, and the JSON Pointer spec allows you to address all of them:
- an empty property name (yes, it is valid, and it also means that # and #/ are NOT the same path);
- /, % and #;
- . and .. (this is why JSON Pointers are never relative);
- a space alone, or several spaces.
The implementation (as of 0.2 -- the master branch contains it already) has very solid support for JSON Pointer, and you can use it to address all possible paths in any JSON document.
But you first have to be able to find the corresponding document... And this is why the implementation today considers a URI such as a/b/c#/d/e as invalid: it just does not allow you to locate a schema at all. Where is a/b/c?
Maybe this limitation will be lifted one day. But right now I just don't see the use for such URIs for schema identification! They just cannot identify anything if you don't know the context...
In short: support is there for all absolute URIs as long as you write a URIHandler for it, and JSON Pointer. Relative URIs which are not JSON Pointers are not supported by choice.
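The rule summarized above can be approximated like this. Again, this is an illustration with made-up names, and treating "a JSON Pointer" as "a reference starting with #" is an assumption of the sketch, not a statement about the library's internals.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
final class RefSupport {
    static boolean isSupported(String ref) {
        URI uri = URI.create(ref);
        // Absolute URIs are fine, provided a handler exists for their scheme
        if (uri.isAbsolute()) {
            return true;
        }
        // Otherwise, only fragment-only references (JSON Pointers) are accepted
        return ref.startsWith("#");
    }
}
```

With this check, #/d/e and http://some.location/path/to/schema#/path/into/schema are accepted, whereas a/b/c#/d/e is rejected.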