fge edited this page Nov 10, 2011 · 83 revisions

What is supported

All of section 5 of the draft is supported, apart from the limitations mentioned below. This includes:

  • union types (in type as well as in disallow),
  • full dependencies,
  • "multiple extends" (i.e., an array of schemas),
  • tuple/non-tuple validation for arrays,
  • $ref (but limited, see below) with loop detection,
  • formats (but limited, see below),
  • enums,
  • etc etc.

All in all, quite a complete implementation.

Limitations

Currently no support at all for...

ALL of section 6 is for now unsupported. Mainly because I don't understand much about it... I prefer to leave it untouched rather than implementing crap :p

Also, section 5 has three missing keywords:

  • default,
  • $schema,
  • id.

The default keyword is likely never to be supported since this API is about validation only (unless, that is, I decide on doing JSON Patch along with JSON Schema). As to the two other ones, they are important schema metadata which are in the plans. Just not yet, though.

$ref limitations and URI support in general

The draft says $ref can be any URI. The support is limited to HTTP-only URLs (yes, this means no HTTPS) and JSON paths, or a combination of both. For instance:

It will also work correctly for references within downloaded schemas (the refs will then be relative to that schema, not the schema you registered initially).

Also note that while the draft says any URI is supported in theory, this implementation will likely never accept all URIs. HTTP was a must. HTTPS will certainly get in. Some other schemes too. But consider the list below -- all are valid URIs:

  • irc://irc.freenode.net: this URI cannot even point to content...
  • a/b/c#/d/e: relative to what?
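To make the restriction concrete, here is a minimal sketch of how a $ref could be classified under the rules above. It is not this library's API: the class RefSupportSketch, the method isSupported and the example.com URLs are all hypothetical, and treating a scheme-less reference with a non-empty path as unsupported is an assumption drawn from the "relative to what?" remark.

```java
import java.net.URI;

// Hypothetical helper, for illustration only; not part of the library.
final class RefSupportSketch {

    // Accepts HTTP URLs (with or without a fragment) and bare fragments.
    static boolean isSupported(String ref) {
        URI uri = URI.create(ref);
        String scheme = uri.getScheme();
        if (scheme == null) {
            // No scheme: only a bare fragment such as "#/d/e" is accepted here.
            // Assumption: anything with a relative path is rejected.
            String path = uri.getPath();
            return path == null || path.isEmpty();
        }
        // HTTP only; HTTPS and every other scheme are rejected.
        return scheme.equalsIgnoreCase("http");
    }

    public static void main(String[] args) {
        String[] refs = {
            "http://example.com/schema.json#/a/b",
            "#/d/e",
            "https://example.com/schema.json",
            "irc://irc.freenode.net",
            "a/b/c#/d/e"
        };
        for (String ref : refs) {
            System.out.println(ref + " -> " + isSupported(ref));
        }
    }
}
```

With these rules, only the first two references in main are accepted.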

color and style format specifications

These format specifications aim to validate a CSS 2.1 color and style, respectively. Unfortunately, the best candidate I have found for parsing CSS, jStyleParser, is not available on Maven...

So, it is right now a mix of regexes and whatever information I could gather on CSS -- not much.

Limits on m{in,ax}Length and m{in,ax}Items

In a schema, these enforce resp. the minimum/maximum length of a string instance, and the minimum/maximum number of items of an array instance. The implementation won't accept any values for these which are greater than Integer.MAX_VALUE, that is... 2^31 - 1. You don't have JSON documents that big, do you?
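A minimal sketch of such a bound check, assuming the keyword value has been read as a BigInteger. The class and method names are hypothetical, and rejecting negative values is an extra assumption that the text above does not state.

```java
import java.math.BigInteger;

// Hypothetical helper, for illustration only; not part of the library.
final class LengthLimitSketch {

    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    // True if the value can be used for minLength, maxLength, minItems or maxItems.
    static boolean isAcceptableLimit(BigInteger value) {
        // Assumption: negative limits are rejected as well.
        return value.signum() >= 0 && value.compareTo(MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableLimit(new BigInteger("2147483647"))); // 2^31 - 1
        System.out.println(isAcceptableLimit(new BigInteger("2147483648"))); // 2^31
    }
}
```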

Numeric instances (integers and numbers) are another story, see below.

What the draft doesn't say explicitly, but which is implicit, and is implemented

(for some definition of "implicit")

properties and patternProperties

If a property of an object instance being validated matches exactly a field defined in properties, then this property will be validated against the corresponding schema, so far so good.

However, nothing says that this property should match only this schema. In fact, in this case, the implementation also goes through patternProperties to see if the property happens to match a regex in there too (and see below about regexes). If and only if the property matches neither of them is additionalProperties considered (provided that it is not false, of course).

As an example, consider this schema:

{
    "type": "object",
    "properties": {
        "p1": { "type": "string" }
    },
    "patternProperties": {
        "p": { "minLength": 10 },
        "1": { "format": "host-name" }
    }
}

Now, if the instance to validate contains a property named p1:

  • it will of course have to match the schema defined by the corresponding entry in properties;
  • but it also matches regexes p and 1 (again, see below), so it will have in fact to match all three schemas: the one defined in properties and the two schemas in patternProperties.
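The selection logic can be sketched as follows. This is an illustration under stated assumptions, not the implementation: the class and method names are hypothetical, schemas are represented by simple labels, and java.util.regex stands in for the ECMA 262 engine (for plain patterns such as p and 1 the two behave the same).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the library.
final class PropertyMatchSketch {

    // Returns labels for every schema a property must be validated against.
    static List<String> applicableSchemas(String name,
                                          List<String> properties,
                                          List<String> patternProperties) {
        List<String> result = new ArrayList<>();
        if (properties.contains(name)) {
            result.add("properties/" + name);
        }
        for (String regex : patternProperties) {
            // find(), not matches(): the regex may match anywhere in the name.
            if (Pattern.compile(regex).matcher(name).find()) {
                result.add("patternProperties/" + regex);
            }
        }
        // additionalProperties only comes into play when nothing else matched.
        if (result.isEmpty()) {
            result.add("additionalProperties");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> props = Arrays.asList("p1");
        List<String> patterns = Arrays.asList("p", "1");
        System.out.println(applicableSchemas("p1", props, patterns));
        System.out.println(applicableSchemas("q", props, patterns));
    }
}
```

For p1, the result lists three schemas; for q, only additionalProperties is left.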

divisibleBy, exclusiveM{in,ax}imum and m{in,ax}imum

Curiously, the draft doesn't say that, for instance, if exclusiveMinimum is present, then minimum MUST also be present. Neither does it say that the number in divisibleBy must not be 0. However, if you have a look at the schema, you see this:

// divisibleBy definition:
"divisibleBy" : {
	"type" : "number",
	"minimum" : 0,
	"exclusiveMinimum" : true,
	"default" : 1
},
// dependencies:
"dependencies" : {
	"exclusiveMinimum" : "minimum",
	"exclusiveMaximum" : "maximum"
},

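In other words: divisibleBy must be strictly greater than 0, exclusiveMinimum requires minimum to be present, and exclusiveMaximum requires maximum to be present. Those are therefore enforced at the syntax checking level.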
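A minimal sketch of these syntax checks, assuming the schema has been loaded into a plain Map and numbers are held as BigDecimal. The class name, method name and error messages are hypothetical; the real syntax checker is organized differently.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of the library.
final class NumericKeywordSyntaxSketch {

    // Returns one message per violated rule; an empty list means the syntax is fine.
    static List<String> check(Map<String, Object> schema) {
        List<String> errors = new ArrayList<>();
        if (schema.containsKey("exclusiveMinimum") && !schema.containsKey("minimum")) {
            errors.add("exclusiveMinimum requires minimum");
        }
        if (schema.containsKey("exclusiveMaximum") && !schema.containsKey("maximum")) {
            errors.add("exclusiveMaximum requires maximum");
        }
        Object divisor = schema.get("divisibleBy");
        // "minimum": 0 with "exclusiveMinimum": true means strictly positive.
        if (divisor instanceof BigDecimal && ((BigDecimal) divisor).signum() <= 0) {
            errors.add("divisibleBy must be strictly greater than 0");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = new HashMap<>();
        schema.put("exclusiveMinimum", Boolean.TRUE);
        schema.put("divisibleBy", BigDecimal.ZERO);
        System.out.println(check(schema));
    }
}
```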

Discussions about some fine points of the draft

Numeric instance validation

This applies to integer and number JSON nodes, and therefore to the minimum, maximum and divisibleBy keywords. And especially to the latter.

What happens here is that the JSON spec itself doesn't specify a range limit for numeric instances, and neither does the JSON Schema draft. These three keywords therefore theoretically apply to arbitrarily large numbers and/or numbers with an arbitrarily large precision. This is why all three validators above use Java's BigDecimal for validation.

On the TODO list is a way to make the validation process faster, at least for numbers falling within Java's long primitive type (this is likely to represent a good number of use cases, so it is worth it). For decimal validation however, rounding has to be taken into account... And rounding precision means rounding errors, which means inaccuracies, which means wreaking havoc on the divisibleBy check in particular. I don't like inaccuracy, so, for decimal numbers, BigDecimal it is and it will likely remain so for the foreseeable future.
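To illustrate why binary floating point is a poor fit for divisibleBy, here is a minimal sketch comparing a double remainder with a BigDecimal remainder. The class and method names are hypothetical; only the use of BigDecimal comes from the text above.

```java
import java.math.BigDecimal;

// Hypothetical helper, for illustration only; not part of the library.
final class DivisibleBySketch {

    // Exact check: the remainder must be numerically zero, whatever its scale.
    static boolean isDivisible(BigDecimal instance, BigDecimal divisor) {
        return instance.remainder(divisor).compareTo(BigDecimal.ZERO) == 0;
    }

    public static void main(String[] args) {
        // With doubles, neither 0.3 nor 0.1 is represented exactly.
        System.out.println(0.3 % 0.1 == 0.0); // false
        // With BigDecimal built from the decimal text, the check is exact.
        System.out.println(isDivisible(new BigDecimal("0.3"), new BigDecimal("0.1"))); // true
    }
}
```

Note the use of compareTo rather than equals: BigDecimal.equals also compares scale, so 0.0 and 0 would not be considered equal.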

Regex support: ECMA 262, and the real definition of "matching"

The draft is quite clear that regexes should conform to ECMA 262. This rules out java.util.regex entirely (for instance, possessive quantifiers, as in a++, are legal in Java, but are not supported by ECMA 262). The only Java library (that I know of) in existence which is able to process ECMA 262 regexes is Rhino and its JavaScript engine. This project uses it for that reason.

Also, even though the draft only implies it (and as the Javadoc points out in several places), please note that the definition of matching is the real one, not the "Java one": a regex can match anywhere in the input! So, remember this when writing your schemas -- if you want your regex to match the whole input, you must anchor it. This is valid for the pattern keyword, but also for keys in patternProperties. A JSON Schema implementation which doesn't act this way simply does not obey the draft!
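The difference between the two definitions of matching can be shown with a minimal sketch. It uses java.util.regex purely to contrast find() with matches(); it is not the engine this project uses, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the library.
final class RegexMatchSketch {

    // The "real" definition: the regex may match anywhere in the input.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // The "Java one": the regex must span the whole input.
    static boolean matchesWholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAnywhere("a", "banana"));   // true
        System.out.println(matchesWholeInput("a", "banana")); // false
        // Anchoring restores whole-input matching under the "real" definition.
        System.out.println(matchesAnywhere("^a$", "banana")); // false
    }
}
```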

Hostname and email validation

These are two of the format specifications defined by the draft (resp. host-name and email). The fine points:

  • nothing in the RFCs defining hostnames says they should be fully qualified;
  • nothing in the RFCs defining mail addresses says they should have a domain component.

This implementation strictly obeys the RFCs. The draft doesn't force hostnames or emails to be fully qualified/have a domain part. Therefore, as far as all relevant RFCs are concerned, foobar is a perfectly valid hostname -- and email address.

As a workaround, you can use a combination of the appropriate format with a minimalist pattern specification. For instance:

// Host names: at least one dot in it
{
    "format": "host-name",
    "pattern": "\\."
}
// Email address: an @, followed by one or more characters other than a dot, followed by a dot
{
    "format": "email",
    "pattern": "@[^.]+\\."
}

This is enough, since the format validators themselves will ensure that the inputs are well formed.
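A minimal sketch of what the two pattern values contribute, assuming the intended regexes are \. and @[^.]+\. once the JSON string escapes are decoded. java.util.regex is used as a stand-in here (these two patterns read the same in both dialects), the format check itself is left out, and all names and sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the library.
final class QualifiedFormatSketch {

    // Regexes as they look after JSON decoding, written as Java string literals.
    private static final Pattern HAS_DOT = Pattern.compile("\\.");
    private static final Pattern HAS_DOTTED_DOMAIN = Pattern.compile("@[^.]+\\.");

    // Unanchored, so find() is the right operation.
    static boolean hostNamePatternOk(String input) {
        return HAS_DOT.matcher(input).find();
    }

    static boolean emailPatternOk(String input) {
        return HAS_DOTTED_DOMAIN.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hostNamePatternOk("foobar"));        // false
        System.out.println(hostNamePatternOk("foo.bar"));       // true
        System.out.println(emailPatternOk("user@example.com")); // true
        System.out.println(emailPatternOk("foobar"));           // false
    }
}
```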

utc-millisec validation

This format specification is said to be the number of milliseconds since epoch (that is, Jan 1st 1970 at 00:00 GMT). This is, in essence, a signed 32-bit integer times 1000. The implementation makes the choice to consider a numeric instance bound to this format specification invalid if:

  • it is negative, or
  • its value divided by 1000 is greater than 2^31 - 1.

This may, or may not be, a problem for you, YMMV. But if you actually plan to use such a formatted value in one of your programs, I think it is useful to enforce these.
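The two rules can be sketched as follows. The class and method names are hypothetical, and the sketch assumes the division by 1000 is exact (not truncated), so the largest accepted value is (2^31 - 1) * 1000.

```java
import java.math.BigDecimal;

// Hypothetical helper, for illustration only; not part of the library.
final class UtcMillisecSketch {

    // (2^31 - 1) * 1000, the upper bound under the exact-division assumption.
    private static final BigDecimal MAX_MILLIS =
        BigDecimal.valueOf(Integer.MAX_VALUE).multiply(BigDecimal.valueOf(1000));

    static boolean isValid(BigDecimal value) {
        // Rejects negative values and values past the 32-bit epoch limit.
        return value.signum() >= 0 && value.compareTo(MAX_MILLIS) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new BigDecimal("2147483647000"))); // true
        System.out.println(isValid(new BigDecimal("2147483647001"))); // false
        System.out.println(isValid(new BigDecimal("-1")));            // false
    }
}
```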
