fge edited this page Jan 8, 2012 · 83 revisions

What is supported

All of section 5 of the draft is supported, apart from the limitations mentioned below. This includes:

  • union types (in type as well as in disallow),
  • full dependencies,
  • "multiple extends" (ie, an array of schemas),
  • tuple/non-tuple validation for arrays,
  • $ref (however, see below) with loop detection,
  • formats (but limited, see below),
  • enums,
  • etc etc.

All in all, quite a complete implementation.

Limitations

Currently no support at all for...

The default keyword is likely never to be supported since this API is about validation only (unless, that is, I decide on doing JSON Patch along with JSON Schema).

Native URI support and $ref

The draft says $ref can be any URI. You can register URI handlers for any scheme. The only URI scheme supported natively is http. These are supported natively:

JSON Pointer-only references will also work correctly for downloaded schemas (the refs will then be relative to that schema, not the schema you registered initially). But not all relative URIs (ie, URIs without a scheme) are supported, see below.

color and style format specifications

These format specifications aim to validate, respectively, a CSS 2.1 color and style. Unfortunately, the best candidate I have found for parsing CSS, jStyleParser, is not available on Maven...

So, it is right now a mix of regexes and whatever information I could gather on CSS -- not much.

UPDATE: on master, CSS color validation is now complete. Style, not yet. An issue is that the format keyword is scheduled for removal (reappearance?), which means it may have to be split from the core.

Limits on m{in,ax}Length and m{in,ax}Items

In a schema, these enforce resp. the minimum/maximum length of a string instance, and the minimum/maximum number of items of an array instance. The implementation won't accept any values for these which are greater than Integer.MAX_VALUE, that is... 2^31 - 1. You don't have JSON documents that big, do you? Well, OK, some modern NoSQL databases may have JSON data as large, if not even larger.
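
A minimal sketch of such a limit check, assuming the keyword value has already been parsed into a BigInteger. The class and method names are hypothetical and are not the library's API.

```java
import java.math.BigInteger;

// Hypothetical helper, for illustration only.
final class SizeKeywordLimit {
    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    // True if the keyword value does not exceed Integer.MAX_VALUE (2^31 - 1).
    static boolean isAcceptable(BigInteger keywordValue) {
        return keywordValue.compareTo(INT_MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(new BigInteger("2147483647"))); // exactly 2^31 - 1
        System.out.println(isAcceptable(new BigInteger("2147483648"))); // 2^31
    }
}
```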

Numeric instances (integers and numbers) are another story, see below.

What the draft doesn't say explicitly, but which is implicit, and is implemented

(for some definition of "implicit")

properties and patternProperties

If a property of an object instance being validated matches exactly a field defined in properties, then this property will be validated against the corresponding schema, so far so good.

However, nothing says that this property should match only this schema. In fact, in this case, the implementation also goes through patternProperties to see if the property happens to match a regex in there too (and see below about regexes). If and only if the property matches neither of them is additionalProperties considered (provided that it is not false, of course).

As an example, consider this schema:

{
    "type": "object",
    "properties": {
        "p1": { "type": "string" }
    },
    "patternProperties": {
        "p": { "minLength": 10 },
        "1": { "format": "host-name" }
    }
}

Now, if the instance to validate contains a property named p1:

  • it will of course have to match the schema defined by the corresponding entry in properties;
  • but it also matches regexes p and 1 (again, see below), so it will in fact have to match all three schemas: the one defined in properties and the two schemas in patternProperties.
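
The selection logic described above can be sketched as follows. This is only an illustration: the class and method names are made up, and java.util.regex stands in for the ECMA 262 engine, which is harmless for patterns as simple as p and 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists which schemas a property must be validated against.
final class ApplicableSchemas {
    static List<String> forProperty(String name, List<String> properties, List<String> patterns) {
        List<String> result = new ArrayList<>();
        if (properties.contains(name)) {
            result.add("properties/" + name);
        }
        for (String regex : patterns) {
            // find(), not matches(): the regex may match anywhere in the name.
            if (Pattern.compile(regex).matcher(name).find()) {
                result.add("patternProperties/" + regex);
            }
        }
        // additionalProperties only comes into play when nothing else matched.
        if (result.isEmpty()) {
            result.add("additionalProperties");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> properties = Arrays.asList("p1");
        List<String> patterns = Arrays.asList("p", "1");
        System.out.println(forProperty("p1", properties, patterns));
        System.out.println(forProperty("x", properties, patterns));
    }
}
```

With the schema above, p1 is matched by all three entries, whereas a property named x falls through to additionalProperties.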

divisibleBy, exclusiveM{in,ax}imum and m{in,ax}imum

Curiously, the draft doesn't say that, for instance, if exclusiveMinimum is present, then minimum MUST also be present. Neither does it say that the number in divisibleBy must not be 0. However, if you have a look at the schema, you see this:

// divisibleBy definition:
"divisibleBy" : {
	"type" : "number",
	"minimum" : 0,
	"exclusiveMinimum" : true,
	"default" : 1
},
// dependencies:
"dependencies" : {
	"exclusiveMinimum" : "minimum",
	"exclusiveMaximum" : "maximum"
},

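In other words, the schema itself requires divisibleBy to be strictly greater than 0, and exclusiveMinimum and exclusiveMaximum to be accompanied by minimum and maximum respectively. These constraints are therefore enforced at the syntax checking level.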
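
The two constraints read off the schema excerpt above could be checked along these lines. This is a sketch only: it assumes the schema has been loaded into a Map with numbers as BigDecimal, and none of the names below belong to the library.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical syntax checker, for illustration only.
final class NumericKeywordSyntax {
    static List<String> check(Map<String, Object> schema) {
        List<String> errors = new ArrayList<>();
        Object divisor = schema.get("divisibleBy");
        // "minimum": 0 combined with "exclusiveMinimum": true means strictly positive.
        if (divisor instanceof BigDecimal && ((BigDecimal) divisor).signum() <= 0) {
            errors.add("divisibleBy must be greater than 0");
        }
        requireCompanion(schema, "exclusiveMinimum", "minimum", errors);
        requireCompanion(schema, "exclusiveMaximum", "maximum", errors);
        return errors;
    }

    // Mirrors one entry of the "dependencies" keyword: keyword needs companion.
    private static void requireCompanion(Map<String, Object> schema, String keyword,
                                         String companion, List<String> errors) {
        if (schema.containsKey(keyword) && !schema.containsKey(companion)) {
            errors.add(keyword + " requires " + companion);
        }
    }
}
```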

$ref syntax checking

The draft doesn't say it, and in fact the schema itself doesn't enforce it, but it is obvious that $ref doesn't make any sense in combination with other keywords. This implementation chooses to consider a schema containing $ref and at least one other keyword as invalid, with one exception for draft v3 schemas. Consider this:

    "properties": {
        "p": {
            "$ref": "something",
            "required": true
        }
    }

This is the only exception to the rule.
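
A sketch of that rule, under the assumption (read off the example above, not stated elsewhere on this page) that the tolerated companion keyword is exactly required. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical checker, for illustration only.
final class RefSyntax {
    // Takes the set of keywords present in one schema object.
    static boolean isValid(Set<String> keywords) {
        if (!keywords.contains("$ref")) {
            return true; // the rule only concerns schemas containing $ref
        }
        Set<String> others = new HashSet<>(keywords);
        others.remove("$ref");
        others.remove("required"); // assumed to be the single tolerated companion
        return others.isEmpty();
    }
}
```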

Discussions about some fine points of the draft

Numeric instance validation

This applies to integer and number JSON nodes, and therefore to the minimum, maximum and divisibleBy keywords. And especially to the latter.

What happens here is that the JSON spec itself doesn't specify a range limit for numeric instances, and neither does the JSON Schema draft. These three keywords therefore theoretically apply to arbitrarily large numbers and/or numbers with an arbitrarily large precision. Although JavaScript limits itself to 64-bit IEEE 754 floating point numbers, and although JSON has JavaScript in its acronym (recall: JSON means JavaScript Object Notation), it doesn't mean JSON is used only with JavaScript. Consider MongoDB, for example.

Therefore, the implementation chooses to use Java's BigDecimal for numeric instance validation, and falls back to long if both the schema keyword value and the instance value fit into this type. For decimal validation however, rounding has to be taken into account... And rounding means rounding errors, which means inaccuracies, which means wreaking havoc on the divisibleBy check in particular. I don't like inaccuracy, so, for decimal numbers, BigDecimal it is and it will likely remain so for the foreseeable future.
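
The rounding problem is easy to reproduce. The sketch below (hypothetical class and method names, not the library's code) runs the same divisibleBy check once with BigDecimal and once with double: 0.3 is divisible by 0.1 in exact decimal arithmetic, but the double remainder is not zero.

```java
import java.math.BigDecimal;

// Hypothetical comparison of exact versus floating point divisibility checks.
final class DivisibleBy {
    // Exact decimal arithmetic: parses the literal text of both numbers.
    static boolean exact(String instance, String divisor) {
        return new BigDecimal(instance).remainder(new BigDecimal(divisor)).signum() == 0;
    }

    // IEEE 754 doubles: neither 0.3 nor 0.1 is exactly representable.
    static boolean withDoubles(double instance, double divisor) {
        return instance % divisor == 0.0;
    }

    public static void main(String[] args) {
        System.out.println(exact("0.3", "0.1"));    // true
        System.out.println(withDoubles(0.3, 0.1));  // false
    }
}
```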

Regex support: ECMA 262, and the real definition of "matching"

The draft is quite clear that regexes should conform to ECMA 262. This rules out java.util.regex entirely (for instance, possessive quantifiers, like in a++, are legal in Java, but are not supported by ECMA 262). The only Java library (that I know of) in existence which is able to process ECMA 262 regexes is Rhino and its JavaScript engine. This project uses it for that very reason (and, again, I don't like inaccuracy).

Also, even though the draft only implies it (and as the Javadoc points out in several places), please note that the definition of matching is the real one, not the "Java one": a regex can match anywhere in the input! So, remember this when writing your schemas -- if you want your regex to match the whole input, you must anchor it. This is valid for the pattern keyword, but also for keys in patternProperties. A JSON Schema implementation which doesn't act this way simply does not obey the draft!
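
The difference between the two notions of matching can be shown in a few lines. Note that this sketch uses java.util.regex purely as a stand-in, since the patterns involved behave the same in both dialects; it is not how the project evaluates regexes (that is done with Rhino). The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of "match anywhere" versus "match the whole input".
final class MatchAnywhere {
    // The real definition: the regex may match any part of the input.
    static boolean matchesSomewhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // The "Java one": the regex must cover the entire input.
    static boolean matchesWholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesSomewhere("p", "p1"));    // true
        System.out.println(matchesWholeInput("p", "p1"));   // false
        System.out.println(matchesSomewhere("^p$", "p1"));  // false: anchored
    }
}
```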

Hostname and email validation

These are two of the format specifications defined by the draft (resp. host-name and email). The fine points:

  • nothing in the RFCs defining hostnames says they should be fully qualified;
  • nothing in the RFCs defining mail addresses says they should have a domain component.

This implementation strictly obeys the RFCs. The draft doesn't force hostnames or emails to be fully qualified/have a domain part. Therefore, as far as all relevant RFCs are concerned, foobar is a perfectly valid hostname -- and email address.

As a workaround, you can use a combination of the appropriate format with a minimalist pattern specification. For instance:

// Host names: at least one dot in it
{
    "format": "host-name",
    "pattern": "\\."
}
// Email address: an @, followed by one or more characters other than a dot, followed by a dot
{
    "format": "email",
    "pattern": "@[^.]+\\."
}

This is enough, since the format validators themselves will ensure that the inputs are well formed.

utc-millisec validation

This format specification is said to be the number of milliseconds since epoch (that is, Jan 1st 1970 at 00:00 GMT). This is, in essence, a signed 32-bit integer times 1000. The implementation makes the choice to consider a numeric instance bound to this format specification invalid if:

  • it is negative, or
  • its value divided by 1000 is greater than 2^31 - 1.

This may, or may not be, a problem for you, YMMV. But if you actually plan to use such a formatted value in one of your programs, I think it is useful to enforce these.
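
The two conditions above can be sketched as follows, assuming the instance is available as a BigDecimal. Comparing against (2^31 - 1) * 1000 avoids the division altogether. The class name is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical range check for the utc-millisec format.
final class UtcMillisec {
    // (2^31 - 1) seconds, expressed in milliseconds.
    private static final BigDecimal MAX_MILLIS =
        BigDecimal.valueOf(Integer.MAX_VALUE).multiply(BigDecimal.valueOf(1000));

    // Valid if non-negative and value / 1000 does not exceed 2^31 - 1.
    static boolean isValid(BigDecimal value) {
        return value.signum() >= 0 && value.compareTo(MAX_MILLIS) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new BigDecimal("1325980800000"))); // Jan 8, 2012
        System.out.println(isValid(new BigDecimal("-1")));
    }
}
```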

JSON Pointer

JSON Pointer aims at describing a unique way to address specific paths within a JSON document. It has been very carefully written to use fragments in a smart way. The next JSON Schema draft will require JSON Pointer support. For instance, this URI:

http://json-schema.org/draft-03/schema#

is to be separated in two parts: the non fragment part (http://json-schema.org/draft-03/schema) and the fragment part (#). And # is the JSON Pointer pointing to the root of the document. If the URI is:

http://some.other.host/xxxx/theschema#/schema1

then it means that the JSON schema to use is the JSON document at path #/schema1 starting from the JSON document located at http://some.other.host/xxxx/theschema.
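
Splitting such a URI into its two parts takes only java.net.URI. The sketch below is an illustration with hypothetical names; it returns the pointer without the leading #, which is how java.net.URI exposes fragments.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper splitting a $ref value into document location and pointer.
final class SchemaRef {
    // The URI with its fragment removed: where to fetch the JSON document.
    static URI documentLocation(URI ref) {
        try {
            return new URI(ref.getScheme(), ref.getSchemeSpecificPart(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The fragment part, i.e. the JSON Pointer into that document ("" if absent).
    static String pointer(URI ref) {
        String fragment = ref.getFragment();
        return fragment == null ? "" : fragment;
    }

    public static void main(String[] args) {
        URI ref = URI.create("http://some.other.host/xxxx/theschema#/schema1");
        System.out.println(documentLocation(ref)); // http://some.other.host/xxxx/theschema
        System.out.println(pointer(ref));          // /schema1
    }
}
```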

The best part: it can actually address all of a JSON document. For instance, the following property names are perfectly valid in a JSON document, and the JSON Pointer spec allows addressing all of them:

  • an empty property name (yes, it is valid, and it also means that # and #/ are NOT the same path);
  • /, % and #;
  • . and .. (this is why JSON Pointers are never relative);
  • a space alone, or several spaces.

The implementation has very solid support for JSON Pointer, and you can use it to address all possible paths in any JSON document. Although it is currently only used to resolve pointers into schemas, the same implementation also produces the pointers shown in validation reports.
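
To see why # and #/ are not the same path, here is a deliberately simplified sketch that splits a pointer (the fragment without its leading #) into property names. It ignores the escaping rules of the JSON Pointer spec entirely, so it cannot address names containing /, and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified pointer splitter: no escape handling at all.
final class PointerTokens {
    static List<String> split(String pointer) {
        if (pointer.isEmpty()) {
            return new ArrayList<>(); // "#": the root of the document
        }
        if (!pointer.startsWith("/")) {
            throw new IllegalArgumentException("not a JSON Pointer: " + pointer);
        }
        // The -1 limit keeps empty names, so "/" yields one empty property name.
        return Arrays.asList(pointer.substring(1).split("/", -1));
    }

    public static void main(String[] args) {
        System.out.println(split("").size());   // 0: the root
        System.out.println(split("/").size());  // 1: the property named ""
        System.out.println(split("/schema1"));  // [schema1]
    }
}
```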

Relative URIs

But you first have to be able to find the corresponding document... And this is why the implementation today considers a URI such as a/b/c#/d/e as invalid: it gives no way to locate a schema at all. Where is a/b/c?

Maybe this limitation will be lifted one day. But right now I just don't see the use for such URIs for schema identification! They just cannot identify anything if you don't know the context...

In short: support is there for all absolute URIs, as long as you write a URIHandler for their scheme, and for JSON Pointer. Relative URIs which are not JSON Pointers are not supported.
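
This three-way distinction can be sketched with java.net.URI. The class and enum names are hypothetical, and an ABSOLUTE result only means the URI has a scheme: a handler for that scheme still has to exist.

```java
import java.net.URI;

// Hypothetical classifier for $ref values.
final class RefKind {
    enum Kind { ABSOLUTE, POINTER_ONLY, UNSUPPORTED_RELATIVE }

    static Kind classify(String ref) {
        URI uri = URI.create(ref);
        if (uri.isAbsolute()) {
            return Kind.ABSOLUTE; // has a scheme
        }
        // Fragment-only reference: no authority, no query and an empty path.
        boolean fragmentOnly = uri.getAuthority() == null
            && uri.getQuery() == null
            && (uri.getPath() == null || uri.getPath().isEmpty());
        return fragmentOnly ? Kind.POINTER_ONLY : Kind.UNSUPPORTED_RELATIVE;
    }

    public static void main(String[] args) {
        System.out.println(classify("http://json-schema.org/draft-03/schema#"));
        System.out.println(classify("#/d/e"));
        System.out.println(classify("a/b/c#/d/e"));
    }
}
```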
