jsonschema validation fails to resolve “grandchild” local file references #398

Closed
USSX-Hares opened this issue Apr 17, 2018 · 9 comments

Comments

@USSX-Hares

Background:
I have multiple JSON schemas that refer to the same large objects.
These objects are moved to a subdirectory.
In the example below, the following dependencies appear:

  1. main_schema => positive_integer
  2. main_schema => date
  3. date => positive_integer
  4. date => month

The jsonschema library fails to resolve only the last dependency; all the others are processed fine.

Project Tree

/
+-- code.py
+-- schemas /
|   +-- dependencies /
|   |   +-- date.json
|   |   +-- month.json
|   |   +-- positive_integer.json
|   +-- main_schema.json

code.py:

from jsonschema import validate
import json

contact = \
{
    "name": "William Johns",
    "age": 25,
    "birthDate": { "month": "apr", "day": 15 }
}

def main():
    # Close the schema file deterministically instead of leaking the handle.
    with open("schemas/main_schema.json") as file:
        schema = json.load(file)
    validate(contact, schema)

if (__name__ == '__main__'):
    main()

JSON Schemas

main_schema.json:

{
    "title": "MainSchema",
    
    "properties":
    {
        "name":      { "type": "string" },
        "age":       { "$ref": "file:schemas/dependencies/positive_integer.json" },
        "birthDate": { "$ref": "file:schemas/dependencies/date.json" }
    },
    "additionalProperties": false
}

date.json

{
    "title": "date",
    "type": "object",
    
    "properties":
    {
        "month": { "$ref": "file:month.json" },
        "day":   { "$ref": "file:positive_integer.json" }
    },
    "additionalProperties": false
}

month.json:

{
    "title": "month",
    "type": "string",
    "enum": [ "jan", "feb", "mar", "apr" ]
}

positive_integer.json:

{
    "title": "positiveInteger",
    "type": "integer",
    "minimum": 1
}

Problem

When I run this, the program fails with the following stack trace:

"C:\Program Files (x86)\Python\3.6.0\python.exe" D:/Code/python/test/json_schema_testing/validator.py
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 380, in resolve_from_url
    document = self.store[url]
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\_utils.py", line 23, in __getitem__
    return self.store[self.normalize(uri)]
KeyError: 'file:///schemas/dependencies/month.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 1474, in open_local_file
    stats = os.stat(localfile)
FileNotFoundError: [WinError 3] The system cannot find the path specified: '\\schemas\\dependencies\\month.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 383, in resolve_from_url
    document = self.resolve_remote(url)
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 474, in resolve_remote
    result = json.loads(urlopen(uri).read().decode("utf-8"))
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 1452, in file_open
    return self.open_local_file(req)
  File "C:\Program Files (x86)\Python\3.6.0\lib\urllib\request.py", line 1492, in open_local_file
    raise URLError(exp)
urllib.error.URLError: <urlopen error [WinError 3] The system cannot find the path specified: '\\schemas\\dependencies\\month.json'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Code/python/test/json_schema_testing/validator.py", line 16, in <module>
    main()
  File "D:/Code/python/test/json_schema_testing/validator.py", line 13, in main
    validate(contact, schema)
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 129, in validate
    for error in self.iter_errors(*args, **kwargs):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 105, in iter_errors
    for error in errors:
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\_validators.py", line 304, in properties_draft4
    schema_path=property,
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 121, in descend
    for error in self.iter_errors(instance, schema):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 105, in iter_errors
    for error in errors:
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\_validators.py", line 216, in ref
    for error in validator.descend(instance, resolved):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 121, in descend
    for error in self.iter_errors(instance, schema):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 105, in iter_errors
    for error in errors:
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\_validators.py", line 304, in properties_draft4
    schema_path=property,
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 121, in descend
    for error in self.iter_errors(instance, schema):
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 105, in iter_errors
    for error in errors:
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\_validators.py", line 212, in ref
    scope, resolved = validator.resolver.resolve(ref)
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 375, in resolve
    return url, self._remote_cache(url)
  File "C:\Program Files (x86)\Python\3.6.0\lib\site-packages\jsonschema\validators.py", line 385, in resolve_from_url
    raise RefResolutionError(exc)
jsonschema.exceptions.RefResolutionError: <urlopen error [WinError 3] The system cannot find the path specified: '\\schemas\\dependencies\\month.json'>
(error message translated from Russian)

Process finished with exit code 1

As far as I could investigate, 'grandchild' dependencies are resolved only if they were preloaded earlier.
So if I remove the "date => month" dependency, or forcibly "preload" it from the root level, everything works fine.
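
The KeyError in the traceback shows the grandchild reference ending up as file:///schemas/dependencies/month.json, a path anchored at the filesystem root rather than at the working directory. The following is a minimal sketch for inspecting how the standard library combines the two file: references from the example. It assumes the resolver joins the referring schema's URI with the $ref via urllib.parse.urljoin and normalizes cache keys via urlsplit(...).geturl(); the helper name join_ref is made up for illustration, and the exact printed spelling may differ between Python versions.

```python
from urllib.parse import urljoin, urlsplit

def join_ref(referrer_uri, ref):
    """Join a $ref against the URI of the schema that contains it (hypothetical helper)."""
    return urljoin(referrer_uri, ref)

# URI under which date.json was requested from main_schema.json
date_uri = "file:schemas/dependencies/date.json"

# The "grandchild" reference found inside date.json
month_uri = join_ref(date_uri, "file:month.json")

print("joined:    ", month_uri)
# Round-trip through urlsplit to see the normalized spelling of the same URI
print("normalized:", urlsplit(month_uri).geturl())
```

If the normalized spelling of an already loaded schema happens to equal the joined URI, a cache lookup would succeed without touching the disk, which might explain why only the reference that was never loaded from the root schema fails.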

Workaround

Modify main_schema.json to be something like this:

{
    "title": "MainSchema",

    "not":
    {
        "comment": "This is preload of 'grandchild' dependencies. It is required due to the jsonschema lib issues.",
        "anyOf":
        [
            { "$ref": "file:schemas/dependencies/month.json" },
            { "$ref": "file:schemas/dependencies/date.json" },
            { "$ref": "file:schemas/dependencies/positive_integer.json" }
        ]
    },


    "properties":
    {
        "name":      { "type": "string" },
        "age":       { "$ref": "file:schemas/dependencies/positive_integer.json" },
        "birthDate": { "$ref": "file:schemas/dependencies/date.json" }
    },
    "additionalProperties": false
}

With this change, validation succeeds.
However, I really do not like this workaround.
If you have any ideas on how to fix it, please tell me.

System Info:

  • Windows 7
  • Python 3.6.0
  • jsonschema 2.6.0
@awwright

file:schemas/dependencies/positive_integer.json isn't a valid URI Reference, is that something that normally works?
If the file you're calling from is already a file: URI, you don't need to specify the protocol. Just supply a relative path, same as in HTML or CSS.

@USSX-Hares
Author

@awwright this does not work, in any combination.

> file:schemas/dependencies/positive_integer.json isn't a valid URI Reference, is that something that normally works?

No. This is happening only because someone did not think about developers using json schemas locally.
However, the first file: reference is resolved normally, as I described above.

@awwright

@USSX-Hares I'm not sure of the specifics of this library, but you may need to import the files you're trying to reference beforehand. Check out https://python-jsonschema.readthedocs.io/en/latest/references/#jsonschema.RefResolver and particularly the "store" argument.

Using file: without a // following definitely isn't valid, though. That'll get interpreted as a URI and not a relative reference because it looks like it has a scheme.
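
A minimal sketch of the "import the files beforehand" idea, using only the standard library: it builds a mapping from file URI to parsed schema that could be passed as the store argument. The helper name build_store and the temporary directory are made up for illustration, and the sketch assumes the $ref values resolve to the same absolute file:/// URIs that Path.as_uri() produces; if they resolve to something else, the keys will not match.

```python
import json
import tempfile
from pathlib import Path

def build_store(schema_dir):
    """Map the file URI of every *.json file under schema_dir to its parsed content."""
    store = {}
    for path in sorted(Path(schema_dir).resolve().rglob("*.json")):
        with open(path, encoding="utf-8") as file:
            store[path.as_uri()] = json.load(file)
    return store

# Demo on a throwaway copy of part of the issue's layout.
with tempfile.TemporaryDirectory() as tmp:
    deps = Path(tmp) / "schemas" / "dependencies"
    deps.mkdir(parents=True)
    (deps / "month.json").write_text(
        json.dumps({"type": "string", "enum": ["jan", "feb", "mar", "apr"]})
    )
    (deps / "positive_integer.json").write_text(
        json.dumps({"type": "integer", "minimum": 1})
    )
    store = build_store(Path(tmp) / "schemas")

for uri in store:
    print(uri)

# With jsonschema installed, the store could then be handed to the resolver, e.g.:
#   resolver = RefResolver(base_uri=main_schema_uri, referrer=schema, store=store)
#   validate(instance, schema, resolver=resolver)
```

Here main_schema_uri, schema and instance are placeholders for the caller's own values.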

@USSX-Hares
Author

USSX-Hares commented Apr 27, 2018

@awwright I tried:
file://path/to/file
path/to/file
file:path/to/file
//file/path/to/file

All of them give the same result.

@handrews

handrews commented Apr 28, 2018

@USSX-Hares The file URI scheme syntax is a little confusing. What you want is:

file:///path/to/file

Note the three consecutive forward slashes!

What's going on here is that file is the actual scheme, :// is the separator, and then the third / is the beginning of your absolute path (assuming that /path/to/file is how you would reference it locally on a unix-ish filesystem).

Since files are local, there is no "authority" (hostname) component, which usually comes between the :// and the first / in the path, e.g. http://hostname.goes.here/path/to/file.
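
To see that split concretely, here is a small sketch that uses urllib.parse.urlsplit to break a few of the spellings from this thread into scheme, authority and path. The helper name describe is made up for illustration.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Return the (scheme, authority, path) triple of a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

for uri in (
    "file:///path/to/file",                     # empty authority, absolute path
    "file://path/to/file",                      # "path" is parsed as the authority
    "http://hostname.goes.here/path/to/file",   # ordinary authority
):
    print(uri, "->", describe(uri))
```

With only two slashes, the first path segment is read as the hostname, which is why file://path/to/file does not point at path/to/file.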

@USSX-Hares
Author

USSX-Hares commented May 2, 2018

@handrews you are talking about an absolute path.
I need a relative path only. Furthermore, this has to be cross-platform.

And, did any of you at least try before suggesting?
The source code of my schemas (similar to the example above): https://github.com/USSX-Hares/hammerhal/tree/dev/schemas

@USSX-Hares
Author

USSX-Hares commented May 2, 2018

@handrews @awwright @all_other_who_cannot_read_carefully
This issue is not about how I've written my reference to a file.
This issue is about the library's inability to resolve 'grandchild' dependencies unless they were referenced earlier in the root schema.

@Julian
Member

Julian commented May 2, 2018

@USSX-Hares your attitude isn't welcome here. Fix it, and consider apologizing to the folks above who were trying to help. You also likely should apologize to me for your sass, though I care less about that.

Looking at other open issues to see whether this is the same one is another thing you likely should have done, or feel free to send a patch fixing whatever issue you're encountering if you think you're encountering a bug.

It's unlikely I'll look at this considering the number of open $ref bugs and my desire to have it rewritten to fix them.

You're owed nothing, please act like it.

@Julian Julian closed this as completed May 2, 2018
@USSX-Hares
Author

USSX-Hares commented Feb 27, 2019

This could be fixed with a small patch to the RefResolver, to the validate() call, and to the schemas' $ref values.

How to fix

  • Create a RefResolver subclass that checks whether the referenced file is stored locally.
  • Instantiate that subclass with base_uri set to the absolute path of the schema's directory, prefixed with file://.
  • In the schemas, use relative paths without any prefixes.

Changed project files

ref_resolver.py:

import os.path
import json
from jsonschema import RefResolver

class ExtendedRefResolver(RefResolver):
    """RefResolver that loads file: URIs and plain local paths from disk."""

    def resolve_remote(self, uri):
        print(f"Resolving URI: '{uri}'")

        path = None
        if uri.startswith('file:'):
            # Strip the scheme and, if present, the '//' of an empty authority.
            path = uri[len('file:'):]
            if path.startswith('//'):
                path = path[len('//'):]
        elif os.path.isfile(uri):
            path = uri

        if path is not None:
            return self.resolve_local(path, uri)
        # Anything else (http, https, ...) is left to the base class.
        return super().resolve_remote(uri)

    def resolve_local(self, path: str, uri: str):
        with open(path) as file:
            schema = json.load(file)

        if self.cache_remote:
            # Cache under the requested URI, since lookups are keyed by URI.
            self.store[uri] = schema
        return schema

code.py:

import os.path
import json

from jsonschema import validate
from ref_resolver import ExtendedRefResolver

contact = \
{
    "name": "William Johns",
    "age": 25,
    "birthDate": { "month": "apr", "day": 15 }
}

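def main():
    schema_path = "schemas/main_schema.json"
    with open(schema_path) as file:
        schema = json.load(file)

    # Trailing '/' so that relative $refs resolve inside the schema directory;
    # URL joining otherwise replaces the last path segment of the base.
    base_uri = 'file://' + os.path.dirname(os.path.abspath(schema_path)) + '/'
    resolver = ExtendedRefResolver(base_uri=base_uri, referrer=schema)
    validate(contact, schema, resolver=resolver)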

if (__name__ == '__main__'):
    main()

main_schema.json:

{
    "title": "MainSchema",

    "properties":
    {
        "name":      { "type": "string" },
        "age":       { "$ref": "dependencies/positive_integer.json" },
        "birthDate": { "$ref": "dependencies/date.json" }
    },
    "additionalProperties": false
}

dependencies/date.json:

{
    "title": "date",
    "type": "object",

    "properties":
    {
        "month": { "$ref": "month.json" },
        "day":   { "$ref": "positive_integer.json" }
    },
    "additionalProperties": false
}
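
The relative $ref values above are resolved against the URI of the schema that contains them. A minimal sketch of that chain with urllib.parse.urljoin, which is assumed here to be the joining rule the resolver applies; the /home/user/project location is made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical absolute location of the schemas/ directory (POSIX-style path).
base_with_slash = "file:///home/user/project/schemas/"
base_without_slash = "file:///home/user/project/schemas"

# main_schema.json -> dependencies/date.json
date_uri = urljoin(base_with_slash, "dependencies/date.json")
# date.json -> month.json (the "grandchild"), resolved against date.json's own URI
month_uri = urljoin(date_uri, "month.json")

print(date_uri)
print(month_uri)
# Without the trailing slash the last path segment ("schemas") is replaced:
print(urljoin(base_without_slash, "dependencies/date.json"))
```

Under these assumptions, a POSIX base URI that does not end in a slash would send every relative reference one directory too high.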

USSX-Hares added a commit to USSX-Hares/jsonschema that referenced this issue Feb 28, 2019
Julian added a commit that referenced this issue Jun 11, 2020
57001d26b [294] Add tests for "additionalProperties" and "additionalItems"
27192102c Merge pull request #398 from ChALkeR/chalker/no-dec-ipv4
f5f481a63 Decimal IPs are also not dotted-quad

git-subtree-dir: json
git-subtree-split: 57001d26bf50598fe16ac3062aba5ddbd650a73c