Skip to content

Eterna2/fix relative ref #595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

eterna2
Copy link

@eterna2 eterna2 commented Aug 16, 2019

Fix #570

  • Support relative local fs ref
  • Support relative remote ref
  • Added corresponding unittests

@codecov
Copy link

codecov bot commented Aug 16, 2019

Codecov Report

Merging #595 into master will decrease coverage by 1.35%.
The diff coverage is 80.55%.

@@            Coverage Diff             @@
##           master     #595      +/-   ##
==========================================
- Coverage   95.64%   94.29%   -1.36%     
==========================================
  Files          20       20              
  Lines        2436     2472      +36     
  Branches      308      309       +1     
==========================================
+ Hits         2330     2331       +1     
- Misses         93      122      +29     
- Partials       13       19       +6

@codecov
Copy link

codecov bot commented Aug 16, 2019

Codecov Report

Merging #595 into master will decrease coverage by 0.3%.
The diff coverage is 80.55%.

@@            Coverage Diff             @@
##           master     #595      +/-   ##
==========================================
- Coverage   95.64%   95.34%   -0.31%     
==========================================
  Files          20       20              
  Lines        2436     2472      +36     
  Branches      308      309       +1     
==========================================
+ Hits         2330     2357      +27     
- Misses         93      101       +8     
- Partials       13       14       +1

@eterna2
Copy link
Author

eterna2 commented Aug 16, 2019

can't figure out why coverage for diff is not 100%. My 2nd unit test shld have covered it.

@Julian
Copy link
Member

Julian commented Aug 19, 2019

Thanks a lot for trying to tackle this!

Does this work properly with requests?

If so, I'm really more inclined at this point to just drop support for urllib entirely and make requests fully required -- the conditional dependency is... like 7 years old at this point, back when it wasn't necessarily the case that requests was a good idea to depend on.

@eterna2
Copy link
Author

eterna2 commented Aug 19, 2019

@Julian

Actually, my implementation does not use request (but there is no issue using request either).

It is more of a fallback logic (cuz I do not want to change too much of ur code), it will only attempt to try:

  • resolve as a local relative path (concat base_uri with ref_uri and try json.load).
  • otherwise resolve as remote relative path (strip spec filename to get base url and resolve path with relative ref_uri with respect to the base url (I use urllib/urlopen here in this case) - but i still need urlsplit and urlunsplit to interpret the url so that I can find the base_url. But frankly it is not difficult to just implement that part (instead of using urllib).
 # need to reimplement this if u r deprecating urllib
scheme, network, path, qs, frag = urlsplit(self.base_uri) 

path = "/".join(path.split("/")[0:-1])  # base path

 # need to reimplement this if u r deprecating urllib
base_url = urlunsplit((scheme, network, path, qs, frag)) 

# this part can be easily replaced with request instead
with urlopen(urljoin(base_url, uri)) as url:
  result = json.loads(url.read().decode("utf-8"))

I made this PR with minimal change to ur codebase.

Just my 2 cent, if u are going to deprecate urllib and do some major refractoring, can make use of the existing custom_handler hook - i.e. create a http/https handler, file handler.

Then we have a simple function to try the handlers with some fallback logics, because how I am doing it in this PR is to try relative_local if the default handler fails, then try relative_remote if relative_local fails. But this is a nested try/except, which may not be so nice.

Not working code, but some idea, what I meant.

handlers_fallback = [std_handler, relative_file_handler, relative_remote_handler]

def get_std_handlers(uri, custom_handlers={}):
  if uri[0] == "#":
    return local_ref_handler
   schema = get_schema(uri)
   handlers = {**default_handlers, **custom_handlers}
   return handlers .get(schema, file_handler)

def std_handler(uri, base_uri=""):
  handler = get_std_handlers(uri)
  return handler(uri, base_uri=base_uri)

def request_handler(uri, base_uri=""):
  return request.get(uri).json()

def local_ref_handler(uri, base_uri=""):
  return definitions[_get_definition_key(uri)]

def file_handler(uri, base_uri=""):
  file_path = strip_protocol_if_any(uri)
  with open(file_path) as fin:  
    return json.load(fin)
 
def relative_file_handler(uri, base_uri=""):
  ref_path = os.path.join(strip_protocol_if_any(base_uri), strip_protocol_if_any(uri))
  return file_handler(ref_path)

def relative_remote_handler(uri, base_uri=""):
  base_url = get_base_url(base_uri)
  url = "%s/%s" % (base_url, strip_protocol_if_any(uri))
  return request_handler(url)
  
def handle_ref(handlers, handler_kwargs, index=0):
  if index >= len(handlers):
    raise Exception("Unable to resolve ref")
  try:
    return handlers[index](**handler_kwargs)
  except Exception:
    return handle_ref(handlers, handler_kwargs, index=index+1)

@Julian
Copy link
Member

Julian commented Aug 19, 2019

Actually, my implementation does not use request (but there is no issue using request either).

Sorry so are you saying then that the issue doesn't exist when requests is installed?

I'm not sure I follow the rest of your comment, since:

Just my 2 cent, if u are going to deprecate urllib and do some major refractoring, can make use of the existing custom_handler hook - i.e. create a http/https handler, file handler.

That wouldn't be any major refactoring, it'd literally just delete the entire else block you modified here, but otherwise everything stays exactly the same -- so basically what I'm asking is "in that case, is whatever bug you're trying to fix still present"?

@Julian
Copy link
Member

Julian commented Aug 19, 2019

I should say I'm separately not very keen to do any kind of fallback logic / chaining -- it's easy to tell what's a remote ref and what's a local one, so we shouldn't really need any try/excepting or if/elsing, so will need to try to understand what the motivation is there.

@Julian
Copy link
Member

Julian commented Aug 19, 2019

This also looks pretty similar to #531, where I seem to have asked the exact same question :)

@eterna2
Copy link
Author

eterna2 commented Aug 19, 2019

@Julian

This is different from #531. #531 is asking to support file://. #570 is asking to support relative reference (i.e. if protocol is not specified, it means the referenced schema is either relative to the local file system or relative to the remote url of the base schema).

The use case is mostly when u do not want to publish ur schema (i.e. http/https) and only wish to retrieve it from local fs (i.e. u want to reuse some of the definitions in other json schema).

urllib supports file:// but this is not ideal, as it requires absolute path which is not desirable as different machine will have different paths.

So the feature request is:

if $ref = "some_schema.json#some_definition", RefResolver should load from local fs relative to where the original schema (i.e. $schema) resides in the fs. If $schema is in some remote endpoint, it should similar load relative from that remote endpoint.

The current implementation does not support this:

  • request only target http/https (no fs or file:// at all)

And there are multiple ways to go about doing this:

  • u can check $ref to see if http/https is specified and use request.
  • unless u r dropping file:// support (it is currently supported because of urlopen), you will need to do a check for this too, and implement an equivalent without using urlopen.
  • then u need to do a check if the original json schema is http/https or a local fs, then implement a local relative json.load, or a request.get(base_url+url).json() respectively.

or u can just do a fallack, which i did in this PR. The relative ref will never hit ur request if statement because it is relative ref (e.g. some_spec.json#some_def instead of http://foo.bar/some_spec.json#some_def).

@Julian
Copy link
Member

Julian commented Aug 22, 2019

Got it thanks -- that makes sense, but is this really that hard to do currently? If it's really just for development, just invent your own scheme, and include a handler for it?

I.e.:

import json
from jsonschema import RefResolver, validate


def relative(url):
    _, _, path = url.partition("://")
    with open(path) as file:
        return json.load(file)


with open("schema.json") as file:
    schema = json.load(file)
    resolver = RefResolver.from_schema(schema, handlers={"relative-file": relative})
    validate("", schema, resolver=resolver)

@eterna2
Copy link
Author

eterna2 commented Aug 23, 2019

This is a PR. U can see my fix here. It isn't hard, and it is officially in the spec (relative ref I mean).

Your approach is doable but not elegant, as in it feels hackish. Cuz the main use case here is validation as part of our unit test/ci process. Would be hackish if we write a custom handler with a custom protocol for all our projects just to support relative ref.

Regardless of urllib support of otherwise, the fix is relatively simple, it is just falling back to local relative fs -> remote relative fs. All about path manipulation, not much on io.

@Julian
Copy link
Member

Julian commented Aug 23, 2019

The behavior in the PR is definitely not in the spec, so I'm trying to narrow down exactly what/which issue is in the spec that you're trying to fix here.

The spec definitely does not say "every time you see a ref, try cascading through first resolving it absolutely, then assume it's a local path, ..."

It has rules about what you assume the base URI is for schemas that forgot to declare it, so fully believe there's something to fix there, but yeah that's what I was referring to above about not having fallback behavior.

@Julian
Copy link
Member

Julian commented Aug 24, 2019

The other thing I meant to mention here is whether you're really asking about loading schemas that don't specify their base URI and then not telling jsonschema what that base URI is that you loaded them from. E.g.:

import os
import json
from jsonschema import RefResolver, validate


with open("schema.json") as file:
    schema = json.load(file)
    resolver = RefResolver(
        referrer=schema,
        base_uri="file://" + os.getcwd() + "/",
    )
    validate("", schema, resolver=resolver)

if you want the base URI to resolve relative to the current working directory, or some other one.

@eterna2
Copy link
Author

eterna2 commented Aug 24, 2019

Yeah this is what I meant and needed. Thanks!

Will this work recursively? As in this schema loads another schema which loads another schema? All the schemas are in the same folder but does not have a $id defined.

Because my usecase is a pretty complex set of config files that I need to validate for multiple projects.

@Julian
Copy link
Member

Julian commented Aug 24, 2019 via email

@Julian Julian closed this Aug 24, 2019
Julian added a commit that referenced this pull request Nov 11, 2022
0fa89d2ab Merge pull request #595 from json-schema-org/ether/cousins-reverse
2b1f711d1 Merge pull request #598 from flaksp/patch-1
104f9f98f Merge pull request #599 from tristanpenman/main
209ba6404 Add Valijson to 'Who Uses the Test Suite'
47bf107c5 Fix typo in comment of "order of evaluation: $id and $anchor and $ref" suite
dee0edadc Merge pull request #597 from json-schema-org/contributing
f54f00a29 Explicitly mention that one approval doesn't override other concerns.
e17668ec3 Add additional notes about writing good tests.
a6520db44 Add in a contributors document outlining the repo's process.                                                                                                                julian@Airm
6ce9bc4ce Merge pull request #596 from json-schema-org/steering
8defd7588 Add a note about the future TSC.
b9bb50bcd also test allOf schemas presented in the reverse order

git-subtree-dir: json
git-subtree-split: 0fa89d2ab6df07d7afabace3a07362c5eedbd70f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"RefResolutionError" exception raised when resolving "$ref"
2 participants