Appendix on vocabulary plugin architecture? #995

Closed
handrews opened this issue Sep 20, 2020 · 7 comments

@handrews
Contributor

Recent discussions have made it clear that we need some consensus on the interface to / capabilities of vocabulary handling modules. Perhaps an appendix? Or maybe on the web site?

For me, the fundamentals are....

  • Interactions among keywords are specified in terms of annotations
    • annotation names match keyword names
    • there is no way to know from what vocabulary a keyword / its annotation(s) derived
    • so far, only adjacent keywords, or keywords in subschemas of adjacent in-place applicators, cause interactions
    • adjacent == dynamic schema paths are identical (or prefix is identical for subschema of adjacent)
    • via in-place applicators == instance location is identical
  • vocabularies are not, at the code level, aware of each other
  • the default processing model should be to pass each keyword to each module
    • gather all annotations for each keyword from each module
    • AND all validation results for each keyword from each module
  • when implementing an applicator, a module needs to be able to call back to the main implementation to evaluate the subschema(s)
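A minimal sketch of that default processing model, assuming a hypothetical `handle(keyword, value, instance, evaluate_sub)` method on each module; the module classes, the `TYPES` table, and the result shape are all invented for illustration. Every keyword goes to every module, annotations are gathered per keyword, validation results are ANDed, and applicators call back into the main evaluator. Annotations from subschemas are dropped here to keep the sketch short.

```python
from typing import Any, Callable, Optional

# Hypothetical handler result: (valid, annotation), or None when the module
# does not recognise the keyword.
Result = Optional[tuple[bool, Any]]
Callback = Callable[[Any, Any], bool]  # (subschema, instance) -> valid

TYPES = {"string": str, "object": dict, "array": list}


class ValidationModule:
    def handle(self, keyword: str, value: Any, instance: Any, evaluate_sub: Callback) -> Result:
        if keyword == "type" and isinstance(value, str) and value in TYPES:
            return isinstance(instance, TYPES[value]), None
        if keyword == "maxLength":
            return (not isinstance(instance, str)) or len(instance) <= value, None
        return None


class ApplicatorModule:
    def handle(self, keyword: str, value: Any, instance: Any, evaluate_sub: Callback) -> Result:
        if keyword == "allOf":
            # The module never evaluates subschemas itself; it calls back.
            return all([evaluate_sub(sub, instance) for sub in value]), None
        return None


class MetaDataModule:
    def handle(self, keyword: str, value: Any, instance: Any, evaluate_sub: Callback) -> Result:
        if keyword == "title":
            return True, value  # always valid, produces an annotation
        return None


MODULES = [ValidationModule(), ApplicatorModule(), MetaDataModule()]


def evaluate(schema: dict, instance: Any, modules=MODULES) -> tuple[bool, dict]:
    valid = True
    annotations: dict[str, list] = {}

    def evaluate_sub(subschema: dict, sub_instance: Any) -> bool:
        return evaluate(subschema, sub_instance, modules)[0]

    for keyword, value in schema.items():
        for module in modules:  # pass each keyword to each module
            result = module.handle(keyword, value, instance, evaluate_sub)
            if result is None:
                continue
            ok, annotation = result
            valid = valid and ok  # AND all validation results
            if annotation is not None:
                annotations.setdefault(keyword, []).append(annotation)
    return valid, annotations
```

The modules never reference each other; the only shared surface is the keyword name and the callback.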

It may be necessary to do one or more of the following in order to get keywords evaluated in the right order:

  • have each module indicate what keyword(s) it handles
  • have each module indicate which, if any, keywords are in-place applicators
  • have each module indicate which, if any, keywords rely on which annotation names (which are also keyword names)
  • have each module indicate which, if any, keywords rely on in-place applicators
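One possible shape for those declarations, as a sketch only: `KeywordInfo`, its field names, and `build_registry` are hypothetical, and the two keyword lists are an illustrative subset rather than a complete description of any published vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordInfo:
    """What a module declares about one keyword it handles (hypothetical)."""
    name: str
    in_place_applicator: bool = False
    # Annotation names this keyword relies on (these are also keyword names).
    annotation_deps: frozenset = frozenset()
    # True if the keyword needs results from adjacent in-place applicators.
    depends_on_in_place_applicators: bool = False


# Illustrative declarations from two independent modules.
APPLICATOR_KEYWORDS = [
    KeywordInfo("allOf", in_place_applicator=True),
    KeywordInfo("properties"),
    KeywordInfo(
        "additionalProperties",
        annotation_deps=frozenset({"properties", "patternProperties"}),
    ),
]
UNEVALUATED_KEYWORDS = [
    KeywordInfo(
        "unevaluatedProperties",
        annotation_deps=frozenset(
            {"properties", "patternProperties", "additionalProperties"}
        ),
        depends_on_in_place_applicators=True,
    ),
]


def build_registry(*module_declarations):
    """Merge declarations by keyword name; no module sees another module."""
    registry = {}
    for declarations in module_declarations:
        for info in declarations:
            registry.setdefault(info.name, []).append(info)
    return registry


registry = build_registry(APPLICATOR_KEYWORDS, UNEVALUATED_KEYWORDS)
in_place = sorted(
    name for name, infos in registry.items()
    if any(info.in_place_applicator for info in infos)
)
```

Because dependencies are expressed as annotation names, a module can declare that it relies on `properties` without knowing which module, if any, supplies it.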
@handrews
Contributor Author

Importantly, all of this can be done at the keyword dependency level. Additional optimizations are possible and of course would not be forbidden. But fundamentally, you can do this by loading keyword dependencies (presumably when the vocab is loaded) and then keeping track of which keywords have been evaluated. A variation of that is noting which keywords depend on in-place applicators, and which keywords are in-place applicators. With that info, you can:

  • encounter a keyword, check its dependencies
  • if it depends on in-place applicators, evaluate all of those first
  • if it depends on other adjacent keywords, evaluate those next
  • evaluate the original keyword

In all steps, "evaluating" a keyword means passing it to each vocabulary handler. As an optional step, one could track which handlers handle which keywords and then you don't have to call every handler for every keyword.
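The steps above can be sketched as a depth-first ordering over the keywords present in one schema object. The three tables are assumptions standing in for whatever the vocabularies declare when loaded, and the in-place applicator set is an illustrative subset.

```python
# Hypothetical dependency data, loaded when the vocabularies are loaded.
IN_PLACE_APPLICATORS = {"allOf", "anyOf", "oneOf", "not", "if", "$ref"}
ANNOTATION_DEPS = {
    "additionalProperties": {"properties", "patternProperties"},
    "unevaluatedProperties": {
        "properties", "patternProperties", "additionalProperties",
    },
}
NEEDS_IN_PLACE_APPLICATORS = {"unevaluatedProperties", "unevaluatedItems"}


def evaluation_order(schema_keywords):
    """Return the keywords of one schema object in a dependency-safe order."""
    present = list(schema_keywords)
    done = []            # keywords already evaluated, in order
    in_progress = set()  # guards against circular declarations

    def visit(keyword):
        if keyword in done or keyword not in present:
            return
        if keyword in in_progress:
            raise ValueError(f"circular keyword dependency at {keyword!r}")
        in_progress.add(keyword)
        # 1. If it depends on in-place applicators, evaluate those first.
        if keyword in NEEDS_IN_PLACE_APPLICATORS:
            for other in present:
                if other in IN_PLACE_APPLICATORS:
                    visit(other)
        # 2. Then evaluate the adjacent keywords it depends on.
        for dependency in sorted(ANNOTATION_DEPS.get(keyword, ())):
            visit(dependency)
        # 3. Finally evaluate the keyword itself.
        in_progress.discard(keyword)
        done.append(keyword)

    for keyword in present:
        visit(keyword)
    return done


order = evaluation_order(
    ["unevaluatedProperties", "additionalProperties", "allOf", "properties"]
)
```

Nothing in the ordering mentions a vocabulary: the unit of scheduling is the keyword, which is the point being made here.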

The key thing is that it's keyword by keyword, not vocabulary by vocabulary. Particularly when it comes to in-place applicators, requiring things to be done at the vocabulary granularity will create deadlocks.

@karenetheridge
Member

As a guide for developing additional/replacement vocabularies, a separate document is fine. But some of these items will need to be specified for the existing vocabularies as well, in which case they should be in the specification itself.

> Particularly when it comes to in-place applicators, requiring things to be done at the vocabulary granularity will create deadlocks.

How so?

@handrews
Contributor Author

It depends on exactly how things are arranged and I do not have the energy to sort it out. Maybe it's possible, but that's not the point. The point is to make things at the keyword granularity because that's conceptually simpler and more flexible, and robust to changes in vocabulary such as the one splitting the unevaluated* keywords out. And I would be very grateful if you would consider the proposal before challenging every detail.

@handrews
Contributor Author

I prefer the proposal in #996 but leaving this open as a fallback because either #996 gets accepted quickly or we're not doing it and we can just hand-wave stuff in prose.

@jdesrosiers
Member

I like the idea of including the principal design decisions behind vocabularies, but I think we should be careful to not go too far and prescribe implementation choices. I'm not saying that's what you're doing, but the line can be fuzzy sometimes and we might end up imposing more than we intend if we aren't careful.

Here are the principal design decisions I've made in my implementation. I take this perspective not to make it about me or my implementation, but because I figure that if we have two sets of "fundamental" principles, we're more likely to identify what's truly fundamental and what can be a degree of freedom.

Principal Design Decisions for Hyperjump JSON Schema Vocabularies

  • Keywords
    • are identified by an absolute URI
    • are stateless and can be evaluated in any order
      • may not use the validation results of other keywords
        • (Yes, unevaluatedProperties/unevaluatedItems violate this rule)
      • may use the values of adjacent or sub-schema keywords
      • can validate sub-schemas and use their boolean result
  • Vocabularies
    • are identified by an absolute URI
    • are a mapping of keyword names to Keyword IDs
    • when multiple vocabularies are included, their keyword name/id maps are merged
      • once the vocabularies are loaded, there is no way to know which vocabulary each keyword came from
      • behavior in the case of keyword name collisions is undefined (in practice, last vocabulary loaded wins)
  • Validation
    • is a keyword implementation just like any other
    • For each keyword in the schema,
      • lookup the keyword ID using the keyword name
      • Get the keyword implementation using keyword ID
      • Pass the schema and the instance to the keyword implementation to get a boolean result
    • Return true if all keywords report valid or false otherwise.
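A rough sketch of the model listed above. It is not Hyperjump's actual API: the `example.com` URIs, function names, and the tiny keyword set are invented for illustration. Keyword implementations are stateless, see the whole schema object so they can read adjacent keyword values, and get only a boolean back from sub-schema validation. `additionalProperties` ignores `patternProperties` here for brevity.

```python
def kw_type(schema, instance, validate):
    types = {"string": str, "object": dict}
    return isinstance(instance, types[schema["type"]])


def kw_properties(schema, instance, validate):
    if not isinstance(instance, dict):
        return True
    return all(
        validate(sub, instance[name])
        for name, sub in schema["properties"].items()
        if name in instance
    )


def kw_additional_properties(schema, instance, validate):
    if not isinstance(instance, dict):
        return True
    # Reads the *value* of the adjacent keyword, not its annotation.
    declared = set(schema.get("properties", {}))
    return all(
        validate(schema["additionalProperties"], value)
        for name, value in instance.items()
        if name not in declared
    )


# Keyword ID (absolute URI, hypothetical) -> implementation.
IMPLEMENTATIONS = {
    "https://example.com/keyword/type": kw_type,
    "https://example.com/keyword/properties": kw_properties,
    "https://example.com/keyword/additionalProperties": kw_additional_properties,
}

# A vocabulary is a mapping of keyword names to keyword IDs.
VALIDATION_VOCAB = {"type": "https://example.com/keyword/type"}
APPLICATOR_VOCAB = {
    "properties": "https://example.com/keyword/properties",
    "additionalProperties": "https://example.com/keyword/additionalProperties",
}


def merge_vocabularies(*vocabularies):
    merged = {}
    for vocabulary in vocabularies:
        merged.update(vocabulary)  # on a name collision, last loaded wins
    return merged


def validate(schema, instance, keyword_ids):
    if isinstance(schema, bool):  # boolean schemas
        return schema
    results = []
    for name in schema:
        keyword_id = keyword_ids.get(name)
        if keyword_id is None:
            continue  # assumption: unknown keywords are ignored
        implementation = IMPLEMENTATIONS[keyword_id]
        results.append(
            implementation(
                schema, instance, lambda s, i: validate(s, i, keyword_ids)
            )
        )
    return all(results)
```

After `merge_vocabularies` returns, nothing records which vocabulary a name came from, which matches the bullet above.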

There's plenty of overlap, so that's great! For example

  • Keywords aren't aware of vocabularies
  • Keyword interactions are limited and discouraged, with some allowances for exceptional cases
  • Keywords need to be able to validate sub-schemas

I think our main difference is that I'm much more strict about statelessness of keywords. Effectively, that means I don't use annotations to handle keyword interactions. Maybe I just made an extreme design choice, but it's worth thinking about whether annotations are truly a fundamental concept considering I was able to implement everything without them.

The other thing that doesn't appear in my list is keyword taxonomy. The taxonomy is nice to give us names to talk about things, but I haven't found a use for them in code. This may be another thing that is not fundamental other than as an ontology to describe things.

> the default processing model should be to pass each keyword to each module

This is the other place we differ. I may be misunderstanding. If two vocabularies include the same keyword name, this algorithm would require both to be valid for the keyword to be valid. This seems to be defining a behavior for conflicting keyword names, but the spec considers that behavior undefined.
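To make the difference concrete, here is a small sketch with two hypothetical handlers for the same keyword name, `format`. Neither function comes from a real vocabulary; they only show how the two strategies diverge on a name collision.

```python
def asserting_format(value, instance):
    # Hypothetical vocabulary A: "format" is an assertion.
    return value != "email" or "@" in instance


def annotating_format(value, instance):
    # Hypothetical vocabulary B: "format" never fails validation.
    return True


def pass_to_every_module(handlers, value, instance):
    # Every handler must accept the instance (results are ANDed).
    return all(handler(value, instance) for handler in handlers)


def last_loaded_wins(handlers, value, instance):
    # Only the most recently loaded handler is consulted.
    return handlers[-1](value, instance)


handlers = [asserting_format, annotating_format]  # load order
both = pass_to_every_module(handlers, "email", "not-an-address")
last = last_loaded_wins(handlers, "email", "not-an-address")
```

With this load order the first strategy rejects the instance and the second accepts it, so choosing either one amounts to defining collision behavior.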

> • adjacent == dynamic schema paths are identical (or prefix is identical for subschema of adjacent)
> • via in-place applicators == instance location is identical

I'm not sure what these mean.

@handrews
Contributor Author

@jdesrosiers an issue about a stopgap to get us through the next draft is not a place to open a discussion on replacing the entire theoretical model that I've spent the last several years developing.

Y'all are truly welcome to throw anything and everything I've done out after the draft that goes out with OAS 3.1 final. I had mostly stepped back but then @Relequestual told me OAS 3.1 was imminent and we needed to wrap up the draft ASAP so I am doing my best to handle all of the loose ends I've left.

But, in the middle of the world going to shit, and taking my already shaky mental health with it, I am not in any shape to deal with this kind of "hey throw everything out and reconsider it" right now.

Again, feel free to do so after I'm done with this and more or less off the project. I truly will not feel hurt over it: I picked up the ball and carried it a certain amount. The work is not done, and the final product may lie in a new direction.

But. Not. Now.

I'm closing this, I cannot deal with this topic right now and don't want it to progress at all- I apologize for not addressing your work here, @jdesrosiers but now is really, really, really not the time. Let me finish this and get out of the way, please.

@json-schema-org json-schema-org locked as off-topic and limited conversation to collaborators Sep 27, 2020
@handrews
Contributor Author

(those with permission to keep commenting despite the lock, please don't- I will make a comment on Slack momentarily about both the technical and personal issues mentioned here)
