"missing" or "defaultProperties" annotation keyword #867

Open
awwright opened this issue Mar 1, 2020 · 21 comments
Labels
proposal Initial discussion of a new idea. A project will be created once a proposal document is created.

Comments

@awwright
Member

awwright commented Mar 1, 2020

The "default" keyword is one of the most misused & abused annotation keywords due to consequences of how JSON Schema works.

It is frequently assumed that if a property in an instance document is missing, the "default" keyword lets you create that property and fill in that value as if it has the same behavior.

But this is not actually the case. The biggest reason is the "default" keyword does not produce an annotation unless the property exists in the instance in the first place... defeating the point.

The "default" keyword is mostly useful for user interface and IDE work, where a user indicates they want to create a value, and so the "default" keyword can provide a sensible initial default in these cases. For example:

  • A user creates a new record in a document database (like a MongoDB collection). The interface creates an instance of the schema, reading the "default" keyword to provide a value, instead of creating a blank document (which is invalid JSON).

  • A user in an IDE is typing { "name": and then tab-completes in the default value, an empty string (as opposed to e.g. a number, or another object).

However, all the time, users seem to think you can substitute in the "default" value for a missing one:

This also seemed sensible to me until I realized that "default" doesn't produce an annotation if the instance is absent.
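To make the point concrete, here is a minimal sketch of how an annotation-collecting evaluator treats "default". It is not taken from any real implementation: the function name `collect_default_annotations` and the JSON-Pointer-style keys are assumptions for illustration, and only "properties" is handled.

```python
def collect_default_annotations(schema, instance, path=""):
    """Collect "default" annotations for the locations that exist in `instance`.

    A subschema under "properties" is only applied when the instance
    actually has that property, so absent properties yield no annotation.
    """
    if not isinstance(schema, dict):  # boolean schemas carry no annotations
        return {}
    annotations = {}
    if "default" in schema:
        annotations[path] = schema["default"]
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:  # absent properties are never visited
                annotations.update(
                    collect_default_annotations(
                        subschema, instance[name], f"{path}/{name}"
                    )
                )
    return annotations


schema = {"properties": {"port": {"type": "integer", "default": 80}}}

# The property is present, so its subschema is applied and "default" is seen.
present = collect_default_annotations(schema, {"port": 8080})

# The property is absent, so nothing is collected for it.
absent = collect_default_annotations(schema, {})
```

In this sketch the annotation only shows up in the case where the application no longer needs it.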

I'm proposing a keyword that does mean exactly this: it means "if the instance is an object, and it is missing any of the given properties, the behavior will be the same as if it were defined with the specified value".

Example schema:

{
  "missing": {
    "port": 80
  },
  "properties": {
    "port": { "type":"integer" }
  }
}

This would allow implementations to infer that this:

{ }

will behave the same as this:

{ "port": 80 }

Some implementations or tools might even offer a way to return a copy of the instance with these values filled in. This would be useful for applications that want to both validate user input, and fill in defaults; instead of having to perform these as separate operations.
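A rough sketch of what "return a copy of the instance with these values filled in" could look like. The "missing" keyword is only a proposal, and `apply_missing` is a hypothetical helper name; the sketch handles a single object level and does not recurse or validate.

```python
import copy


def apply_missing(schema, instance):
    """Return a copy of `instance` with absent properties filled in from the
    proposed (hypothetical) "missing" keyword. Non-objects pass through."""
    if not isinstance(instance, dict):
        return instance
    filled = copy.deepcopy(instance)
    for name, value in schema.get("missing", {}).items():
        if name not in filled:  # values already in the instance always win
            filled[name] = copy.deepcopy(value)
    return filled


schema = {
    "missing": {"port": 80},
    "properties": {"port": {"type": "integer"}},
}

empty_filled = apply_missing(schema, {})
explicit_kept = apply_missing(schema, {"port": 8080})
```

Because the keyword sits on the object rather than on the absent property, there is always an instance location to attach it to.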

@handrews
Contributor

handrews commented Mar 2, 2020

That is a good point about default never actually annotating an instance! I'll have to think more on the rest.

@awwright
Member Author

awwright commented Mar 2, 2020

Alternative names could be e.g. "undefinedValues", "defaultValues" or simply "defaults" (plural).

We may also want to consider an array form, for tuples, or function arguments.

One more thing to consider is that "default" does not have to be valid according to the schema. "missing" would have to be, since by definition the behavior is the same.

@gregsdennis
Member

gregsdennis commented Mar 2, 2020

The thing that needs to be explicitly stated here is that (given the description above) the default isn't applied when the property isn't present because (in some implementations) that property's subschema is skipped in that case.

I think this is an incorrect behavior.

Core section 9.3.2.1 doesn't state that the property subschema may be skipped if the property is absent, however it does have

The annotation result of this keyword is the set of instance property names matched by this keyword.

which may be interpreted as "don't bother validating properties that aren't there."

I think the proper way to implement this is to process a property's subschema, even if the property is absent. Doing this allows default to generate an annotation, which the consuming application can use to apply the default to the model.

@awwright
Member Author

awwright commented Mar 2, 2020

I think this is an incorrect behavior.

On whose part? This isn't a defined behavior, so much as a logical consequence of how we've defined "properties".

I think the proper way to implement this is to process a property's subschema, even if the property is absent.

What would this even mean? It's not meaningful to run undefined against a JSON Schema, as undefined is not valid JSON. And there's no other precedent for such a feature.

And even if we did, that wouldn't fix the problem this keyword is addressing. "default" is allowed to be invalid as an instance, or show a different behavior than omitting the property. e.g. #858

Maybe bring your point up as a new issue?

@handrews
Contributor

handrews commented Mar 3, 2020

I think @awwright is correct about the implications of how we've defined applicators and annotations to work. Of course, that's all fairly new and we can tweak it if we want to.

It's also important to think about generative use cases vs validation use cases. Generative use cases work without an instance, and are really where the default keyword is useful.

  • A code generator can initialize a field with it.
  • A doc generator can document the behavior
  • A UI generator can pre-populate an input

In truth, default shouldn't even be examined during validation even if the schema is applied.

While I get the point of the missing keyword and see the use case, I think we would benefit from emphasizing the generative approach as distinct from the validation approach, and see if that helps clarify things. With OpenAPI adopting the latest draft, we will have a lot more eyes on generation and can work with this actively.

If that does not solve the problem, I would be open to a keyword such as this in the future. I am also happy that describing the behaviors of annotations and applicators has given us a sane way to talk about default.

In the meantime, if someone wanted to do this as an extension keyword, that would also be a good way to explore its utility.

@awwright would you be OK with moving this to the vocabularies repository for now? If we try the generative stuff with the OpenAPI folks and that doesn't work, but a trial run extension keyword finds adoption, we can "promote" the issue back to this repo. Does that seem reasonable to you as well @gregsdennis?

@awwright
Member Author

awwright commented Mar 4, 2020

@handrews I think this should be kept next to "default"—if anything, this keyword would be used more often than "default" would.

Also, another naming idea:
"defaultProperties" and "defaultItems" — for filling in missing properties and items, respectively.

@Relequestual
Member

Relequestual commented Mar 6, 2020

Yeah, I noticed the issue around how default as an annotation will not be collected if the property doesn't exist.
I feel this IS the correct behaviour.

I would suggest actually what we have, and failed to recognise, is a different class of annotation. It's a "LOCATION ANNOTATION KEYWORD", not just an annotation keyword.

It's an annotation relating to the location, not the instance data.

My knee jerk reaction is to suggest removing it till we can properly define it, however I am ill informed as to the implications for OpenAPI, if any. (Of course, they could define their own logic for how to use default then).

Further, it sounds like a class of keywords we need to define and allow for, especially given, as you've pointed out, there's a class of activities (generation and auto complete in IDEs) which we haven't considered because "those are for vocabularies, man". That's fine, but we should recognise that there's a separate class of keyword which can and likely will be used outside of the "apply schema to instance to get annotations" framework for collecting annotations.

I'm not even 75% sure "LOCATION ANNOTATION KEYWORD" is the correct phrasing, because it's no longer an annotation when used in the context outside of applying the schema. "structural information keyword" maybe or something similar?

Open to further discussion.

I think we need to hold moving this to vocabularies repo till we decide if we should define (in a very limited way) a new class of keywords.

@gregsdennis
Member

gregsdennis commented Mar 7, 2020

I think the proper way to implement this is to process a property's subschema, even if the property is absent. Doing this allows default to generate an annotation, which the consuming application can use to apply the default to the model.

This needs more context.

So this comes from the idea that the JSON instance is ultimately going to be deserialized into a model, and for me, that means .Net. When deserializing in .Net, if a property is missing, knowing what the default should be is important because the property has to be populated with some value. When not specified, the default value for the property's type is used. But if the schema specifies a non-.Net-type-default, that value should be used.

If default doesn't generate an annotation when the property is missing, the application doesn't know about the specified default, so it can't apply it.

@handrews
Contributor

handrews commented Mar 7, 2020

@gregsdennis I think a good question to ask is whether that deserialization is part of validation, or part of a form of code generation. Meaning, would it make sense to scan for defaults up front and then look them up as you realize you have a missing value?
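One possible reading of "scan for defaults up front" is sketched below. It walks the schema itself, with no instance involved, and builds a lookup table keyed by property location. The name `scan_property_defaults` is hypothetical, and the sketch deliberately ignores "$ref", "allOf" and every other applicator except "properties".

```python
def scan_property_defaults(schema, path=""):
    """Walk the schema (no instance needed) and map each property location
    to the "default" declared in that property's subschema."""
    found = {}
    if not isinstance(schema, dict):  # boolean schemas have nothing to scan
        return found
    for name, subschema in schema.get("properties", {}).items():
        child = f"{path}/{name}"
        if isinstance(subschema, dict) and "default" in subschema:
            found[child] = subschema["default"]
        found.update(scan_property_defaults(subschema, child))
    return found


schema = {
    "properties": {
        "host": {"type": "string", "default": "localhost"},
        "tls": {
            "properties": {
                "enabled": {"type": "boolean", "default": False},
            }
        },
    }
}

defaults = scan_property_defaults(schema)

# Later, when deserialization notices a missing value, it can look it up.
host_default = defaults.get("/host")
```

This treats "default" as something read in a generative pass rather than as an annotation produced during validation.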

@Relequestual I'll come back to your comment when I have a bit more time.

For now, I agree that there's something interesting going on here that warrants continued discussion in this repo.

@gregsdennis
Member

It's the other way around. You'd want to validate that the JSON matches your models prior to deserialization. So validation can become part of deserialization, though not strictly required. This is where having those annotations (e.g. from default) helps, especially when the value is missing from the JSON.

@awwright
Member Author

awwright commented Mar 8, 2020

As part of this deserialization, once you have a value supplied by default et al., then you can look at all the relevant schemas (including patternProperties, etc) to determine how it should be unpacked.
You might even read "format" for this purpose, and put "date" into a Date object, and so on.
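As a small illustration of reading "format" during unpacking, the sketch below converts strings annotated with "format": "date" into Python `date` objects. The `unpack` function is a hypothetical helper, it assumes the value has already been validated, and it only looks at "properties" (not "patternProperties" or other applicators).

```python
from datetime import date


def unpack(schema, value):
    """Convert an already-validated JSON value into richer Python objects,
    using the "format" keyword as a hint. Only "date" is handled here."""
    if not isinstance(schema, dict):
        return value
    if isinstance(value, str) and schema.get("format") == "date":
        return date.fromisoformat(value)
    if isinstance(value, dict):
        props = schema.get("properties", {})
        # Properties without a subschema are passed through unchanged.
        return {k: unpack(props.get(k, {}), v) for k, v in value.items()}
    return value


schema = {"properties": {"created": {"type": "string", "format": "date"}}}
record = unpack(schema, {"created": "2020-03-08", "note": "hello"})
```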

@handrews
Contributor

@gregsdennis I agree (I think) with what you say about validation before deserialization. But I guess I'm really thinking of three phases, which might not make any sense for C#/.NET, which I've never known well and haven't looked at at all for over a decade.

I'm thinking in terms of:

  • Code generation: Setting up the code that lays out the class or data structure or whatever, including initialization statements for missing fields. This may be done as a fully separate step producing specific classes, and then deserializing the JSON instance into a class instance / object. Or it might be some sort of just-in-time activity where there is not a class sitting around in code. This is where my lack of understanding of .NET is probably a problem.
  • Validation: you don't want to bother instantiating the class (or whatever) if the data is invalid
  • Instantiation: valid data is passed to the class constructor (or however that is expected to work)

Clearly, this is not the way you're thinking about it and since you understand the language you're working in and I don't, I'm assuming I have something wrong here. But I'd like to understand better what that is.

@markchart

I think that "missing" (or "defaults" or whatever) is an excellent suggestion (with some comments below), especially because until I considered @awwright's remarks I did not understand "default" very well myself.

In the application I know best, the schema is loaded with "default" values for properties which are intended to do three things:

(1) Document the application semantics behind the interface-- telling the user "you may choose a value for this parameter but if you don't it will take on the specified 'default' value";

(2) Guide people or processes to generate documents-- telling them how to specify functionally-required parameters completely (as by showing them in a UI and/or adding them to each document) but distinguishing which of those parameters have default values (also useful as initial values displayed in UI when creating a new doc) versus those which must be filled by the user (or which are truly optional, depending on "required");

and, the part that causes the most trouble,

(3) Tell the document parser/validator to supply (insert) missing properties with specified values-- in order to achieve the semantics (1) when the document generator (2) leaves something out, which is very common, as when a human whomps up a minimum-acceptable-doc ("required" properties only) trusting the parser/validator to supply all the default-value properties the application needs.

AJV's useDefaults option gets (3) done well for the application I mentioned (thank you @epoberezkin), but now I understand better how special that is, since missing properties do not, as @awwright pointed out, actually match anything in the schema for normal validation purposes.

Note that "required" and "default" can't be used together when "default" is used to insert properties instead of just giving advice to document-generators, because if a submitted doc is "required" to have a certain property the parser/validator will never have any reason to insert it.

I have not yet thought through multiple-applicable "missing" conflicts due to "anyOf" etc. but I immediately perceive that having both "default" and "missing" could be a bit awkward. Obviously they can be defined so there is no semantic conflict, but how should they be explained to schema users?

Perhaps like this: if present, "default" indicates the value recommended when a document-generator has nothing better in mind and/or an initial value for a UI to display, while "missing" indicates exactly which properties (and values) may be inserted into a document if not already present when a recipient tries to interpret said doc. Once "missing" becomes a schema property, schema-writers who intend to use it to insert missing properties at (near) validation time will be free to use the "required" keyword along with the "default" keyword to tell document-generators unambiguously that a property must be supplied and ought to be given the "default" value if no other is desired.

If "default" is not present then a document generator could rely on "missing" for a recommended value. That would be backward-compatible as well as giving schema writers some flexibility to guide UI's (there are cases in which recommended value doesn't match default value). Tools can be provided to warn schema writers of unintended mismatches between "default" and "missing" values.
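The kind of tool mentioned above could be as small as the following sketch. It compares each property's "default" with the value given for that property in the parent's proposed "missing" keyword and reports disagreements. Both the keyword and the function name `find_default_mismatches` are hypothetical, and Python's `!=` is used only as an approximation of JSON equality.

```python
def find_default_mismatches(schema):
    """Return (property, default, missing) triples where a property's
    "default" disagrees with the parent's hypothetical "missing" value."""
    mismatches = []
    missing = schema.get("missing", {})
    for name, subschema in schema.get("properties", {}).items():
        if not isinstance(subschema, dict):
            continue  # boolean subschemas cannot declare a default
        if "default" in subschema and name in missing:
            # Approximation: Python equality stands in for JSON equality.
            if subschema["default"] != missing[name]:
                mismatches.append((name, subschema["default"], missing[name]))
    return mismatches


schema = {
    "missing": {"port": 80, "host": "localhost"},
    "properties": {
        "port": {"type": "integer", "default": 8080},
        "host": {"type": "string", "default": "localhost"},
    },
}

warnings = find_default_mismatches(schema)
```

A mismatch is not necessarily an error, since a recommended value may intentionally differ from the inserted one, so a tool like this would warn rather than reject.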

(If we could start from scratch we might revise the names, like default to suggested and missing to defaults.)

@ciabaros

ciabaros commented Jan 11, 2021

In my opinion, the entire "default" concept should be removed to promote correct solution architecture. Schema validation is a read-only pass/fail concept, of course it should never be relied on to alter the data (as it seems you all generally agree), but even "missing value" suggestions should not be defined here either. The concept of default values is contextual... when you're storing values, your storage system may consider certain "defaults", when you're exposing data to API users, that may consider entirely different "defaults", no one should be led to believe that the core data schema should carry that information in a properly designed system.

@awwright
Member Author

The concept of default values is contextual

Yes, this is a good way to think about it.

when you're storing values, your storage system may consider certain "defaults", when you're exposing data to API users, that may consider entirely different "defaults", no one should be led to believe that the core data schema should carry that information in a properly designed system.

This is a fair point, but that doesn't mean there cannot be a concept of a default value in JSON Schema. It just means that annotation and validations are different functions and sometimes they don't overlap.

@awwright
Member Author

Another idea for a keyword name: "fill" or "fillProperties" (as in, "fill in these missing values")

@gregsdennis
Member

Also, another naming idea:
"defaultProperties" and "defaultItems" — for filling in missing properties and items, respectively.

I would go with propertyDefaults and itemDefaults. This (to me) highlights the idea that it's the defaults for the properties rather than a set of properties which should be included by default.

@gregsdennis gregsdennis added the proposal Initial discussion of a new idea. A project will be created once a proposal document is created. label Nov 21, 2024
@jviotti
Member

jviotti commented Apr 22, 2025

I wonder if we really need new keywords. Seems like all current use of default is invalid in terms of annotations, as it will never be collected if the instance location doesn't exist. What if we clarified that default has to occur a level above where people use it right now and gave it the same semantics as the propertyDefault / itemDefault alternatives discussed here?

For example:

{
  "default": {
    "foo": "bar",
    "bar": "baz"
  },
  "properties": {
    "foo": true,
    "bar": true
  }
}

Same applies to arrays. You can have default set to an array here.

That would likely be a simpler (we can argue people have just been using it wrong) and less confusing change than introducing a whole set of new keywords and leaving the former default keyword in a weird gray area.
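Under that reading, a consumer could merge the object-level "default" with the instance roughly as sketched below. This is only an interpretation of the suggestion: `with_object_level_defaults` is a hypothetical name, the merge is shallow, and for arrays it assumes that only trailing positions can be absent.

```python
def with_object_level_defaults(schema, instance):
    """Fill absent members of `instance` from a "default" placed on the
    containing object or array schema (shallow merge, illustrative only)."""
    defaults = schema.get("default")
    if isinstance(instance, dict) and isinstance(defaults, dict):
        # Instance values take precedence over defaults.
        return {**defaults, **instance}
    if isinstance(instance, list) and isinstance(defaults, list):
        # Only positions past the end of the instance are filled.
        return instance + defaults[len(instance):]
    return instance


object_schema = {
    "default": {"foo": "bar", "bar": "baz"},
    "properties": {"foo": True, "bar": True},
}
merged = with_object_level_defaults(object_schema, {"foo": "custom"})

array_schema = {"default": [1, 2, 3]}
padded = with_object_level_defaults(array_schema, [9])
```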

@jdesrosiers
Member

I'm strongly against reinterpreting default. This keyword has long established and stable semantics. If we think we need a version of default that works as an annotation, it should be a new keyword.

Personally, I think it's fine how it is. The alternative is too unintuitive and I think it's ok that it doesn't make sense as an annotation.

@gregsdennis
Member

I'm also against redefining the keyword on the grounds that it has well-established semantics.

I think we need to introduce a new keyword (proposal) and deprecate default.

I think it's fine how it is.

  • For validation, default does nothing. This is fine.
  • For annotation, people expect to receive an annotation telling them the default value that should be applied when the property is missing, but when the property is missing, they don't get any annotations. The keyword is of no use in this scenario.
  • For generation (e.g. code generation), there's no instance: the schema is analyzed directly, so the defaults should be found. This is fine.

The problem lies in the annotative scenario where the instance is missing a property. default is defined to provide a default value for when the instance is missing a property, but it's ineffectual.

This is the same problem that draft 3 required had. In draft 3, required says "this property is required". But if "this" property doesn't exist, you don't have an instance value to apply the subschema to, so no validation occurs. The fix was to pull required out one level to make it say "this object requires these properties".

Similarly, default says "this property has a default value of X". But if "this" property doesn't exist, you don't have an instance value/location to apply the annotation to. And similarly, the fix is to pull default out one level to make it say "this object uses these default property values".

This has no impact on validation, it fixes annotation, and it's a breaking change (but a manageable one) for generation.

On itemDefaults (or whatever)

I'm having trouble finding a use case for this. Maybe tuples? If you're using prefixItems to define a sequence of five items, and the third one has a default, how can the third item be missing?

This sounds like it's analogous to default method parameters in C#. (Bear with me.) A method (function) can have parameters that have defaults defined so that you don't have to include them when you call it. For example

public void Print(Date value, DateFormat format = DateFormat.Iso8601) { /* ... */ }

// usage
Print(myDate, DateFormat.Rfc3339);
Print(myDate);

The catch is that all parameters with defaults must appear at the end of the parameter list. You can't have any non-defaulted parameters after a defaulted parameter.

public void Print(DateFormat format = DateFormat.Iso8601, Date value) { /* ... */ }

// usage
Print(DateFormat.Rfc3339, myDate); // this is fine now
Print(myDate);                     // but this has problems

This is because C# method calls are positionally significant.

The relevancy here is that if item three in a sequence has a default, then all of the items afterward also need defaults because items in an array are positionally significant. And we don't have a way to enforce that.
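The positional constraint can be expressed as a check over the item schemas. The sketch below assumes, purely for illustration, that each "prefixItems" subschema may carry its own "default"; the helper names `defaults_are_trailing` and `fill_trailing` are hypothetical and nothing in JSON Schema currently enforces this rule.

```python
def defaults_are_trailing(prefix_items):
    """True if every item schema after the first defaulted one is also
    defaulted, mirroring the rule for optional parameters in C#."""
    seen_default = False
    for item in prefix_items:
        has_default = isinstance(item, dict) and "default" in item
        if seen_default and not has_default:
            return False
        seen_default = seen_default or has_default
    return True


def fill_trailing(prefix_items, instance):
    """Append defaults for positions the array instance leaves off the end."""
    if not defaults_are_trailing(prefix_items):
        raise ValueError("a defaulted item is followed by a non-defaulted one")
    filled = list(instance)
    for item in prefix_items[len(instance):]:
        if not (isinstance(item, dict) and "default" in item):
            break  # a non-defaulted position is absent; nothing can be filled
        filled.append(item["default"])
    return filled


good = [{"type": "string"}, {"type": "string", "default": "iso8601"}]
bad = [{"type": "string", "default": "iso8601"}, {"type": "string"}]

filled = fill_trailing(good, ["2020-03-01"])
bad_is_ok = defaults_are_trailing(bad)
```

The `bad` layout corresponds to the second `Print` signature above: once the defaulted position comes first, a shorter array cannot say which position it omitted.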

I don't think that we need defaultItems or itemDefaults or whatever.

@jviotti
Member

jviotti commented Apr 23, 2025

I get the problem on changing the semantics of default based on how so many people and tooling use it right now outside annotations. However, I think that adding a new keyword (even if we deprecate the other one) that has "default" in its name will also create a ton of confusion. People will see two keywords that seem to be useful for setting default values and will get tripped up about which one to use when, why there are 2 variants, etc.

I don't have a solution. Just thinking out loud that if we want to resolve this somehow, we will create confusion either way :/

Projects
Status: In Discussion
Development

No branches or pull requests

8 participants