Skip to content

Support for anyOf null, object #17

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
TWiStErRob opened this issue Nov 7, 2022 · 12 comments
Open

Support for anyOf null, object #17

TWiStErRob opened this issue Nov 7, 2022 · 12 comments

Comments

@TWiStErRob
Copy link

TWiStErRob commented Nov 7, 2022

I think there's an edge case of oneOf (#2) that can be implemented easily.

I converterted https://docs.github.com/en/rest/repos/repos#get-a-repository 's "Response Schema" and I got val license: Any instead of val license: License?

    "license": {
      "anyOf": [
        {
          "type": "null"
        },
        {
          "title": "License Simple",
          "description": "License Simple",
          "type": "object",
          "properties": {
            "key": {
              "type": "string",
              "examples": [
                "mit"
              ]
            },
            "name": {
              "type": "string",
              "examples": [
                "MIT License"
              ]
            },
            "url": {
              "type": [
                "string",
                "null"
              ],
              "format": "uri",
              "examples": [
                "https://api.github.com/licenses/mit"
              ]
            },
            "spdx_id": {
              "type": [
                "string",
                "null"
              ],
              "examples": [
                "MIT"
              ]
            },
            "node_id": {
              "type": "string",
              "examples": [
                "MDc6TGljZW5zZW1pdA=="
              ]
            },
            "html_url": {
              "type": "string",
              "format": "uri"
            }
          },
          "required": [
            "key",
            "name",
            "url",
            "spdx_id",
            "node_id"
          ]
        }
      ]
    },
@pwall567
Copy link
Owner

pwall567 commented Nov 9, 2022

Thanks for this.

You say it can be implemented easily, and you're right that it wouldn't be difficult to add code to look for precisely this case and substitute a schema with "type": [ "object", "null" ] (the API uses this formulation in some of the other endpoints, but sadly not in the case you have highlighted). The problem is that it is not so easy to come up with a more general rule that can be applied in a range of cases similar to this.

The generator currently doesn't try to do anything meaningful with anyOf, largely because it's so hard to devise rules that apply in a variety of circumstances. I will look at some possibilities, which may include the case from the GitHub API, but in the meantime, you should be able to generate useful code by taking a copy of the schema and editing it to remove the anyOf and the "type": "null" subschema and changing the "type" of the main subschema to "type": [ "object", "null" ]. I know that most people who use published schema files prefer to be able to use them in their original form, but this would at least give you something to work with.

@cmpaul
Copy link

cmpaul commented Feb 7, 2023

@pwall567 - First of all, thanks for this library!

I am testing your suggested workaround and ran into an issue with validation and code generation of nullable fields that I wanted to get your take on.

We are going to start defining canonical, named types in our schema and referencing them with "$ref" (as opposed to defining them in-line). Consider this example:

definitions:
  SomeCommonType:
    type: object
    properties:
      someProp:
        type: string
  SomeType:
    type: object
    properties:
      commonType:
        "$ref": "#/definitions/SomeCommonType"

The library generates code correctly for this, but we run into problems when the commonType field needs to be nullable. Based on your suggestion above, I am able to make fields nullable by adding type: ["null", object], followed by the "$ref" property.

  SomeType:
    type: object
    properties:
      commonType:
        type: ["null", object]
        "$ref": "#/definitions/SomeCommonType"

This makes this library generate the code we want, but this JSON schema doesn't actually validate correctly: while the type allows null, the referenced sub-schema does not, so this schema is not actually viable for us to use for validation.

From our experimentation, it seems that the only way to use canonical definitions and $refs, and to make some fields nullable is to use oneOf or anyOf to add {type: "null"} as an alternate subschema for a field like this:

  SomeType:
    type: object
    properties:
      commonType:
        oneOf:
        - type: "null"
        - "$ref": "#/definitions/SomeCommonType"

I can work around this by transforming our canonical schema (which uses the oneOf form shown above) into the form that avoids oneOf and uses type: ["null", object] before invoking the library to generate code. This works to give us the validation we want (with 1 schema) and the code generation we want (with another), but it feels odd/wrong that in order to get nullable fields in the code gen, we have to transform a schema that correctly allows null for some fields into a schema that actually doesn't allow null for those fields.

Let me know if I'm missing something (e.g., is there an alternative you see that we're not seeing?). It seems like nullability is a something this library would support without this workaround. If it doesn't yet support it, what do you think about adding limited support for oneOf just for when one subschema allows null and the other is a non-nullable type, like in the example above? (Also: while the title of this issue is about nullable object types, we'd also want this for any nullable type.)

@pwall567
Copy link
Owner

pwall567 commented Feb 8, 2023

Hi @cmpaul , there is a simple solution to your problem – if you're prepared to accept the same compromise that I adopted in relation to required. Consider the following:

  Person:
    type: object
    properties:
      firstName:
        type: string
      middleName:
        type: string
      surname:
        type: string
    required:
    - firstName
    - surname

What would you expect to see generated for middleName? It is not in the required list, so it may be missing from the JSON, but it is not declared as [ string, "null" ]. Most people seem to expect that the property will be null if it is not present in the JSON, so the code generator considers all optional properties as nullable, even if they don't explicitly include null as a type. And middleName in this example will be of type String?.

This shows the way round your problem. Just by omitting the property name from the required list, it automatically becomes nullable.

This may not be what the authors of the JSON Schema specification had in mind, but it's difficult to see how else we could generate usable code from the above example.

@TWiStErRob
Copy link
Author

TWiStErRob commented Feb 8, 2023

So it seems both of these workarounds are actually forcing the users to create a different schema than before, in a way that it actually validates different JSONs.

For example your last suggestion allows:

{ "firstName": "Work", "lastName": "Around" }

while the original schema with all 3 fields in the required list doesn't allow that:

{ "firstName": "Work", "middleName": null, "lastName": "Around" }

"Omitted", and "existing but null" can make a meaningful difference in the API (not with middle name, but in other cases). Java/Kotlin is not able to express this though and they're logically both mapped to nullable fields.

@pwall567
Copy link
Owner

pwall567 commented Feb 8, 2023

As I said, it involves a compromise. Your second example would not validate against the schema, but it would deserialize into the generated Kotlin. If you're concerned to reject this invalid case, you could validate the JSON prior to deserialization, and the json-kotlin-schema project will do just that.

It might be possible to generate Kotlin that would distinguish between null and absent, but this would require custom deserialization, and one of the main goals of the code generator was to generate classes that would be usable by most of the serialization / deserialization libraries without special configuration.

It's my understanding that most users of the code generator are comfortable with the compromise, but I'm always willing to listen to ideas for improving the project.

@cmpaul
Copy link

cmpaul commented Feb 8, 2023

@pwall567 - Thank you!

I see two downsides to the workaround you suggest.

The first is that it prevents us from being able to ensure the presence of correctly formatted keys. In your example, consider the JSON payload:

{ "firstName": "John", "midleName": "Fitzgerald", "lastName": "Kennedy" }

Notice that midleName is misspelled. If we remove middleName from required, this field will be ignored during validation. (This is important for us because we validate JSON from Ruby clients without static typing, so misspellings are possible).

Second, this workaround does not work for an array of nullable fields. Consider an example where we want to define a list that contains items that are either null or of type SomeCommonType. In your workaround, this would look like:

SomeType:
  type: object
  properties:
    someArray:
      type: array
      items:
          "$ref": "#/definitions/SomeCommonType"
  required:
    # Nothing required

This results in a data class with someArray defined as:

val someArray: List<SomeCommonType>? = null

This isn't quite what we want, since this means that the list itself can be null, or it can contain items of SomeCommonType. The code below shows what we're after:

val someArray: List<SomeCommonType?>

Right now, we can generate 2 schemas as a workaround: 1 that works for validation (using oneOf) and 1 that works for code gen (using type: ["null", object], followed by the "$ref"). However this leaves me with the same concern I raise in my first comment.

Are there other avenues we can explore?

@pwall567
Copy link
Owner

pwall567 commented Feb 9, 2023

I see your misspelling problem as a different issue. Your JSON deserialization library will almost certainly have an option to reject unrecognised properties (I recommend the kjson library, which defaults to this behaviour, but other libraries have optional settings to the same effect). And if you are using JSON Schema Validation, the additionalProperties: false setting will catch the misspelling.

But I do accept that the use of required will not help in the case of nullable array items. The code generator currently includes simplistic support for allOf – it aggregates all the validations for each of the allOf entries and attempts to create a field that includes them all. So the following:

SomeType:
  type: object
  properties:
    someArray:
      type: array
      items:
        allOf:
        - $ref: '#/definitions/SomeCommonType'
        - type: "null"

will probably achieve what you want, because the type within the definition of SomeCommonType will be combined with the type: "null" to create a nullable reference. But it will not pass Schema Validation, because the two type keywords will be applied individually , and one of them will fail.

I'm starting to come around to the original proposal. I have tried to avoid putting in code for very narrow cases, but I think I may have to accept that this is a form of usage that is sufficiently frequently required that it justifies special treatment.

What I am thinking of is allowing either oneOf or anyOf with exactly two items, where one of the items is just type: "null". In this case it will process the other item as normal, but make the generated property or array item nullable. In the example above, allOf would be replaced by oneOf, which would be correct for Schema Validation as well as for code generation.

I'll try to get a version for you (both) to test in a few days.

@pwall567
Copy link
Owner

pwall567 commented Feb 9, 2023

On second thoughts – don't try the example in my previous post (using allOf) – it will generate very poor code. I will have an implementation of the anyOf or oneOf approach available shortly.

@pwall567
Copy link
Owner

OK, I've implemented the special case of oneOf or anyOf described above, so the following (copied from one of your examples):

  SomeType:
    type: object
    properties:
      commonType:
        oneOf:
        - type: "null"
        - "$ref": "#/definitions/SomeCommonType"

should work as you expect.

I've documented this in the README. (The explanation is probably not particularly clear, but anyone who has encountered the issue should be able to grasp it.)

Version 0.87 includes this implementation; give it a try and let me know if it resolves this issue.

@cmpaul
Copy link

cmpaul commented Feb 13, 2023

Thank you, Peter! 🙇 Will test this today and let you know how it goes on our side.

@cmpaul
Copy link

cmpaul commented Feb 15, 2023

@pwall567 Confirmed! This works beautifully for us. Thanks again for the amazing turn-around on this.

@TWiStErRob
Copy link
Author

TWiStErRob commented Feb 17, 2023

Way better @pwall567, now the generated code is usable, thank you!

I wrote a quick script to test it (kotlinc -script generate.main.kts, note .main.kts ending is really important!):

@file:DependsOn("net.pwall.json:json-kotlin-schema-codegen:0.88")

import net.pwall.json.schema.codegen.CodeGenerator
import java.io.File

val codeGenerator = CodeGenerator().apply {
  baseDirectoryName = "generated"
  basePackageName = "com.github.repository"
}

// FullRepository.schema.json is a copy-paste of the response JSON Schema from
// https://docs.github.com/en/rest/repos/repos#get-a-repository
codeGenerator.generate(File("FullRepository.schema.json"))

Compare these two outputs, heaven and earth...
FullRepository86.kt
FullRepository88.kt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants