Skip to content

only check strings for patternProperties matches #286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jstewmon
Copy link

Fixes #285

I looked over the tests, but wasn't able to infer the appropriate place to add test cases for these errors. Should I add two new tests to test_validators.py - one for the patternProperties validator and another for the additionalProperties validator?

@Julian
Copy link
Member

Julian commented May 24, 2016

Hi. Bleh, this is... an interesting thing. Thanks for filing it.

I'm not 100% sure whether this is the right solution. I'm not not sure it is either, but at least initially:

JSON doesn't have non-string keys. So technically what you're validating there isn't valid deserialized JSON.

That being said (and I'm being a bit short here), I'm not against fixing this I suppose, it's a bit though that the most obvious place to add a test for this is as a JSON test case in the json/tests directory, but given the above, you can't, since you can't represent this instance as JSON. So it'd be a Python-level test case, yeah.

Thoughts?

@jstewmon
Copy link
Author

Hmm... interesting, indeed. Initially, I didn't consider that a dict with non-string keys could not have been deserialized from a valid JSON document.

I think it's reasonable to require the object being validate to be [strictly] JSON-serializable itself. I'm agreeable to the resolution to #285 being that a ValidationError is yielded/raised if the object being validated does not have an exact JSON representation.

Would it be reasonable to validate that all keys of an object are a string type in the type_ validators?

@jstewmon
Copy link
Author

For example, this type_draft4 implementation:

def type_draft4(validator, types, instance, schema):
    types = _utils.ensure_list(types)

    if not any(validator.is_type(instance, type) for type in types):
        yield ValidationError(_utils.types_msg(instance, types))

    if validator.is_type(instance, "object"):
        for key in instance:
            if not validator.is_type(key, "string"):
                yield ValidationError(
                    "%r has property %r, which is not a string" % (
                        instance, key
                    )
                )

When given:

from jsonschema import validate

schema = {
    'type': 'object',
    'patternProperties': {
        '^[0-9]{3}$': {'type': 'string'}
    },
}

validate({200: 'hello'}, schema)

Results in:

Traceback (most recent call last):
  File "/Users/jstewmon/Library/Application Support/IntelliJIdea2016.1/python/helpers/pydev/pydevd.py", line 1530, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Users/jstewmon/Library/Application Support/IntelliJIdea2016.1/python/helpers/pydev/pydevd.py", line 937, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/jstewmon/Library/Preferences/IntelliJIdea2016.1/scratches/scratch_159", line 10, in <module>
    validate({200: 'hello'}, schema)
  File "/Users/jstewmon/dev/forks/jsonschema/jsonschema/validators.py", line 478, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/Users/jstewmon/dev/forks/jsonschema/jsonschema/validators.py", line 123, in validate
    raise error
jsonschema.exceptions.ValidationError: {200: 'hello'} has property 200, which is not a string

Failed validating 'type' in schema:
    {'patternProperties': {'^[0-9]{3}$': {'type': 'string'}},
     'type': 'object'}

On instance:
    {200: 'hello'}

@Julian
Copy link
Member

Julian commented May 26, 2016

I haven't had another chance to think about this, but I definitely don't think the properties validator should change or that this should be a ValidationError instead of what it is -- this is someone providing a "nonsensical" schema in the context of JSON Schema -- so in my mind this should either be an exception as it is now, or, if we wanted to "stretch" JSON Schema into a more general solution for things beyond valid-deserialized-JSON, the solution would be to ignore non-string keys, or if we felt really nasty, to replicate JS's behavior of coercing non-strings to strings.

I haven't thought enough about those 3, but definitely not ValidationError to me.

@jstewmon
Copy link
Author

this is someone providing a "nonsensical" schema in the context of JSON Schema

Just to clarify, it's a "nonsensical" instance, with a valid schema

If I can add some color to the 3 options you proposed...

an exception as it is now

The validate docstring indicates that the only raises are ValidationError and SchemaError. Originally, I filed an issue because the current behavior "leaks" the TypeError from re. I had to use a debugger to figure out what the problem was... Once I found source of the error, my instinctive thought was "oh, this should be a ValidationError because there's problem with the instance being passed."

I follow your reasoning for it not being a ValidationError, but I still think that it should be some type of error that is defined and documented by jsonschema (if an error is to be raised - see comments on type coercion).

ignore non-string keys

Oh please not this - an instance may pass validation only to fail later by violating the programmer's assumption that successful validation implies that the instance conforms to the schema. If non-string keys are skipped, subsequent serialization may produce json which does not conform.

For example:

import json

from jsonschema import validate

schema = {
    'type': 'object',
    'patternProperties': {
        '^[0-9]{3}$': {'type': 'string'}
    },
}

instance = {1234: 'what'}
# passes if ignored
validate(instance, schema)
# fails after int is coerced to str during serialization
validate(json.loads(json.dumps(instance)), schema)

if we felt really nasty, to replicate JS's behavior of coercing non-strings to strings

While I agree that this is "nasty", it is actually what the python JSONEncoder does:

import json
import sys

json.dump({200: 'ok'}, sys.stdout)

outputs:

{"200": "ok"}

Maybe this is a reasonable choice because validate would then answer the question (for instances which were not created by deserializing JSON) "If I were to json-serialize this instance, would it be valid according to the given schema?" This wouldn't be my top choice, but it would align with the behavior of the JSONEncoder, flawed as it may be...

@Julian
Copy link
Member

Julian commented Jun 23, 2016

The validate docstring indicates that the only raises are ValidationError and SchemaError. Originally, I filed an issue because the current behavior "leaks" the TypeError from re. I had to use a debugger to figure out what the problem was... Once I found source of the error, my instinctive thought was "oh, this should be a ValidationError because there's problem with the instance being passed."

I follow your reasoning for it not being a ValidationError, but I still think that it should be some type of error that is defined and documented by jsonschema (if an error is to be raised - see comments on type coercion).

I'm still being a bit hard-lined, but just to broadly disagree with this, I don't think this tends to be the case in Python. Objects can raise any sort of exception if the user of the object doesn't provide an object of the correct type. If one does provide an object of the wrong type, all bets are generally off and you get whatever exception that type raises when the wrong typed object gets poked the wrong way, but library authors do not (and should not) just wrap exceptions for the sake of uniformity there.

Oh please not this - an instance may pass validation only to fail later by violating the programmer's assumption that successful validation implies that the instance conforms to the schema. If non-string keys are skipped, subsequent serialization may produce json which does not conform.

There's an assumption here that instances should round trip via serialization -- that happens to not be true even for == in your case, so I'm not sure JSON Schema validation should be held to a higher standard. A person with such an instance needs to do special things to it to serialize and deserialize it in a way that round trips regardless, same as a user who has an instance that they want to get decimal.Decimals out of. And for better or worse JSON Schema already specifies that validator types generally should ignore elements of different types (maxItems ignores non-arrays, maxLength ignores non- strings, properties ignores non-objects, ...)

I'm not sure if I'm leaning personally one way or the other though here. I'll still have to think a bit and possibly see what validators in other languages do.

@Julian
Copy link
Member

Julian commented Jul 30, 2016

@fge can I ask you -- yours operates on Java objects right, not strings -- what does it do in this case, where you provide a schema whose keys "could never" have been JSON but then asserts about patternProperties?

@MattF-NSIDC
Copy link

MattF-NSIDC commented Dec 14, 2016

I think at the very least, we should see a clear error that indicates the object is invalid as JSON. I'm working around this with json.loads(json.dumps(obj)), but it took me a long time to finally realize the underlying issue. The current exception that is raised sends the user on a wild goose hunt that will hopefully end at this PR.

@Julian
Copy link
Member

Julian commented Dec 25, 2016

So at a high level, I think a reasonable resolution to this issue is at least two things:

  • a philosophical addition to the documentation on what stance we take when faced with an issue that lives beyond JSON-Schema-the-specification-and-its-implementation-in-Python so that people can have intuition on what the library is likely to do when faced with such a case
  • a resolution following that philosophical stance.

I'm inclined to think that we should think about the philosophical stance first, and then to derive the appropriate behavior from there, but some consideration for backwards compatibility also belongs.

@MattF-NSIDC
Copy link

MattF-NSIDC commented Dec 25, 2016

  • a philosophical addition to the documentation on what stance we take when faced with an issue that lives beyond JSON-Schema-the-specification-and-its-implementation-in-Python and prevents processing from proceeding so that people can have intuition on what the library is likely to do when faced with such a case

How about a new type of exception to categorize these types of errors.

Going more low level, (naming is hard...) it could be called SchemaError, and some child exception(s) to more clearly describe the problem, such as InvalidJSONError, and throw these to communicate the issue to the user. In this case, "A schema must be valid JSON (all keys must be strings).". The best place for the "all keys must be strings" part may not be in the error message, but it should probably be in the documentation for that exception.

I could see this change breaking someone's code if they're trying to do something like this in response to the problem:

try: 
   validate(json, schema)
except TypeError:
   validate(json, json.loads(json.dumps(schema)))

Happy holidays!

@Julian
Copy link
Member

Julian commented Aug 20, 2017

@epoberezkin what does ajv do for this?

@epoberezkin
Copy link

@Julian keys in the map/dictionary in JavaScript can be only strings, so ajv doesn't do anything special about it.

@Julian
Copy link
Member

Julian commented Aug 20, 2017

@epoberezkin even for ES6 Maps -- can't those have integer keys / does ajv support validating them?

@epoberezkin
Copy link

epoberezkin commented Aug 20, 2017

Ajv only validates enumerable properties and ES6 Maps keys are not properties. ajv will treat ES6 Maps as empty objects.

@Julian
Copy link
Member

Julian commented Aug 20, 2017 via email

@Julian
Copy link
Member

Julian commented Nov 9, 2017

@handrews this question straddles the boundaries of JSON Schema, Python, and this libraries essence I suppose, but ping just in case you have any thoughts.

@handrews
Copy link

handrews commented Nov 9, 2017

Objects can raise any sort of exception if the user of the object doesn't provide an object of the correct type. If one does provide an object of the wrong type, all bets are generally off and you get whatever exception that type raises when the wrong typed object gets poked the wrong way, but library authors do not (and should not) just wrap exceptions for the sake of uniformity there.

Agreed. If you pass an unexpected/incorrect type, a type error should be expected. At most, I would just document this, but it's true across pretty much all Python libraries.

There's an assumption here that instances should round trip via serialization

JSON Schema definitely should not have more stringent requirements than JSON itself. It should possibly have less, given that, as documented in the spec, certain values in the JSON Schema data model may have multiple representations in a JSON document, and detecting and working with such differences is outside of the scope of the spec.

And for better or worse JSON Schema already specifies that validator types generally should ignore elements of different types (maxItems ignores non-arrays, maxLength ignores non- strings, properties ignores non-objects, ...)

There are very solid reasons for this which I am not going to get into now, but it should remain true that all keywords apply to one or more types, and ignore other types, and definitely do not coerce or assert types.

How about a new type of exception to categorize these types of errors.

This could work- I would call it something like DataModelError as an integer object key is outside of the data model defined in the JSON Schema core specification. I do not think this is required (or forbidden) by the spec. As @epoberezkin in some languages it's not possible to violate the data model, at least not in this way. But in others it is.

For performance, I would go with garbage-in-garbage-out, and say the caller is responsible for providing a data structure that meets the data model, and failing to do so would be expected to produce a TypeError, as with type mismatches anywhere else.

For user-friendliness, I could see allowing (opt-in?) a check for data model violations and providing a specific exception for it.

@Julian
Copy link
Member

Julian commented Nov 23, 2017

I agree with all the above (well, unsurprisingly I suppose).

For now, I think my gut says to just close this (and my gut also says that something like https://github.com/Julian/Seep might have better answers for this type of thing, but that is still vaporware unfortunately...)

If anyone feels like arguing for anything else strongly, please do follow up.

@Julian Julian closed this Nov 23, 2017
Julian added a commit to Zac-HD/jsonschema that referenced this pull request Oct 7, 2019
433ab2f0 Merge pull request python-jsonschema#286 from Zac-HD/not-patterns
dbaa3aac Fix data - escape unicode, not regex pattern

git-subtree-dir: json
git-subtree-split: 433ab2f0fd6b981527e838cd149c91d823bdc367
Julian added a commit that referenced this pull request Oct 7, 2019
4e6c78b4 Make the draft3 ECMA regex tests consistently named.
433ab2f0 Merge pull request #286 from Zac-HD/not-patterns
dbaa3aac Fix data - escape unicode, not regex pattern

git-subtree-dir: json
git-subtree-split: 4e6c78b4d878aa427d7b41e7b65cde0ea6f503ea
@OddBloke
Copy link

We just hit this, and while I agree with the general resolution, it would have been very helpful for the TypeError to include a more specific error message about what was going on. We're using jsonschema actually to validate YAML, and we hadn't really had to reckon with the fact that YAML keys aren't required to be strings, so it took us a while to realise that we were validating {0: "foo"} and not {"00": "foo"} as we thought.

Just something as simple as a try/except TypeError around the re.search which just does raise TypeError("patternProperties requires keys to be strings (got: {})".format(type(k))) on failure would do, I think.

@Julian
Copy link
Member

Julian commented May 14, 2020

@OddBloke that kind of thing probably happy to review a PR for if you're interested -- with tests :), and with a new exception type rather than TypeError (e.g. PatternMatchingFailed or something).

But yeah definitely open to just wrapping the exception with a more specific one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Object with integer property causes TypeError when patternProperties are defined
6 participants