Skip to content

Return regex pattern error if regex failed to compile #425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

manycoding
Copy link

If an invalid regex pattern were provided, validation fails.
So I added validate_regex to check it and also a yield "Pattern error in" .

@manycoding
Copy link
Author

@Julian The tests are ok after I rebased and merged the last master.
Would you please review it?

@Julian
Copy link
Member

Julian commented Aug 15, 2018

Hi -- this wouldn't be the right place to do that check, it'd need to happen during schema checking, but that should already be the case, e.g.:

⊙  ~[jsonschema:python] -c 'import jsonschema; jsonschema.validate(u"foo", {u"pattern": object()})'                                                              Julian@Magbook ●
Traceback (most recent call last):
  File "<module>", line 1, in <module>
  File "jsonschema/validators.py", line 823, in validate
    cls.check_schema(schema)
  File "jsonschema/validators.py", line 244, in check_schema
    raise SchemaError.create_from(error)
SchemaError: <object object at 0x000000010b2a8020> is not of type u'string'

Failed validating u'type' in metaschema[u'properties'][u'pattern']:
    {u'format': u'regex', u'type': u'string'}

On schema[u'pattern']:
    <object object at 0x000000010b2a8020>

Are you experiencing some specific issue here that's prompting adding this?

@manycoding
Copy link
Author

manycoding commented Aug 27, 2018

@Julian
Yes, I meant the case when the regexp is a valid string, but an invalid python regex.

>>> Draft6Validator.check_schema({"pattern": "\q"})
>>> jsonschema.validate("s", {"pattern": "\q"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/validators.py", line 822, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/validators.py", line 304, in validate
    for error in self.iter_errors(*args, **kwargs):
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/validators.py", line 280, in iter_errors
    for error in errors:
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/_validators.py", line 248, in pattern
    not re.search(patrn, instance)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 502, in _parse
    code = _escape(source, this, state)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 401, in _escape
    raise source.error("bad escape %s" % escape, len(escape))
sre_constants.error: bad escape \q at position 0

If it happens it causes an exception which is unwanted in a typical use case, thus I wrapped up the error:

>>> jsonschema.validate("s", {"pattern": "\q"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/validators.py", line 822, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/validators.py", line 305, in validate
    raise error
jsonschema.exceptions.ValidationError: Pattern error in {'pattern': '\\q'}

Failed validating 'pattern' in schema:
    {'pattern': '\\q'}

On instance:
    's'

@Julian
Copy link
Member

Julian commented Aug 29, 2018 via email

@manycoding
Copy link
Author

manycoding commented Sep 4, 2018

Would you please elaborate a bit?
I have digged up to the format validator which indeed has empty format_checker but I still have no idea where to pass this checker and how to get my hands on it.

Update:
I guess I should pass format_checker to create() method here https://github.com/Julian/jsonschema/blob/3eabefb0c538d44bec414d84a7ab576dac5de6ee/jsonschema/validators.py#L463

Update:
pattern errors throw exceptions, but I need to figure out why patternProperties doesn't:

>>> Draft6Validator.check_schema({"patternProperties": {"\q": {"type": "number"}}})
>>> 
>>> Draft6Validator.check_schema({"pattern": "\q"})
Traceback (most recent call last):
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/_format.py", line 96, in check
    result = func(instance)
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/_format.py", line 237, in is_regex
    return re.compile(instance)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 502, in _parse
    code = _escape(source, this, state)
  File "/Users/valery/.local/share/virtualenvs/jsonschema-zk5Nmia0/lib/python3.6/sre_parse.py", line 401, in _escape
    raise source.error("bad escape %s" % escape, len(escape))
sre_constants.error: bad escape \q at position 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/valery/Documents/_code/jsonschema/jsonschema/validators.py", line 257, in check_schema
    raise SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: '\\q' is not a 'regex'

Failed validating 'format' in metaschema['properties']['pattern']:
    {'format': 'regex', 'type': 'string'}

On schema['pattern']:
    '\\q'

@manycoding
Copy link
Author

@Julian Would you recommend the best way to handle patternProperties?

I thought about calling pattern from within addittionalProperties() but it doesn't work and I have no idea why

def additionalProperties(validator, aP, instance, schema):
    if not validator.is_type(instance, "object"):
        return

    extras = set(_utils.find_additional_properties(instance, schema))

    if validator.is_type(aP, "object"):
        print("additionalProperties")
        # print(f"extras {extras}")
        for extra in extras:
            print("here")
            pattern(validator, extra, instance[extra], aP)
            for error in validator.descend(instance[extra], aP, path=extra):
                yield error

@Julian
Copy link
Member

Julian commented Sep 14, 2018

Hey, sorry, things are a bit hectic, have just returned from vacation and started a new job :)

To elaborate, it's the check_schema method that should be changing -- namely, it should grow a format_checker argument (the same way that __init__ takes one), and it should default to creating one (and then passing it into the validator instance it creates). I think other than that two line change (and the associated tests) nothing else should need to change.

Does that help? Happy to answer further questions, and definitely appreciated!

@manycoding
Copy link
Author

@Julian Hi, I passed the format_checker but I cannot figure out how to create a default one. As far as I understand, each meta schema should have it's own format checker .

Also, it doesn't check patternProperties since the format_checker checks only the value, not the instance name.

Last but not least, I added tests for both (patternProperties obviously fails).

@Julian
Copy link
Member

Julian commented Sep 29, 2018

Hey! Thanks for continuing to tackle this!

I have to re-review your patch, but on the patternProperties front, I believe that the recent drafts added propertyNames, which is a thing that can apply schemas to properties.

There's a possibility though that the meta schema isn't actually trying to apply that to patternProperties / properties, which I think would be a bug to file upstream if so.

But tl;dr -- I think if what you've got isn't catching patternProperties issues, that that needs addressing upstream (in the spec, or in the meta schemas).

Will try to leave some additional comments after reviewing, but yeah thanks again, this is definitely appreciated!

@manycoding
Copy link
Author

@Julian Hi, I would appreciate if you review this one once you have a free minute.

@Julian
Copy link
Member

Julian commented Oct 16, 2018

Thanks! This is looking close to what I expected. Looks like CI is failing though. Possibly you just need to merge in master.

@manycoding
Copy link
Author

validator = self.Validator({})
self.assertIsNone(validator.format_checker)
self.assertIsInstance(validator.format_checker, FormatChecker)
Copy link
Member

@Julian Julian Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test shouldn't pass (i.e. it was correct before), so if it does, something seems wrong -- the goal here wasn't to change the default for all validators, it was just to do so for calling .check_schema

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validator.format_checker is not None, I think it's because I passed it here https://github.com/Julian/jsonschema/pull/425/files#diff-ed2f83de8c2ff5b1c1138f9f28ba4ba6R509

@manycoding
Copy link
Author

I made suggested updates, there are still some failures though:

  1. I guess test_it_does_not_validate_formats_by_default fails because I pass format_checker in create method
  2. I have no idea why all those ref tests for TestDraft43 TestDraft4 are failing

@@ -220,7 +227,7 @@ def __init__(
schema,
types=(),
resolver=None,
format_checker=None,
format_checker=format_checker,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's failing because of this line, where you've changed the default here.

@@ -257,7 +265,7 @@ def iter_errors(self, instance, _schema=None):
return
elif _schema is False:
yield ValidationError(
"False schema does not allow %r" % (instance,),
"False schema does not allow '{}'".format(instance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this too?

@Julian
Copy link
Member

Julian commented Oct 17, 2018

Think we're close!

@manycoding
Copy link
Author

Done. Can it be that that passed format_checker messes with uri format_checker?

with self.assertRaises(SchemaError):
self.Validator.check_schema({"pattern": "\q"})

@unittest.skip("This test fails but it shouldn't, see"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right so here, this test doesn't fail on draft 7, which properly says that the keys need to be regexes for patternProperties. It does that though using a new validator (propertyNames) that didn't exist before draft 6.

Draft 6 though seems to have a bug in the metaschema (by not specifying propertyNames for patternProperties even though that existed for draft 6), which sounds like something that should be filed upstream.

And drafts 3 and 4 don't have that, so that's why this fails -- so it seems like the right thing to do isn't to skip this here, it should pass on draft 7, and draft 6 when/if the metaschema is fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I think the reason it's not there in draft6 is because draft6 (and 4) accidentally removed the regex format.

So, this seems to be a thing that really only should pass on draft 7, and "correctly" (FSVO correctly) blows up on earlier drafts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, how can I skip the test on particular drafts? I don't understand the skip lambda https://github.com/Julian/jsonschema/blob/master/jsonschema/tests/test_jsonschema_test_suite.py#L81

@manycoding
Copy link
Author

I don't understand skip lambda, how can I skip tests with it?
I think some test cases are generated but I cannot find where, for example valid tree keywords.

@Julian
Copy link
Member

Julian commented Oct 31, 2018

@manycoding hey! So I wouldn't bother with that stuff here, what we need here is just a test case we only run for draft 6 and 7 -- the most straightforward way to do that is just a InvalidRegexMixin class that you then mix in to 2 test cases, a TestDraft6 one and a TestDraft7 one. If that doesn't make eminent sense (I don't remember if those 2 test cases still exist or not) let me know, it's fairly easy so if I'm unclear I can give you a specific line to look at.

@manycoding
Copy link
Author

manycoding commented Nov 5, 2018

I moved pattern tests to their own draft class, but it seems that my code affects ref in Draft3 and 4 and it's not comprehensible to me in what ways.

@manycoding
Copy link
Author

@Julian I really would like to finish this one in this year :)

@Julian
Copy link
Member

Julian commented Nov 21, 2018

Hey @manycoding so would I!

I've unfortunately got extremely limited time, which currently even for jsonschema needs to be prioritized on getting a production-ready draft 6/7 release out. Have tried to give you some tips to help you find out why your change is breaking things but really would need to be you figuring out how to get all of them to pass until I manage to have time to help.

Definitely appreciate the PR.

@manycoding
Copy link
Author

@Julian I have a hunch that the failing tests could be connected to the fact that draft 3 and 4 don't have $ref defined in meta schema properties.

@Julian Julian force-pushed the regex_value_validation branch from 47d2064 to 2c15ae3 Compare January 6, 2019 10:21
@codecov
Copy link

codecov bot commented Jan 6, 2019

Codecov Report

Merging #425 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
- Coverage   95.13%   95.11%   -0.02%     
==========================================
  Files          18       18              
  Lines        2425     2437      +12     
  Branches      309      306       -3     
==========================================
+ Hits         2307     2318      +11     
- Misses         97      101       +4     
+ Partials       21       18       -3

Julian added a commit that referenced this pull request Jan 6, 2019
Because FormatChecker is bad and I should feel bad :/

This is currently being overriden by the older Draft3 time
format.

Probably time to deprecate cls_checks, and pass format checkers
in on validator creation.

The changes in #425 are likely also relevant.
@Julian
Copy link
Member

Julian commented Jan 11, 2019

@manycoding FWIW I am going to try to finally find some time to give you another breadcrumb (or ideally figure out what's up here) some time next week :/

Fingers crossed!

@Julian Julian force-pushed the regex_value_validation branch from 2c15ae3 to 6731bcc Compare February 10, 2019 17:49
Julian added 3 commits June 17, 2019 08:10
* manycoding/regex_value_validation:
  Set format_checker to None for draft3 and 4
  Move test_invalid_pattern under Draft6 and 7
  Enable invalid regex tests
  Move regex patternProperty to InvalidRegexMixin
  Add propertyNames to Draft6 patternProperties, do not skip patternProperties test
  Pass draft7_format_checker to meta schema, set default Validator format_checker to None
  Revert style changes and test_it_does_not_validate_formats_by_default
  Fix test_it_validates_formats_by_default
  Add quotes back to False schema does not allow
  Fix test_False_schema
  Update test_it_validates_formats_by_default
  Ignore test_invalid_patternProperty
  Add invalid_patternProperty test
  Add invalid pattern tests
  Pass format_checker to validator in check_schema
  Pass format_checkers to meta schemas
  Fix regex output msg
  Return regex pattern error if regex failed to compile
@Julian Julian force-pushed the regex_value_validation branch from 6615403 to 4853f05 Compare September 4, 2019 07:36
@Julian Julian force-pushed the regex_value_validation branch from 4853f05 to 6615403 Compare January 8, 2020 07:01
@Julian Julian closed this Oct 20, 2020
@Julian Julian deleted the branch python-jsonschema:master October 20, 2020 16:37
@manycoding manycoding deleted the regex_value_validation branch October 21, 2020 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants