Skip to content

Incorrectly escaped backslashes in regex in generated code #40

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dktapps opened this issue Oct 5, 2021 · 9 comments
Closed

Incorrectly escaped backslashes in regex in generated code #40

dktapps opened this issue Oct 5, 2021 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@dktapps
Copy link
Contributor

dktapps commented Oct 5, 2021

Describe the bug

    },
    "main": {
      "description": "The fully-qualified name of the main class that extends PluginBase",
      "pattern": "([A-Za-z_]\\w+\\\\)*([A-Za-z_]\\w+)",
      "type": "string"
    },

This schema causes the following code to be generated:

                    if (is_string($value) && !preg_match('/([A-Za-z_]\w+\\)*([A-Za-z_]\w+)/', $value)) {
                        $this->_errorRegistry->addError(new \PHPModelGenerator\Exception\String\PatternException($value ?? null, ...array (
  0 => 'main',
  1 => '([A-Za-z_]\\w+\\\\)*([A-Za-z_]\\w+)',
)));
                    }
                

                return $value;
            }

This pattern is meant to allow backslashes, hence the 4 \\\\ (to match the literal backslash character).

You can see that the backslash given to preg_match() is incorrectly escaped: there are only 2 backslashes in the resulting string, which is interpreted by PHP as a single backslash even in a single-quoted string.

Expected behavior
The backslash pattern above should be correctly escaped.

Schema
plugin-schema.json.txt

Version:
0.21.0

@dktapps
Copy link
Contributor Author

dktapps commented Oct 5, 2021

Digging, seems like this is quite an inconvenient problem to solve. My solution would be to embed the patterns as base64 blobs instead of as bare strings, and then generate code like preg_match(base64_decode('AAAAAAA===')) instead, so that PHP string escaping doesn't get a chance to screw anything up.

@wol-soft
Copy link
Owner

wol-soft commented Oct 6, 2021

Hi @dktapps

Thanks for the report! I'll have a look into it later.

@wol-soft wol-soft self-assigned this Oct 6, 2021
@wol-soft wol-soft added the bug Something isn't working label Oct 6, 2021
@wol-soft
Copy link
Owner

wol-soft commented Oct 6, 2021

After some testig I've adopted your idea of using encoded strings rather than the regular expression itself in the generated code. I've also added a test case including a regular expression with backslashes´and forward slashes to test escaping and the en-/decoding.

@dktapps
Copy link
Contributor Author

dktapps commented Oct 6, 2021

Additional escaping shouldn't be needed at all if you base64, no? It's only needed in the first place to prevent PHP interpreting it differently as a const string.

@wol-soft
Copy link
Owner

wol-soft commented Oct 6, 2021

Yeah I thought the same way first but it's wrong (compare https://3v4l.org/YLRvS which matches a single backslash).

Also forward slashes must be escaped as they are used as delimiter of the regular expression.

@dktapps
Copy link
Contributor Author

dktapps commented Oct 6, 2021

In the sample I gave, the 4 backslashes are intended to match 1 backslash (JSON \\\\ resolves to \\ when decoded, passed to PCRE, which then turns it into literal \). I'm just wondering if that might break something.

Also forward slashes must be escaped as they are used as delimiter of the regular expression.

preg_quote() might be better for that (specifically intended for purpose), though I don't suppose it matters much.

@dktapps
Copy link
Contributor Author

dktapps commented Oct 6, 2021

I just tested dev-master with my schema + data and it works.. I'll never understand this godawful mess of backslashes. Thank you for the fix.

@wol-soft
Copy link
Owner

wol-soft commented Oct 6, 2021

Yeah escaping is always hard, in this case on three layers. The JSON Schema, the PHP code and the PCRE engine. Glad it works now as expected.

@wol-soft
Copy link
Owner

wol-soft commented Oct 6, 2021

I've thought about it again and you are absolutely right.

Four backslashes in the JSON mean two backslashes in the regular expression due to JSON escaping. Two backslashes in the regular expression must match one backslash in the tested string due to PCRE escaping. In the current implementation four backslashes in the JSON match two backslashes in the tested string which is wrong. It's so obscure.

But the delimiter escaping must be kept 😄

Edit: changed in 7dbf81a

wol-soft added a commit that referenced this issue Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants