-
Notifications
You must be signed in to change notification settings - Fork 103
single quote not properly escaped #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for opening a detailed issue report! I've opened #46 which introduces unit tests for the expected escaping behaviour. Will need to compare your inputs against the expected behaviour to see if I can reproduce. Just for clarity, you wrote:
Do you mean version |
@susodapop yes, my bad, it's version 2.0.5 of the sql connector. Just so there can be no mistakes:
|
I agree there's a bug here. BackgroundThe ANSI SQL spec says that a single quote
The author of To behave properly, ImpactWhen Solution
|
Ahhh, well played! ANSI compliance is "opt in" I believe. Some months ago, I thought it would become "opt out", but that appears not to have been the case and I can no longer find evidence of this. However, it still doesn't appear that ANSI mode gives the escaping behavior you found documented, even though it does change other behaviors (note in the following examples, be sure to run the
So having done that exercise I doubly agree with your conclusion |
Not sure if anyone was actively working on this, but I just put up a PR to fix it if it's helpful! #56 |
Thanks @courtneyholcomb! We are working on a similar implementation internally but we hadn't incorporated an e2e test like you wrote. I'm going to keep your PR open so we can verify that our change will still pass your e2e test. If it doesn't, we can make sure that your use-case is still covered. I'll have more details within the next week! |
escaping a single quote with a second single quote. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
…provided Signed-off-by: Jesse Whitehouse <[email protected]>
What
ParamEscaper
'sescape_string()
gives incorrect behavior on Databricks SQL and in Databricks notebooks.It replaces a single quote
'
with''
, but the correct way to escape'
is with a backslash, like\'
.You can verify in PySpark with:
Note that because it starts as a Python literal, we need two backslashes
\\
to get Python to first escape\\'
to\'
and then Spark escapes to'
.I don't know what the motivation for this implementation was, but the result seems to be concatenation instead of escaping the quote character.
Reproduction in databricks-sql-python
The following demonstrates the issue in version
1.2.22.0.5 of databricks-sql-python against a serverless SQL warehouse in Azure, v2022.30, plus an implementation without parameter substitution showing an escape treatment that does work :The output is:
Expected results
String parameters with single quotes and backslashes should be properly reproduced:
"cat's meow"
would be escaped as"cat\\'s meow"
and the resulting SQL would returncat's meow
"cat\\'s meow"
would escape to"cat\\\\\\'s meow"
and the SQL would returncat\'s meow
Suggested fix
I'm not sure how this is usually implemented, but in my example just doing
param.replace('\\','\\\\' ).replace("'", "\\'")
at least preserves single quotes and backslashes, which are probably the most common cases. It would also leave alone escaped unicode literals like\U0001F44D
.How I encountered it
I'm using dbt with Databricks and noticed on upgrading from dbt-databricks 1.0 to 1.2.2 that single quotes started disappearing from our "seeds" (csv files loaded as Delta tables). Code had changed in dbt-databricks to use the new parameter binding functionality in this library, whereas (I assume) before it must have been injecting the values as literals into the SQL.
The text was updated successfully, but these errors were encountered: