Skip to content

sqlalchemy_core patch errors for unencoded special characters in db url #416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jachien opened this issue Nov 20, 2023 · 1 comment
Closed
Assignees

Comments

@jachien
Copy link

jachien commented Nov 20, 2023

Problem

There is an open sqlalchemy bug where the sqlalchemy.engine.url.URL class does not correctly url encode username and passwords when converting to a string.

This behavior can cause the xray patching of sqlalchemy Session and Connection execute functions to error when trying to urlparse the db url returned by sqlalchemy.

In my case, a closing square bracket ] in my password leads urlparse to interpret my db url as an invalid IPv6 address, but this could probably manifest as other errors depending on the unencoded characters.

Error parsing sql metadata.
Traceback (most recent call last):
  File "path/to/aws_xray_sdk/ext/sqlalchemy_core/patch.py", line 22, in _sql_meta
    url = urlparse(str(engine_instance.engine.url))
  File "path/to/urllib/parse.py", line 384, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "path/to/urllib/parse.py", line 486, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

Note: stack trace is for aws-xray-sdk 2.8.0, the linked code above is for current master head.

What I expected

No error logged from xray and full trace functionality.

My setup

aws-xray-sdk 2.8.0
Aurora Postgres 14 (but can affect any database)
pg8000 1.20.0 (but can affect any dialect)
Python 3.8

Steps to reproduce

CREATE DATABASE test_xray;
CREATE USER test_xray_user WITH ENCRYPTED PASSWORD 'test]password';
GRANT ALL PRIVILEGES ON DATABASE test_xray TO test_xray_user;
from sqlalchemy import create_engine
import urllib

from aws_xray_sdk.core import patch_all, xray_recorder
patch_all()

password = urllib.parse.quote_plus("test]password")
db_url = f"postgresql+pg8000://test_xray_user:{password}@myhost:5432/test_xray"

print(db_url)
# postgresql+pg8000://test_xray_user:test%5Dpassword@myhost:5432/test_xray

engine = create_engine(db_url)

# this is the upstream bug in sqlalchemy
print(engine.url)
# postgresql+pg8000://test_xray_user:test]password@myhost:5432/test_xray

with xray_recorder.in_segment("test_segment") as segment:
    with engine.connect() as connection:
        connection.execute("select 1")

This leads to the error in the Problem section being logged.

Current workaround

Avoid square brackets in passwords. There may be other characters that cause issues.

For RDS passwords managed by secrets manager, this can be configured in CDK using excludeCharacters for DatabaseCluster credentials, DatabaseSecrets, rotation definitions, etc.

Proposed fix -- starting point

The most correct fix is for the sqlalchemy bug to be fixed, but the maintainers stated on the linked bug that a fix will not be available for months.

This is a starting point for a fix that will be continue to work correctly even after the sqlalchemy bug is fixed upstream. It is NOT thoroughly tested.

$ git diff
diff --git a/aws_xray_sdk/ext/sqlalchemy_core/patch.py b/aws_xray_sdk/ext/sqlalchemy_core/patch.py
index 0551fe2..58df905 100644
--- a/aws_xray_sdk/ext/sqlalchemy_core/patch.py
+++ b/aws_xray_sdk/ext/sqlalchemy_core/patch.py
@@ -13,19 +13,21 @@ from aws_xray_sdk.ext.util import unwrap
 
 def _sql_meta(engine_instance, args):
     try:
-        metadata = {}
-        url = urlparse(str(engine_instance.engine.url))
+        # Workaround for https://github.com/sqlalchemy/sqlalchemy/issues/10662
+        # sqlalchemy.engine.url.URL's __repr__ does not url encode username nor password.
+        # This will continue to work once sqlalchemy fixes the bug.
+        sa_url = engine_instance.engine.url
+        username = sa_url.username
+        sa_url = sa_url._replace(username=None, password=None)
+        url = urlparse(str(sa_url))
+        name = url.netloc
+        if username:
+            # Restore url encoded username
+            quoted_username = urllib.parse.quote_plus(username)
+            url = url._replace(netloc='{}@{}'.format(quoted_username, url.netloc))
         # Add Scheme to uses_netloc or // will be missing from url.
         uses_netloc.append(url.scheme)
-        if url.password is None:
-            metadata['url'] = url.geturl()
-            name = url.netloc
-        else:
-            # Strip password from URL
-            host_info = url.netloc.rpartition('@')[-1]
-            parts = url._replace(netloc='{}@{}'.format(url.username, host_info))
-            metadata['url'] = parts.geturl()
-            name = host_info
+        metadata['url'] = url.geturl()
         metadata['user'] = url.username
         metadata['database_type'] = engine_instance.engine.name
         try:

@jachien jachien changed the title sqlalchemy_core patch attempts to parse malformed db urls sqlalchemy_core patch errors for unencoded special characters in db url Nov 20, 2023
vastin added a commit to vastin/aws-xray-sdk-python that referenced this issue Jan 29, 2024
@vastin vastin mentioned this issue Jan 29, 2024
vastin added a commit to vastin/aws-xray-sdk-python that referenced this issue Mar 1, 2024
vastin added a commit that referenced this issue Mar 1, 2024
…db url #416. (#418)

* Fix unit test and integration test
* Fix issues #416
@vastin
Copy link
Contributor

vastin commented Mar 6, 2024

Fixed in release 2.13.0.

@vastin vastin closed this as completed Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants