
Provide actionable feedback when internal operations are failing rather than silently retrying for over 10mins #346


Closed
dhirschfeld opened this issue Feb 4, 2024 · 6 comments

Comments

@dhirschfeld
Contributor

I'm trying to debug a problem where my request appears to hang indefinitely.

If I pause the program in the debugger it's deep in a retry loop in urllib3:

>>> response
<urllib3.response.HTTPResponse at 0x187b9786ec0>

>>> response.status
403

...and it seems the default timeout for requests is 15 minutes (900 seconds):

>>> self
<databricks.sql.auth.thrift_http_client.THttpClient at 0x187b97858a0>

>>> self._THttpClient__timeout
900.0
@dhirschfeld
Contributor Author

To all intents and purposes the library has hung: there is no feedback that the operation is failing and being retried multiple times, nor any indication of why it is failing.

Maybe this issue is more of a request for better feedback in the case of failing HTTP operations.

@dhirschfeld
Contributor Author

The 403 is pretty fishy.

Debugging a bit further I can see the token being passed in the headers:

>>> headers
{'Content-Type': 'application/x-thrift',
 'Content-Length': '167',
 'User-Agent': 'PyDatabricksSqlConnector/3.0.1 (sqlalchemy + )',
 'Authorization': 'Bearer <snip>d6b6'}

The token being passed is the same as my default environment variable:

❯ echo $env:DATABRICKS_TOKEN
<snip>d6b6

BUT in my calling script I explicitly set the correct token to use:

os.environ['DATABRICKS_TOKEN'] = '<snip>60cd'

...so it seems the token is somehow not being picked up?
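The symptom above is consistent with the token being captured once, before the calling script overrides the environment variable. A minimal sketch of that timing trap (`FakeClient` is hypothetical and only models *when* the token is read, not the real connector):

```python
import os

class FakeClient:
    """Hypothetical client that reads DATABRICKS_TOKEN at construction time."""
    def __init__(self):
        # The token is captured once, here; later changes to the
        # environment variable are invisible to this instance.
        self.token = os.environ.get("DATABRICKS_TOKEN")

os.environ["DATABRICKS_TOKEN"] = "default-d6b6"   # e.g. set in the shell profile
client = FakeClient()
os.environ["DATABRICKS_TOKEN"] = "override-60cd"  # too late: already captured

print(client.token)  # prints "default-d6b6", not the override
```

If the connector (or an auth provider it constructs) reads the variable at import or connection-creation time, setting `os.environ` afterwards would have no effect, which matches the stale token seen in the headers.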

@dhirschfeld dhirschfeld changed the title Set appropriate timeouts Provide actionable feedback when internal operations are failing rather than silently retrying for over 10mins Feb 4, 2024
@susodapop
Contributor

Do you know what kind of Thrift request is being sent that results in the 403? Is it an OpenSession?

> there is no feedback that the operation is failing and being retried multiple times or any indication of why it is failing.

Is this true even when you have your log level set at DEBUG? urllib3 and databricks-sql-connector both emit debug messages when they determine a request is retry-able.

Adding this to your script will help with the repro:

import logging

logging.getLogger("databricks.sql").setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

> The 403 is pretty fishy.

Yeah, generally a 403 should halt everything, since authentication errors are by nature not recoverable. The DBX team should investigate. Likely the fix will be an additional conditional check added to the should_retry() method in DatabricksRetryPolicy.
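The suggested fix can be sketched generically against urllib3's `Retry` class. `HaltOn403Retry` below is hypothetical and merely illustrates the kind of conditional check proposed for `DatabricksRetryPolicy.should_retry()`; it is not the actual implementation:

```python
from urllib3.util.retry import Retry

class HaltOn403Retry(Retry):
    """Sketch: treat HTTP 403 as fatal instead of retryable.

    Illustrates the proposed conditional check; the real fix would live in
    DatabricksRetryPolicy.should_retry(), not in a urllib3 Retry subclass.
    """

    def is_retry(self, method, status_code, has_retry_after=False):
        if status_code == 403:
            # Auth failures are not recoverable: surface them immediately
            # instead of silently retrying for up to 15 minutes.
            return False
        return super().is_retry(method, status_code, has_retry_after)
```

With a policy like this, the first 403 would propagate as an error instead of entering the retry loop shown in the logs below.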

@dhirschfeld
Contributor Author

Circling back since I just hit this again. If I configure logging like:

import logging

logging.basicConfig(level='DEBUG')
logging.getLogger("databricks.sql").setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

I do get some feedback! 🎉

DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_delay_min given_or_default 1.0
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_delay_max given_or_default 60.0
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_stop_after_attempts_count given_or_default 30
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
DEBUG:databricks.sql.thrift_backend:retry parameter: _retry_delay_default given_or_default 5.0
DEBUG:databricks.sql.thrift_backend:Sending request: OpenSession(<REDACTED>)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): test.cloud.databricks.com:443
DEBUG:urllib3.connectionpool:https://test.cloud.databricks.com:443 "POST /sql/1.0/warehouses/deadbeef HTTP/1.1" 403 54
DEBUG:databricks.sql.auth.retry:This request should be retried: Other
DEBUG:databricks.sql.auth.retry:Failed requests are retried by default per configured DatabricksRetryPolicy
DEBUG:urllib3.util.retry:Incremented Retry for (url='/sql/1.0/warehouses/deadbeef'): DatabricksRetryPolicy(total=29, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /sql/1.0/warehouses/deadbeef
DEBUG:urllib3.connectionpool:https://test.cloud.databricks.com:443 "POST /sql/1.0/warehouses/deadbeef HTTP/1.1" 403 54
DEBUG:databricks.sql.auth.retry:This request should be retried: Other
DEBUG:databricks.sql.auth.retry:Failed requests are retried by default per configured DatabricksRetryPolicy
DEBUG:urllib3.util.retry:Incremented Retry for (url='/sql/1.0/warehouses/deadbeef'): DatabricksRetryPolicy(total=28, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /sql/1.0/warehouses/deadbeef
DEBUG:urllib3.connectionpool:https://test.cloud.databricks.com:443 "POST /sql/1.0/warehouses/deadbeef HTTP/1.1" 403 54
DEBUG:databricks.sql.auth.retry:This request should be retried: Other
DEBUG:databricks.sql.auth.retry:Failed requests are retried by default per configured DatabricksRetryPolicy
DEBUG:urllib3.util.retry:Incremented Retry for (url='/sql/1.0/warehouses/deadbeef'): DatabricksRetryPolicy(total=27, connect=None, read=None, redirect=None, status=None)
... (the same POST → 403 → retry cycle repeats, with total counting down 26, 25, 24, ...)
<ctrl-c>
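The `retry parameter: ... given_or_default` lines above suggest these knobs are configurable rather than hard-coded. A hedged sketch of shortening the retry window; the parameter names are taken from the log, but whether `databricks.sql.connect()` accepts them as underscore-prefixed keyword arguments is an assumption:

```python
# Parameter names and defaults taken from the DEBUG log above
# ("retry parameter: _retry_stop_after_attempts_count given_or_default 30", etc.).
retry_overrides = {
    "_retry_stop_after_attempts_count": 3,        # default 30 attempts
    "_retry_stop_after_attempts_duration": 60.0,  # default 900 s (15 min)
    "_retry_delay_max": 10.0,                     # default 60 s
}

# Assumed usage (untested here; requires a live warehouse):
# from databricks import sql
# connection = sql.connect(
#     server_hostname="test.cloud.databricks.com",
#     http_path="/sql/1.0/warehouses/deadbeef",
#     access_token=os.environ["DATABRICKS_TOKEN"],
#     **retry_overrides,
# )
```

Even if this shortens the hang, it only mitigates the symptom; the 403 would still be retried rather than raised.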

@dhirschfeld
Contributor Author

> I do get some feedback! 🎉

however, I don't think users should have to (know how to) configure logging to get critical usage information from the library. The library should not appear to hang indefinitely without providing any feedback/information to the user.

In this particular case, I don't think the library should be responsible for retries at all - just raise an exception and let the user decide how best to handle the error. Certainly, indefinitely retrying a 403 isn't going to lead to success.

@andrew-christianson

andrew-christianson commented Feb 28, 2024

FWIW, chiming in to say that I hit the same issue (an invalid token), saw exactly the same output from urllib3 (after setting the log level appropriately), and was similarly stymied for an hour by the apparently hung process. Failing loudly on 403 seems like the right solution here.
