
Commit 01a4d00

Merge branch 'sqlalchemy-dev' of https://github.com/overcoil/fork-databricks-sql-python into sqlalchemy-dev
2 parents: 629a510 + b37807e

23 files changed: 1404 additions and 56 deletions

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+<!-- We welcome contributions. All patches must include a sign-off. Please see CONTRIBUTING.md for details -->
+
+
+## What type of PR is this?
+<!-- Check all that apply, delete what doesn't apply. -->
+
+- [ ] Refactor
+- [ ] Feature
+- [ ] Bug Fix
+- [ ] Other
+
+## Description
+
+## How is this tested?
+
+- [ ] Unit tests
+- [ ] E2E Tests
+- [ ] Manually
+- [ ] N/A
+
+<!-- If Manually, please describe. -->
+
+## Related Tickets & Documents

.github/workflows/code-quality-checks.yml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name: Code Quality Checks
 on: [push]
 jobs:
-  run-tests:
+  run-unit-tests:
     runs-on: ubuntu-latest
     steps:
       #----------------------------------------------
@@ -48,7 +48,7 @@ jobs:
       # run test suite
       #----------------------------------------------
       - name: Run tests
-        run: poetry run pytest tests/
+        run: poetry run python -m pytest tests/unit
   check-linting:
     runs-on: ubuntu-latest
     steps:

.github/workflows/dco-check.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+name: DCO Check
+
+on: [pull_request]
+
+jobs:
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check for DCO
+        id: dco-check
+        uses: tisonkun/[email protected]
+      - name: Comment about DCO status
+        uses: actions/github-script@v6
+        if: ${{ failure() }}
+        with:
+          script: |
+            github.rest.issues.createComment({
+              issue_number: context.issue.number,
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              body: `Thanks for your contribution! To satisfy the DCO policy in our \
+              [contributing guide](https://github.com/databricks/databricks-sql-python/blob/main/CONTRIBUTING.md), \
+              every commit message must include a sign-off message. One or more of your commits is missing this message. \
+              You can reword previous commit messages with an interactive rebase (\`git rebase -i main\`).`
+            })
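The workflow delegates the actual check to a third-party action. As a rough illustration of what such a check looks for, here is a minimal Python sketch, assuming a commit passes when its message contains a `Signed-off-by: Name <email>` line in the format CONTRIBUTING.md describes. The names `SIGN_OFF`, `has_sign_off` and `unsigned_commits` are hypothetical, and the real action's rules may be stricter or looser.

```python
import re
from typing import Iterable, List

# A sign-off trailer in the format CONTRIBUTING.md describes, e.g.
# "Signed-off-by: Joe Smith <joe.smith@example.com>". This pattern is an
# illustrative assumption; the DCO action may apply different rules.
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>\s*$", re.MULTILINE)


def has_sign_off(message: str) -> bool:
    """Return True if any line of the commit message is a sign-off trailer."""
    return SIGN_OFF.search(message) is not None


def unsigned_commits(messages: Iterable[str]) -> List[int]:
    """Return the positions of commit messages that lack a sign-off."""
    return [index for index, message in enumerate(messages) if not has_sign_off(message)]


commits = [
    "Retry GetOperationStatus on OSError\n\nSigned-off-by: Joe Smith <joe.smith@example.com>",
    "Fix typo in CONTRIBUTING.md",
]
print(unsigned_commits(commits))  # [1]
```

Anchoring the pattern to the start of a line (`re.MULTILINE` plus `^`) means a sign-off quoted mid-sentence does not count, which matches the "line at the end of the explanation" convention.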

CONTRIBUTING.md

Lines changed: 105 additions & 17 deletions
@@ -1,6 +1,73 @@
-# Contributing
+# Contributing Guide
 
-To contribute to this repository, fork it and send pull requests.
+We happily welcome contributions to the `databricks-sql-connector` package. We use [GitHub Issues](https://github.com/databricks/databricks-sql-python/issues) to track community-reported issues and [GitHub Pull Requests](https://github.com/databricks/databricks-sql-python/pulls) for accepting changes.
+
+Contributions are licensed on a license-in/license-out basis.
+
+## Communication
+Before starting work on a major feature, please reach out to us via GitHub, Slack, email, etc. We will make sure no one else is already working on it and ask you to open a GitHub issue.
+A "major feature" is any change that alters more than 100 lines of code (not counting tests) or changes user-facing behavior.
+We will use the GitHub issue to discuss the feature and come to agreement.
+This prevents wasted time, both yours and ours.
+The GitHub review process for major features is also important so that organizations with commit access can come to agreement on design.
+If it is appropriate to write a design document, the document must be hosted either in the GitHub tracking issue, or linked to from the issue and hosted in a world-readable location.
+Specifically, if the goal is to add a new extension, please read the extension policy.
+Small patches and bug fixes don't need prior communication.
+
+## Coding Style
+We follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) with one exception: lines can be up to 100 characters in length, not 79.
+
+## Sign your work
+The sign-off is a simple line at the end of the explanation for the patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the following (from developercertificate.org):
+
+```
+Developer Certificate of Origin
+Version 1.1
+
+Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
+1 Letterman Drive
+Suite D4700
+San Francisco, CA, 94129
+
+Everyone is permitted to copy and distribute verbatim copies of this
+license document, but changing it is not allowed.
+
+
+Developer's Certificate of Origin 1.1
+
+By making a contribution to this project, I certify that:
+
+(a) The contribution was created in whole or in part by me and I
+    have the right to submit it under the open source license
+    indicated in the file; or
+
+(b) The contribution is based upon previous work that, to the best
+    of my knowledge, is covered under an appropriate open source
+    license and I have the right under that license to submit that
+    work with modifications, whether created in whole or in part
+    by me, under the same open source license (unless I am
+    permitted to submit under a different license), as indicated
+    in the file; or
+
+(c) The contribution was provided directly to me by some other
+    person who certified (a), (b) or (c) and I have not modified
+    it.
+
+(d) I understand and agree that this project and the contribution
+    are public and that a record of the contribution (including all
+    personal information I submit with it, including my sign-off) is
+    maintained indefinitely and may be redistributed consistent with
+    this project or the open source license(s) involved.
+```
+
+Then you just add a line to every git commit message:
+
+```
+Signed-off-by: Joe Smith <[email protected]>
+```
+Use your real name (sorry, no pseudonyms or anonymous contributions).
+
+If you set your `user.name` and `user.email` git configs, you can sign your commit automatically with `git commit -s`.
 
 ## Set up your environment
 
@@ -9,40 +76,61 @@ This project uses [Poetry](https://python-poetry.org/) for dependency management
 1. Clone this repository
 2. Run `poetry install`
 
-### Unit Tests
+### Run tests
+
+We use [Pytest](https://docs.pytest.org/en/7.1.x/) as our test runner. Invoke it with `poetry run python -m pytest`; any additional arguments are passed directly to `pytest`.
 
-We use [Pytest](https://docs.pytest.org/en/7.1.x/) as our test runner. Invoke it with `poetry run pytest`, all other arguments are passed directly to `pytest`.
+#### Unit tests
+
+Unit tests do not require a Databricks account.
 
-#### All tests
 ```bash
-poetry run pytest tests
+poetry run python -m pytest tests/unit
 ```
-
 #### Only a specific test file
 
 ```bash
-poetry run pytest tests/tests.py
+poetry run python -m pytest tests/unit/tests.py
 ```
 
 #### Only a specific method
 
 ```bash
-poetry run pytest tests/tests.py::ClientTestSuite::test_closing_connection_closes_commands
+poetry run python -m pytest tests/unit/tests.py::ClientTestSuite::test_closing_connection_closes_commands
+```
+
+#### e2e tests
+
+End-to-end tests require a Databricks account. Before you can run them, you must set connection details for a Databricks SQL endpoint in your environment:
+
+```bash
+export host=""
+export http_path=""
+export access_token=""
+```
+
+There are several e2e test suites available:
+- `PySQLCoreTestSuite`
+- `PySQLLargeQueriesSuite`
+- `PySQLRetryTestSuite.HTTP503Suite` **[not documented]**
+- `PySQLRetryTestSuite.HTTP429Suite` **[not documented]**
+- `PySQLUnityCatalogTestSuite` **[not documented]**
+
+To execute the core test suite:
+
+```bash
+poetry run python -m pytest tests/e2e/driver_tests.py::PySQLCoreTestSuite
 ```
 
+The suites marked `[not documented]` require additional configuration, which will be documented later.
 ### Code formatting
 
 This project uses [Black](https://pypi.org/project/black/).
 
 ```
-poetry run black src
+poetry run python -m black src --check
 ```
-## Pull Request Process
 
-1. Update the [CHANGELOG.md](README.md) or similar documentation with details of changes you wish to make, if applicable.
-2. Add any appropriate tests.
-3. Make your code or other changes.
-4. Review guidelines such as
-[How to write the perfect pull request][github-perfect-pr], thanks!
+Remove the `--check` flag to write reformatted files to disk.
 
-[github-perfect-pr]: https://blog.github.com/2015-01-21-how-to-write-the-perfect-pull-request/
+To simplify reviews, you can format your changes in a separate commit.
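The e2e suites take their connection details from the `host`, `http_path` and `access_token` environment variables. The sketch below shows one way a test helper might collect them and fail fast when any is missing. The helper `e2e_connection_kwargs` is hypothetical, not code from the test suite; the assumption is that the variables map onto the `server_hostname`, `http_path` and `access_token` keyword arguments of `databricks.sql.connect()`.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables that CONTRIBUTING.md asks you to export for the e2e suites.
REQUIRED_VARS = ("host", "http_path", "access_token")


def e2e_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map the e2e environment variables onto databricks.sql.connect() keyword arguments.

    Hypothetical helper: an unset *or empty* variable counts as missing, because the
    guide's template exports each one as "". Raises RuntimeError naming every gap.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Set these environment variables before running e2e tests: " + ", ".join(missing)
        )
    return {
        "server_hostname": source["host"],
        "http_path": source["http_path"],
        "access_token": source["access_token"],
    }


if __name__ == "__main__":
    # Placeholder values only; a real run would read them from the shell environment.
    kwargs = e2e_connection_kwargs(
        {
            "host": "example.cloud.databricks.com",
            "http_path": "/sql/1.0/endpoints/<id>",
            "access_token": "<token>",
        }
    )
    print(sorted(kwargs))  # ['access_token', 'http_path', 'server_hostname']
    # from databricks import sql
    # connection = sql.connect(**kwargs)
```

Treating an empty string as missing matters here: `export host=""` sets the variable, so a plain `in os.environ` check would let a misconfigured run through to the network.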

README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ You are welcome to file an issue here for general use cases. You can also contac
 
 ## Requirements
 
-Python 3.7 or above is required.
+A development machine running Python >=3.7, <3.10 is required.
 
 ## Documentation
 
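The new requirement is a half-open range: every 3.9.x release qualifies, while 3.10 does not. A minimal sketch of checking an interpreter against it, assuming only the major and minor components matter; `is_supported` and the two constants are illustrative names, not part of the package.

```python
import sys
from typing import Sequence

# Half-open range from the README: Python >=3.7 and <3.10.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_supported(version: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) part of version lies in [3.7, 3.10)."""
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


for candidate in [(3, 6, 15), (3, 7, 0), (3, 9, 13), (3, 10, 0)]:
    print(candidate, is_supported(candidate))
```

Comparing tuples rather than version strings avoids the classic pitfall where `"3.10" < "3.7"` is true lexically.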
pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,9 @@ PyHive = "^0.6.5"
 [tool.poetry.plugins."sqlalchemy.dialects"]
 "databricks.thrift" = "databricks.sqlalchemy:DatabricksDialect"
 
+[tool.poetry.plugins."sqlalchemy.dialects"]
+"databricks.thrift" = "databricks.sqlalchemy:DatabricksDialect"
+
 [tool.poetry.dev-dependencies]
 pytest = "^7.1.2"
 mypy = "^0.950"
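The three added lines repeat the `[tool.poetry.plugins."sqlalchemy.dialects"]` table already present on lines 18-19. The TOML specification does not allow a table to be defined more than once, so a spec-compliant parser should reject the merged file. A minimal, line-based sketch of detecting such duplicates with only the standard library; `duplicate_tables` is a hypothetical helper and deliberately simplified (see its docstring).

```python
import re
from collections import Counter
from typing import List

# Matches a standard table header such as [tool.poetry.dev-dependencies] or
# [tool.poetry.plugins."sqlalchemy.dialects"]. Array-of-tables headers ([[...]])
# are skipped because TOML allows those to repeat.
_TABLE_HEADER = re.compile(r"^\s*\[(?!\[)(.+?)\]\s*(?:#.*)?$")


def duplicate_tables(toml_text: str) -> List[str]:
    """Return the names of standard tables whose header appears more than once.

    Simplified check: header text is compared literally (no normalisation of
    quotes or whitespace around dots), and lines inside multi-line strings or
    arrays are not skipped.
    """
    counts: Counter = Counter()
    for line in toml_text.splitlines():
        match = _TABLE_HEADER.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return [name for name, seen in counts.items() if seen > 1]


merged = """\
[tool.poetry.plugins."sqlalchemy.dialects"]
"databricks.thrift" = "databricks.sqlalchemy:DatabricksDialect"

[tool.poetry.plugins."sqlalchemy.dialects"]
"databricks.thrift" = "databricks.sqlalchemy:DatabricksDialect"

[tool.poetry.dev-dependencies]
pytest = "^7.1.2"
"""
print(duplicate_tables(merged))  # ['tool.poetry.plugins."sqlalchemy.dialects"']
```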

src/databricks/sql/thrift_backend.py

Lines changed: 61 additions & 15 deletions
@@ -1,4 +1,5 @@
 from decimal import Decimal
+import errno
 import logging
 import math
 import time
@@ -15,6 +16,9 @@
 
 from databricks.sql.thrift_api.TCLIService import TCLIService, ttypes
 from databricks.sql import *
+from databricks.sql.thrift_api.TCLIService.TCLIService import (
+    Client as TCLIServiceClient,
+)
 from databricks.sql.utils import (
     ArrowQueue,
     ExecuteResponse,
@@ -39,6 +43,7 @@
     "_retry_delay_max": (float, 60, 5, 3600),
     "_retry_stop_after_attempts_count": (int, 30, 1, 60),
     "_retry_stop_after_attempts_duration": (float, 900, 1, 86400),
+    "_retry_delay_default": (float, 5, 1, 60),
 }
 
 
@@ -71,6 +76,8 @@ def __init__(
         # _retry_delay_min (default: 1)
         # _retry_delay_max (default: 60)
         # {min,max} pre-retry delay bounds
+        # _retry_delay_default (default: 5)
+        # Only used when GetOperationStatus fails due to a TCP/OS Error.
         # _retry_stop_after_attempts_count (default: 30)
         # total max attempts during retry sequence
         # _retry_stop_after_attempts_duration (default: 900)
@@ -158,7 +165,7 @@ def _initialize_retry_args(self, kwargs):
                 "retry parameter: {} given_or_default {}".format(key, given_or_default)
             )
             if bound != given_or_default:
-                logger.warn(
+                logger.warning(
                     "Override out of policy retry parameter: "
                     + "{} given {}, restricted to {}".format(
                         key, given_or_default, bound
@@ -243,7 +250,9 @@ def _handle_request_error(self, error_info, attempt, elapsed):
     # FUTURE: Consider moving to https://github.com/litl/backoff or
     # https://github.com/jd/tenacity for retry logic.
     def make_request(self, method, request):
-        """Execute given request, attempting retries when receiving HTTP 429/503.
+        """Execute given request, attempting retries when
+            1. the server returns HTTP 429/503
+            2. an OSError is raised during a GetOperationStatus call
 
         For delay between attempts, honor the given Retry-After header, but with bounds.
         Use lower bound of exponential-backoff based on _retry_delay_min,
@@ -260,17 +269,21 @@ def make_request(self, method, request):
         def get_elapsed():
             return time.time() - t0
 
+        def bound_retry_delay(attempt, proposed_delay):
+            """bound delay (seconds) by [min_delay*1.5^(attempt-1), max_delay]"""
+            delay = int(proposed_delay)
+            delay = max(delay, self._retry_delay_min * math.pow(1.5, attempt - 1))
+            delay = min(delay, self._retry_delay_max)
+            return delay
+
         def extract_retry_delay(attempt):
             # encapsulate retry checks, returns None || delay-in-secs
             # Retry IFF 429/503 code + Retry-After header set
             http_code = getattr(self._transport, "code", None)
             retry_after = getattr(self._transport, "headers", {}).get("Retry-After")
             if http_code in [429, 503] and retry_after:
                 # bound delay (seconds) by [min_delay*1.5^(attempt-1), max_delay]
-                delay = int(retry_after)
-                delay = max(delay, self._retry_delay_min * math.pow(1.5, attempt - 1))
-                delay = min(delay, self._retry_delay_max)
-                return delay
+                return bound_retry_delay(attempt, int(retry_after))
             return None
 
         def attempt_request(attempt):
@@ -279,24 +292,57 @@ def attempt_request(attempt):
             # - non-None method_return -> success, return and be done
             # - non-None retry_delay -> sleep delay before retry
             # - error, error_message always set when available
+
+            error, error_message, retry_delay = None, None, None
             try:
                 logger.debug("Sending request: {}".format(request))
                 response = method(request)
                 logger.debug("Received response: {}".format(response))
                 return response
-            except Exception as error:
+            except OSError as err:
+                error = err
+                error_message = str(err)
+
+                gos_name = TCLIServiceClient.GetOperationStatus.__name__
+                if method.__name__ == gos_name:
+                    retry_delay = bound_retry_delay(attempt, self._retry_delay_default)
+
+                # fmt: off
+                # The built-in errno module encapsulates OSError codes, which are OS-specific.
+                # log.info for errors we believe are not unusual or unexpected. log.warning
+                # for others like EEXIST, EBADF, ERANGE which are not expected in this context.
+                #
+                # I manually tested this retry behaviour using mitmweb and confirmed that
+                # GetOperationStatus requests are retried when I forced network connection
+                # interruptions / timeouts / reconnects. See #24 for more info.
+                                        # | Debian | Darwin |
+                info_errs = [           # |--------|--------|
+                    errno.ESHUTDOWN,    # |  108   |   58   |
+                    errno.EAFNOSUPPORT, # |   97   |   47   |
+                    errno.ECONNRESET,   # |  104   |   54   |
+                    errno.ETIMEDOUT,    # |  110   |   60   |
+                ]
+
+                # fmt: on
+                log_string = f"{gos_name} failed with code {err.errno} and will attempt to retry"
+                if err.errno in info_errs:
+                    logger.info(log_string)
+                else:
+                    logger.warning(log_string)
+            except Exception as err:
+                error = err
                 retry_delay = extract_retry_delay(attempt)
                 error_message = ThriftBackend._extract_error_message_from_headers(
                     getattr(self._transport, "headers", {})
                 )
-                return RequestErrorInfo(
-                    error=error,
-                    error_message=error_message,
-                    retry_delay=retry_delay,
-                    http_code=getattr(self._transport, "code", None),
-                    method=method.__name__,
-                    request=request,
-                )
+            return RequestErrorInfo(
+                error=error,
+                error_message=error_message,
+                retry_delay=retry_delay,
+                http_code=getattr(self._transport, "code", None),
+                method=method.__name__,
+                request=request,
+            )
 
         # The real work:
         # - for each available attempt:
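The new `bound_retry_delay` helper clamps a proposed delay between an exponentially growing floor, `_retry_delay_min * 1.5^(attempt-1)`, and the `_retry_delay_max` ceiling. Below is a standalone sketch of the same arithmetic. As a simplification, the bounds are passed in as `delay_min` and `delay_max` parameters instead of being read from `self`, using the defaults documented in the comments (min 1, max 60) and the new `_retry_delay_default` of 5.

```python
import math


def bound_retry_delay(attempt: int, proposed_delay: float,
                      delay_min: float = 1.0, delay_max: float = 60.0) -> float:
    """Clamp proposed_delay to [delay_min * 1.5 ** (attempt - 1), delay_max].

    Same arithmetic as the helper in make_request(); delay_min and delay_max stand
    in for self._retry_delay_min and self._retry_delay_max (defaults 1 and 60).
    """
    delay = int(proposed_delay)  # fractional seconds are truncated, as in the diff
    delay = max(delay, delay_min * math.pow(1.5, attempt - 1))  # exponential floor
    delay = min(delay, delay_max)  # hard ceiling
    return delay


# Using the default _retry_delay_default of 5 seconds:
for attempt in (1, 5, 6, 12):
    print(attempt, bound_retry_delay(attempt, 5))
# 1 5
# 5 5.0625
# 6 7.59375
# 12 60.0
```

With the defaults, the floor overtakes the 5-second default at attempt 5 (1.5^4 = 5.0625) and hits the 60-second cap at attempt 12. The result is an `int` when the proposed delay wins and a `float` when either bound wins, which is harmless for `time.sleep`.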

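In the `OSError` branch above, two decisions are made independently: a retry delay is scheduled only when the failing call is `GetOperationStatus`, and the log level is chosen from a short allow-list of errno values. Note that the diff builds its log message from `gos_name` for every `OSError`, so a failure in another method is still logged as a `GetOperationStatus` retry. The sketch below keeps the two decisions separate. It assumes the method name arrives as a plain string rather than via `TCLIServiceClient`, and it returns the unbounded default delay; `classify_os_error` is a hypothetical name.

```python
import errno
import logging
from typing import Optional, Tuple

# errno values the diff logs at INFO level; anything else is logged at WARNING.
INFO_ERRNOS = {errno.ESHUTDOWN, errno.EAFNOSUPPORT, errno.ECONNRESET, errno.ETIMEDOUT}


def classify_os_error(
    method_name: str, err: OSError, default_delay: float = 5.0
) -> Tuple[Optional[float], int]:
    """Return (retry_delay, log_level) for an OSError raised by a Thrift call.

    Simplified sketch: retry_delay is the unbounded default (the diff additionally
    passes it through bound_retry_delay) and is None when no retry should happen.
    """
    retry_delay = default_delay if method_name == "GetOperationStatus" else None
    level = logging.INFO if err.errno in INFO_ERRNOS else logging.WARNING
    return retry_delay, level


reset = OSError(errno.ECONNRESET, "Connection reset by peer")
print(classify_os_error("GetOperationStatus", reset))  # (5.0, 20)
print(classify_os_error("ExecuteStatement", reset))  # (None, 20)
```

An `OSError` raised without an errno (for example `OSError("message")`) has `errno` set to `None`, which is not in the allow-list, so it falls through to `WARNING`.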