Skip to content

Commit a4a6d9a

Browse files
author
Jesse Whitehouse
committed
Add documentation for execute_async.
Update changelog. Signed-off-by: Jesse Whitehouse <[email protected]>
1 parent e74c693 commit a4a6d9a

File tree

3 files changed

+99
-1
lines changed

3 files changed

+99
-1
lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,9 @@
11
# Release History
22

3+
## 3.1.0 (TBD)
4+
5+
- Add `execute_async` and `get_async_execution` methods to `Connection`
6+
37
## 3.0.1 (2023-12-01)
48

59
- Other: updated docstring comment about default parameterization approach (#287)

docs/execute_async.md

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
# Run queries asynchronously
2+
3+
This driver supports running queries asynchronously using `execute_async()`. This method can return control to the calling code almost immediately, instead of blocking until the query completes. `execute_async()` never returns a `ResultSet`. Instead it returns a query handle that you can use to poll for the query status, cancel the query, or fetch the query results.
4+
5+
You can use this method to submit multiple queries in rapid succession and to pick up a running query executions that were kicked-off from a separate thread or `Connection`. This can be especially useful for recovering running executions within serverless functions.
6+
7+
**Note:** Asynchronous execution is not the same as _asyncio_ in Python.
8+
9+
10+
# Requirements
11+
12+
- `databricks-sql-connector>=3.1.0`
13+
14+
# Interface
15+
16+
To run a query asynchronously, use `databricks.sql.client.Connection.execute_async()`. This method takes the same arguments as `execute()`. To pick up an existing query run, use `databricks.sql.client.Connection.get_async_execution()`. Both methods return an `AsyncExecution` object, which lets you interact with a query by exposing these properties and methods:
17+
18+
**Properties**
19+
- `AsyncExecution.status` is the last-known status of this query run, expressed as an `AsyncExecutionStatus` enumeration value. For example, `RUNNING`, `FINISHED`, or `CANCELED`. When you first call `execute_async()` the resulting status will usually be `RUNNING`. Calling `sync_status()` will refresh the value of this property.
20+
- `AsyncExecution.query_id` is the `UUID` for this query run. You can use this to look-up the query in the Databricks SQL query history.
21+
- `AsyncExecution.query_secret` is the `UUID` secret for this query run. Both the `query_id` and `secret` are needed to fetch results for a running query. A query can be canceled using only its `query_id`.
22+
- `AsyncExecution.is_available` is a boolean that indicates whether this execution can be picked up from another `Connection` or thread. See [below](#note-about-direct-result-queries) for more information.
23+
24+
**Methods**
25+
- `AsyncExecution.sync_status()` performs a network round-trip to synchronize `.status`.
26+
- `AsyncExecution.get_results()` returns the `ResultSet` for this query run. This is the same return signature as a synchronous `.execute()` call. Note that if you call `get_results()` on a query that is still running, the code will block until the query finishes running and the result has been fetched.
27+
- `AsyncExecution.cancel()` performs a network round-trip to cancel this query execution.
28+
- `AsyncExceution.serialize()` returns a string of the `query_id:query_secret` for this execution.
29+
30+
**Note:** You do not need to directly instantiate an `AsyncExecution` object in your client code. Instead, use the `execute_async` and `get_async_execution` methods to run or pick up a running query.
31+
32+
# Code Examples
33+
34+
### Run a query asynchronously
35+
36+
This snippet mirrors the synchronous example in this repository's README.
37+
38+
```python
39+
import os
40+
import time
41+
from databricks import sql
42+
43+
host = os.getenv("DATABRICKS_HOST")
44+
http_path = os.getenv("DATABRICKS_HTTP_PATH")
45+
access_token = os.getenv("DATABRICKS_TOKEN")
46+
47+
with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection:
48+
query = connection.execute_async("SELECT :param `p`, * FROM RANGE(10" {"param": "foo"})
49+
50+
# Poll for the result every 5 seconds
51+
while query.is_running:
52+
time.sleep(5)
53+
query.sync_status()
54+
55+
# this will raise a AsyncExecutionUnrecoverableResultException if the query was canceled
56+
result = query.get_results().fetchall()
57+
58+
```
59+
60+
### Pick up a running query
61+
62+
Both a `query_id` and `query_secret` are required to pick up a running query. This example runs a query from one `Connection` and fetches its result from a different connection.
63+
64+
```python
65+
import os
66+
import time
67+
from databricks import sql
68+
69+
host = os.getenv("DATABRICKS_HOST")
70+
http_path = os.getenv("DATABRICKS_HTTP_PATH")
71+
access_token = os.getenv("DATABRICKS_TOKEN")
72+
73+
with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection:
74+
query = connection.execute_async("SELECT :param `p`, * FROM RANGE(10" {"param": "foo"})
75+
query_id_and_secret = query.serialize()
76+
77+
# The connection created above has now closed
78+
79+
with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection:
80+
query_id, query_secret = query_id_and_secret.split(":")
81+
query = connection.get_async_execution(query_id, query_secret)
82+
83+
while query.is_running:
84+
time.sleep(5)
85+
query.sync_status()
86+
87+
result = query.get_results().fetchall()
88+
```
89+
90+
# Note about direct result queries
91+
92+
To minimise network roundtrips for small queries, Databricks will eagerly return a query result if the query completes within five seconds. This means that `execute_async()` may take up to five seconds to return control back to your calling code. When this happens, `AsyncExecution.is_available` will evaluate `False` and the query result will have already been cached in this `AsyncExecution` object. Calling `get_results()` will not invoke a network round-trip.
93+
94+
Queries that execute in this fashion cannot be picked up with `get_async_execution()` and their results are not persisted on the server to be fetched by a separate thread. Therefore, before calling `.serialize()` to persist a `query_id:query_secret` pair, you should check if `AsyncExecution.available == True` first.

tests/e2e/test_execute_async.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -162,7 +162,7 @@ def test_serialize(self, long_running_ae: AsyncExecution):
162162
assert ae.is_running
163163

164164
def test_get_async_execution_no_results_when_direct_results_were_sent(self):
165-
"""It remains to be seen whether results can be fetched repeatedly from a "picked up" execution."""
165+
"""When DirectResults are sent, they cannot be fetched from a separate thread."""
166166

167167
with self.connection() as conn:
168168
ae = conn.execute_async(DIRECT_RESULTS_QUERY, {"param": 1})

0 commit comments

Comments
 (0)