|
| 1 | +# Run queries asynchronously |
| 2 | + |
| 3 | +This driver supports running queries asynchronously using `execute_async()`. This method can return control to the calling code almost immediately, instead of blocking until the query completes. `execute_async()` never returns a `ResultSet`. Instead it returns a query handle that you can use to poll for the query status, cancel the query, or fetch the query results. |
| 4 | + |
| 5 | +You can use this method to submit multiple queries in rapid succession and to pick up a running query executions that were kicked-off from a separate thread or `Connection`. This can be especially useful for recovering running executions within serverless functions. |
| 6 | + |
| 7 | +**Note:** Asynchronous execution is not the same as _asyncio_ in Python. |
| 8 | + |
| 9 | + |
| 10 | +# Requirements |
| 11 | + |
| 12 | +- `databricks-sql-connector>=3.1.0` |
| 13 | + |
| 14 | +# Interface |
| 15 | + |
| 16 | +To run a query asynchronously, use `databricks.sql.client.Connection.execute_async()`. This method takes the same arguments as `execute()`. To pick up an existing query run, use `databricks.sql.client.Connection.get_async_execution()`. Both methods return an `AsyncExecution` object, which lets you interact with a query by exposing these properties and methods: |
| 17 | + |
| 18 | +**Properties** |
| 19 | +- `AsyncExecution.status` is the last-known status of this query run, expressed as an `AsyncExecutionStatus` enumeration value. For example, `RUNNING`, `FINISHED`, or `CANCELED`. When you first call `execute_async()` the resulting status will usually be `RUNNING`. Calling `sync_status()` will refresh the value of this property. |
| 20 | +- `AsyncExecution.query_id` is the `UUID` for this query run. You can use this to look-up the query in the Databricks SQL query history. |
| 21 | +- `AsyncExecution.query_secret` is the `UUID` secret for this query run. Both the `query_id` and `secret` are needed to fetch results for a running query. A query can be canceled using only its `query_id`. |
| 22 | +- `AsyncExecution.is_available` is a boolean that indicates whether this execution can be picked up from another `Connection` or thread. See [below](#note-about-direct-result-queries) for more information. |
| 23 | + |
| 24 | +**Methods** |
| 25 | +- `AsyncExecution.sync_status()` performs a network round-trip to synchronize `.status`. |
| 26 | +- `AsyncExecution.get_results()` returns the `ResultSet` for this query run. This is the same return signature as a synchronous `.execute()` call. Note that if you call `get_results()` on a query that is still running, the code will block until the query finishes running and the result has been fetched. |
| 27 | +- `AsyncExecution.cancel()` performs a network round-trip to cancel this query execution. |
| 28 | +- `AsyncExceution.serialize()` returns a string of the `query_id:query_secret` for this execution. |
| 29 | + |
| 30 | +**Note:** You do not need to directly instantiate an `AsyncExecution` object in your client code. Instead, use the `execute_async` and `get_async_execution` methods to run or pick up a running query. |
| 31 | + |
| 32 | +# Code Examples |
| 33 | + |
| 34 | +### Run a query asynchronously |
| 35 | + |
| 36 | +This snippet mirrors the synchronous example in this repository's README. |
| 37 | + |
| 38 | +```python |
| 39 | +import os |
| 40 | +import time |
| 41 | +from databricks import sql |
| 42 | + |
| 43 | +host = os.getenv("DATABRICKS_HOST") |
| 44 | +http_path = os.getenv("DATABRICKS_HTTP_PATH") |
| 45 | +access_token = os.getenv("DATABRICKS_TOKEN") |
| 46 | + |
| 47 | +with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection: |
| 48 | + query = connection.execute_async("SELECT :param `p`, * FROM RANGE(10" {"param": "foo"}) |
| 49 | + |
| 50 | + # Poll for the result every 5 seconds |
| 51 | + while query.is_running: |
| 52 | + time.sleep(5) |
| 53 | + query.sync_status() |
| 54 | + |
| 55 | + # this will raise a AsyncExecutionUnrecoverableResultException if the query was canceled |
| 56 | + result = query.get_results().fetchall() |
| 57 | + |
| 58 | +``` |
| 59 | + |
| 60 | +### Pick up a running query |
| 61 | + |
| 62 | +Both a `query_id` and `query_secret` are required to pick up a running query. This example runs a query from one `Connection` and fetches its result from a different connection. |
| 63 | + |
| 64 | +```python |
| 65 | +import os |
| 66 | +import time |
| 67 | +from databricks import sql |
| 68 | + |
| 69 | +host = os.getenv("DATABRICKS_HOST") |
| 70 | +http_path = os.getenv("DATABRICKS_HTTP_PATH") |
| 71 | +access_token = os.getenv("DATABRICKS_TOKEN") |
| 72 | + |
| 73 | +with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection: |
| 74 | + query = connection.execute_async("SELECT :param `p`, * FROM RANGE(10" {"param": "foo"}) |
| 75 | + query_id_and_secret = query.serialize() |
| 76 | + |
| 77 | +# The connection created above has now closed |
| 78 | + |
| 79 | +with sql.connect(server_hostname=host, http_path=http_path, access_token=access_token) as connection: |
| 80 | + query_id, query_secret = query_id_and_secret.split(":") |
| 81 | + query = connection.get_async_execution(query_id, query_secret) |
| 82 | + |
| 83 | + while query.is_running: |
| 84 | + time.sleep(5) |
| 85 | + query.sync_status() |
| 86 | + |
| 87 | + result = query.get_results().fetchall() |
| 88 | +``` |
| 89 | + |
| 90 | +# Note about direct result queries |
| 91 | + |
| 92 | +To minimise network roundtrips for small queries, Databricks will eagerly return a query result if the query completes within five seconds. This means that `execute_async()` may take up to five seconds to return control back to your calling code. When this happens, `AsyncExecution.is_available` will evaluate `False` and the query result will have already been cached in this `AsyncExecution` object. Calling `get_results()` will not invoke a network round-trip. |
| 93 | + |
| 94 | +Queries that execute in this fashion cannot be picked up with `get_async_execution()` and their results are not persisted on the server to be fetched by a separate thread. Therefore, before calling `.serialize()` to persist a `query_id:query_secret` pair, you should check if `AsyncExecution.available == True` first. |
0 commit comments