[PECO-1263] Add get_async_execution method #314

susodapop · 2024-01-09T01:15:39Z

Description

This PR adds a way to "pick up" a running execution using only its query_id and query_secret. The interface looks like this:

with self.connection() as conn:
    query_id = "01eeae6b-xxxx-1513-89a3-4668a032ed77"
    query_secret = "01eeae6b-xxxx-179d-8faf-0f39dc3788f8"
    ae = conn.get_async_execution(query_id, query_secret)

To achieve this I had to update AsyncExecution's __init__ method to accept None as its initial status value. Prior to this, AsyncExecution was always created with the from_thrift_response method which already knows the initial value.

I also had to introduce a new FakeExecuteStatementResponse dataclass. This is a workaround for the way thrift_backend.py is currently written: it assumes that, by the time we fetch a query's results, we still hold the original TExecuteStatementResp from when the query began.

But for a "picked up" execution, we don't have the original TExecuteStatementResp because we only use the query_id and query_secret to pick up the running execution. This is a problem because result fetching depends on configuration data present in TResultSetMetadata. If we had the original TExecuteStatementResp, thrift_backend.py would know how to use its properties to gather the configuration it needs to fetch results. Since we don't have it for a "picked up" execution, we need a way to trick thrift_backend.py into thinking otherwise.

For this situation, the Thrift server includes the TGetResultSetMetadataReq and TGetResultSetMetadataResp messages, and thrift_backend.py helpfully includes a way to invisibly fetch this metadata if the original TExecuteStatementResp.directResults property is false-y.

To hook into this, I created the FakeExecuteStatementResp which only possesses the properties necessary to make thrift_backend.py use its normal code-path. It sets .directResults=False and sets operationHandle to the THandleIdentifier for the current AsyncExecution.
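A minimal sketch of the shape such a stand-in might take, assuming it needs only the two attributes named above. `StubHandle` and `needs_metadata_fetch` are hypothetical names used for illustration; they stand in for the real Thrift handle type and for whatever check thrift_backend.py actually performs.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StubHandle:
    """Hypothetical stand-in for the Thrift operation handle."""
    guid: bytes
    secret: bytes


@dataclass
class FakeExecuteStatementResp:
    """Carries only the attributes the result-fetching path is described as reading."""
    operationHandle: Any
    # False-y directResults is what steers the backend toward a metadata fetch.
    directResults: bool = False


def needs_metadata_fetch(resp: Any) -> bool:
    """Hypothetical helper mirroring the 'directResults is false-y' check."""
    return not getattr(resp, "directResults", None)
```

Because the stand-in is duck-typed, any code that only reads `.operationHandle` and `.directResults` cannot tell it apart from a real response object.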

As discussed internally, we need to refactor thrift_backend.py so that it no longer assumes synchronous execution within the same thread. But for now, the FakeExecuteStatementResp makes AsyncExecution behave exactly the same way in the original thread as it does in a thread where the execution was "picked up".

What's next?

After this I need to add documentation for this feature and update the changelog.


susodapop · 2024-01-09T01:18:30Z

tests/e2e/test_execute_async.py


-    def test_staging_operation(self):


This diff looks wacky here. I'm removing the test_staging_operation for the moment.


benc-db · 2024-01-09T23:53:59Z

src/databricks/sql/ae.py

@@ -83,6 +84,9 @@ def __init__(
        self.query_secret = query_secret
        self.status = status

+        if self.status is None:


I don't love having potentially long-running ops in __init__. For one thing, it is one more thing that must be mocked when unit testing, regardless of whether the test needs to interact with status directly. When initialization is non-trivial but must be complete for an object to be functional, I tend to put it in a factory method and try to hide the constructor, though I'm not sure Python gives us that capability. Take this comment with a grain of salt: depending on the user experience, this point may be outweighed by the goal of keeping the client experience simple.

How about this:

I'll add a new AsyncExecutionStatus.UNKNOWN and make that the default. Then I'll turn the .status member into a property that reads a private ._status member. If ._status == AsyncExecutionStatus.UNKNOWN, accessing the property will fire off poll_for_status and set it.

This way, the long-running op won't happen until a user actually tries to do something which depends on the status.
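A rough sketch of that lazy-status idea, not the code in this PR. `AsyncExecutionStatus.UNKNOWN`, `.status`, `._status` and `poll_for_status` come from the comment above; the other enum members and the injected `poll` callable are assumptions made so the example runs without a server.

```python
from enum import Enum, auto
from typing import Callable


class AsyncExecutionStatus(Enum):
    UNKNOWN = auto()
    RUNNING = auto()   # assumed member, for illustration only
    FINISHED = auto()  # assumed member, for illustration only


class AsyncExecution:
    def __init__(
        self,
        poll: Callable[[], AsyncExecutionStatus],
        status: AsyncExecutionStatus = AsyncExecutionStatus.UNKNOWN,
    ):
        # `poll` is a hypothetical stand-in for the round trip to the server.
        self._poll = poll
        self._status = status

    @property
    def status(self) -> AsyncExecutionStatus:
        # Only contact the server the first time the status is actually needed.
        if self._status == AsyncExecutionStatus.UNKNOWN:
            self.poll_for_status()
        return self._status

    def poll_for_status(self) -> None:
        self._status = self._poll()
```

Constructing the object stays cheap and cannot fail; the cost moves to the first read of `.status`, which is the trade-off debated in the replies that follow.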

Is there a reason to have it pretend to be a property, as opposed to just get_status()? This might be a result of other languages I've programmed in, but in my mind a property is ideally either (a) a field or (b) something directly computable from fields. Making it a method suggests that work will be done to get the value.

I have no objection to that. Either way is technically "Pythonic".

benc-db · 2024-01-09T23:57:39Z

src/databricks/sql/ae.py

    def poll_for_status(self) -> None:
        """Check the thrift server for the status of this operation and set self.status

        This will result in an error if the operation has been canceled or aborted at the server"""

-        _output = self._thrift_backend._poll_for_status(self.t_operation_handle)
+        try:


Perfect example of why I don't love complexity in my constructors. An ideal constructor says: if you give me all the required parameters as input, you will get an instance of this object as output. Here, however, our constructor could fail and throw an exception to the user. If you instead use a factory pattern, you have the choice of propagating the error or just giving the user None, and you could name the method something like 'get_if_exists(...)'.

That makes sense to me.

I do have a factory function. It's the Connection.get_async_execution() method. I can push this "does it exist" checking into that function like you describe.
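The factory approach discussed here could look roughly like the sketch below. It is illustrative only: `ResourceNotFound` and the `fetch_status` callable are hypothetical, and the real Connection.get_async_execution() takes only the query_id and query_secret.

```python
from typing import Callable, Optional


class ResourceNotFound(Exception):
    """Hypothetical error raised when the server does not know the query."""


class AsyncExecution:
    def __init__(self, query_id: str, query_secret: str, status: str):
        # The constructor only assigns fields; it never contacts the server.
        self.query_id = query_id
        self.query_secret = query_secret
        self.status = status


def get_async_execution(
    query_id: str,
    query_secret: str,
    fetch_status: Callable[[str, str], str],
) -> AsyncExecution:
    """Factory: do the server round trip here and let any error propagate."""
    status = fetch_status(query_id, query_secret)
    return AsyncExecution(query_id, query_secret, status)


def get_if_exists(
    query_id: str,
    query_secret: str,
    fetch_status: Callable[[str, str], str],
) -> Optional[AsyncExecution]:
    """Variant that returns None instead of raising when the query is unknown."""
    try:
        return get_async_execution(query_id, query_secret, fetch_status)
    except ResourceNotFound:
        return None
```

With this split, unit tests can build an AsyncExecution directly without mocking any network call, and the caller chooses between an exception and a None result.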

benc-db

Approve with suggestions

susodapop · 2024-01-10T00:12:14Z

Thanks! Will apply these suggestions and merge tomorrow. Documentation to follow.


susodapop · 2024-01-11T17:35:09Z

Running our e2e and unit tests before merging.

susodapop · 2024-01-18T23:19:28Z

Tests all pass. Merging.

Jesse Whitehouse added 6 commits January 8, 2024 16:34

Add get_async_execution to Connection

a2960fd

this requires updating the AsyncExecution to allow the __init__ status to be None and poll for it during initialisation. Signed-off-by: Jesse Whitehouse <[email protected]>

black the code

dbd7fbd

Signed-off-by: Jesse Whitehouse <[email protected]>

Add tests and update fixtures to evaluate behaviour when a query execution takes longer than 5 seconds and therefore the AsyncExecution returned by `execute_async` doesn't have a result yet.

e966b1c

Signed-off-by: Jesse Whitehouse <[email protected]>

Wrap RESOURCE_DOES_NOT_EXIST exceptions for clarity.

2ac7903

Signed-off-by: Jesse Whitehouse <[email protected]>

Add a .serialize() method for convenience. The alternative to this is for users to access the .query_id and .query_secret directly and manually convert them to strings.

26dce5a

Signed-off-by: Jesse Whitehouse <[email protected]>

Rename tests to properly isolate what works and what doesn't right now

a736bff

Signed-off-by: Jesse Whitehouse <[email protected]>

susodapop requested review from arikfr, yunbodeng-db and andrefurlan-db as code owners January 9, 2024 01:15

susodapop requested a review from benc-db January 9, 2024 01:15


susodapop changed the title from "[PECO-1263] Add get_async_execution method (for query cancel and query status)" to "[PECO-1263] Add get_async_execution method" Jan 10, 2024

benc-db approved these changes Jan 10, 2024


Jesse Whitehouse added 2 commits January 10, 2024 14:09

Make it possible to fetch results from a "picked up" query execution

24c5549

Signed-off-by: Jesse Whitehouse <[email protected]>

Refactor get_async_execution factory method based on PR review

4f95bca

Signed-off-by: Jesse Whitehouse <[email protected]>

susodapop merged commit bf046ff into peco-1263-staging Jan 18, 2024

susodapop deleted the add_get_async_execution_method branch January 18, 2024 23:19
