Skip to content

[PECO-1263] Add get_async_execution method #314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 18, 2024

Conversation

susodapop
Copy link
Contributor

@susodapop susodapop commented Jan 9, 2024

Description

This PR adds a way to "pick up" a running execution using only its query_id and query_secret. The interface looks like this:

with self.connection() as conn:
    query_id = "01eeae6b-xxxx-1513-89a3-4668a032ed77"
    query_secret = "01eeae6b-xxxx-179d-8faf-0f39dc3788f8"
    ae = conn.get_async_execution(query_id, query_secret)

To achieve this I had to update AsyncExecution's __init__ method to accept None as its initial status value. Prior to this, AsyncExecution was always created with the from_thrift_response method which already knows the initial value.

I also had to introduce a new FakeExecuteStatementResponse dataclass. This is a workaround for the way thrift_backend.py is currently written. thrift_backend.py was written with the assumption that we have the original TExecuteStatementResp from when a query began by the time we fetch results of that query.

But for a "picked up" execution, we don't have the original TExecuteStatementResp because we only use the query_id and query_secret to pick up the running execution. This is a problem because result fetching depends on configuration data present in TResultSetMetadata. If we had the original TExecuteStatementResp then thrift_backend.py knows how to use its properties to gather the configuration it needs to fetch results. But since we don't have the original TExecuteStatementResp in the case of a "picked up" execution, we need a way to trick thrift_backend.py into thinking otherwise.

For this situation, the Thrift server includes the TGetResultSetMetadataReq and TGetGetResultSetMetadataResp messages and thrift_backend.py helpfully includes a way to invisibly fetch this if the original TExecuteStatementResp.directResults property is false-y.

To hook into this, I created the FakeExecuteStatementResp which only possesses the properties necessary to make thrift_backend.py use its normal code-path. It sets .directResults=False and sets operationHandle to the THandleIdentifier for the current AsyncExecution.

As discussed internally, we need to refactor thrift_backend.py to not depend on the assumption of synchronous execution within the same thread. But for now, the FakeExecuteStatmentResp makes AsyncExecution behave exactly the same way in the original thread as it does for a thread where the execution was "picked up".

What's next?

After this I need to add documentation for this feature and update the changelog.

Jesse Whitehouse added 6 commits January 8, 2024 16:34
this requires updating the AsyncExecution to allow the __init__ status
to be None and poll for it during initialisation.

Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
…ution

takes longer than 5 seconds and therefore the AsyncExecution returned
by `execute_async` doesn't have a result yet.

Signed-off-by: Jesse Whitehouse <[email protected]>
for users to access the .query_id and .query_secret directly and manually
convert them to strings.

Signed-off-by: Jesse Whitehouse <[email protected]>

def test_staging_operation(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff looks wacky here. I'm removing the test_staging_operation for the moment.

@@ -83,6 +84,9 @@ def __init__(
self.query_secret = query_secret
self.status = status

if self.status is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love having potentially long running ops in the init. For one thing, it means one more thing that must be mocked when unit testing, regardless of whether the test needs to interact with status directly. When I have initialization that is non-trivial, but must be complete for an object to be functional, I tend to put that in a factory method, and try to make the constructor hidden...not sure if we have that capability in python though. Take this comment with a grain of salt, because depending on the user experience, this point may be outweighed by trying to make the client experience simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

I'll add a new AsyncExecutionStatus.UNKNOWN and make that the default. Then modify the .status class member to become a property that accesses a private ._status member. If ._status==AsyncExecutionStatus.UNKNOWN, this will fire-off the poll_for_status and set it.

This way, the long-running op won't happen until a user actually tries to do something which depends on the status.

Copy link
Collaborator

@benc-db benc-db Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to have it pretend to be a property? As opposed to just get_status()? This might be a result of other languages I've programmed in, but in my mind, a property is ideally either a.) a field, or b.) something directly computable from fields. Making it a method suggests that work will be done to get the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objection to that. Either way is technically "Pythonic".

def poll_for_status(self) -> None:
"""Check the thrift server for the status of this operation and set self.status

This will result in an error if the operation has been canceled or aborted at the server"""

_output = self._thrift_backend._poll_for_status(self.t_operation_handle)
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect example of why I don't love complexity in my constructors. An ideal constructor says, if you give me all the required parameters as input, you will get an instance of this object as object; here, however, our constructor could fail and throw an exception to the user. If you instead use a factory pattern, you have the choice of propagating the error, or just giving the user None, as you could name the method something like 'get_if_exists(...)'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me.

I do have a factory function. It's the Connection.get_async_execution() method. I can push this "does it exist" checking into that function like you describe.

@susodapop susodapop changed the title [PECO-1263] Add get_async_execution method (for query cancel and query status) [PECO-1263] Add get_async_execution method Jan 10, 2024
Copy link
Collaborator

@benc-db benc-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve with suggestions

@susodapop
Copy link
Contributor Author

Thanks! Will apply these suggestions and merge tomorrow. Documentation to follow.

@susodapop
Copy link
Contributor Author

susodapop commented Jan 11, 2024

Running our e2e and unit tests before merging.

@susodapop
Copy link
Contributor Author

Tests all pass. Merging.

@susodapop susodapop merged commit bf046ff into peco-1263-staging Jan 18, 2024
@susodapop susodapop deleted the add_get_async_execution_method branch January 18, 2024 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants