-
Notifications
You must be signed in to change notification settings - Fork 102
[PECO-1263] Add get_async_execution method #314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this requires updating the AsyncExecution to allow the __init__ status to be None and poll for it during initialisation. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
…ution takes longer than 5 seconds and therefore the AsyncExecution returned by `execute_async` doesn't have a result yet. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
for users to access the .query_id and .query_secret directly and manually convert them to strings. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
|
||
def test_staging_operation(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This diff looks wacky here. I'm removing the test_staging_operation
for the moment.
src/databricks/sql/ae.py
Outdated
@@ -83,6 +84,9 @@ def __init__( | |||
self.query_secret = query_secret | |||
self.status = status | |||
|
|||
if self.status is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love having potentially long running ops in the init. For one thing, it means one more thing that must be mocked when unit testing, regardless of whether the test needs to interact with status directly. When I have initialization that is non-trivial, but must be complete for an object to be functional, I tend to put that in a factory method, and try to make the constructor hidden...not sure if we have that capability in python though. Take this comment with a grain of salt, because depending on the user experience, this point may be outweighed by trying to make the client experience simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
I'll add a new AsyncExecutionStatus.UNKNOWN
and make that the default. Then modify the .status
class member to become a property that accesses a private ._status
member. If ._status==AsyncExecutionStatus.UNKNOWN
, this will fire-off the poll_for_status
and set it.
This way, the long-running op won't happen until a user actually tries to do something which depends on the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to have it pretend to be a property? As opposed to just get_status()? This might be a result of other languages I've programmed in, but in my mind, a property is ideally either a.) a field, or b.) something directly computable from fields. Making it a method suggests that work will be done to get the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection to that. Either way is technically "Pythonic".
def poll_for_status(self) -> None: | ||
"""Check the thrift server for the status of this operation and set self.status | ||
|
||
This will result in an error if the operation has been canceled or aborted at the server""" | ||
|
||
_output = self._thrift_backend._poll_for_status(self.t_operation_handle) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect example of why I don't love complexity in my constructors. An ideal constructor says, if you give me all the required parameters as input, you will get an instance of this object as object; here, however, our constructor could fail and throw an exception to the user. If you instead use a factory pattern, you have the choice of propagating the error, or just giving the user None, as you could name the method something like 'get_if_exists(...)'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me.
I do have a factory function. It's the Connection.get_async_execution()
method. I can push this "does it exist" checking into that function like you describe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve with suggestions
Thanks! Will apply these suggestions and merge tomorrow. Documentation to follow. |
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Running our e2e and unit tests before merging. |
Tests all pass. Merging. |
Description
This PR adds a way to "pick up" a running execution using only its
query_id
andquery_secret
. The interface looks like this:To achieve this I had to update
AsyncExecution
's__init__
method to acceptNone
as its initialstatus
value. Prior to this,AsyncExecution
was always created with thefrom_thrift_response
method which already knows the initial value.I also had to introduce a new
FakeExecuteStatementResponse
dataclass. This is a workaround for the waythrift_backend.py
is currently written.thrift_backend.py
was written with the assumption that we have the originalTExecuteStatementResp
from when a query began by the time we fetch results of that query.But for a "picked up" execution, we don't have the original
TExecuteStatementResp
because we only use thequery_id
andquery_secret
to pick up the running execution. This is a problem because result fetching depends on configuration data present inTResultSetMetadata
. If we had the originalTExecuteStatementResp
thenthrift_backend.py
knows how to use its properties to gather the configuration it needs to fetch results. But since we don't have the originalTExecuteStatementResp
in the case of a "picked up" execution, we need a way to trickthrift_backend.py
into thinking otherwise.For this situation, the Thrift server includes the
TGetResultSetMetadataReq
andTGetGetResultSetMetadataResp
messages andthrift_backend.py
helpfully includes a way to invisibly fetch this if the originalTExecuteStatementResp.directResults
property is false-y.To hook into this, I created the
FakeExecuteStatementResp
which only possesses the properties necessary to makethrift_backend.py
use its normal code-path. It sets.directResults=False
and setsoperationHandle
to theTHandleIdentifier
for the currentAsyncExecution
.As discussed internally, we need to refactor
thrift_backend.py
to not depend on the assumption of synchronous execution within the same thread. But for now, theFakeExecuteStatmentResp
makesAsyncExecution
behave exactly the same way in the original thread as it does for a thread where the execution was "picked up".What's next?
After this I need to add documentation for this feature and update the changelog.