Skip to content

Introduce SeaDatabricksClient (Session Implementation) #577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 325 commits into from

Conversation

varun-edachali-dbx
Copy link
Collaborator

@varun-edachali-dbx varun-edachali-dbx commented May 29, 2025

What type of PR is this?

  • Feature

Description

  • Introduce SEA Backend client as an alternative to the existing Thrift backend
  • implement open_session() and close_session() functionality, and add an experimental, temporary test script to test them.

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

https://docs.google.com/document/d/1Y-eXLhNqqhrMVGnOlG8sdFrCxBTN1GdQvuKG4IfHmo0/edit?usp=sharing
https://databricks.atlassian.net/browse/PECOBLR-482?atlOrigin=eyJpIjoiODdkMGQ5NjZhMjY5NDRmZWEwZmIyNGEyNzFlNGQxNzEiLCJwIjoiaiJ9

jayantsing-db and others added 30 commits May 28, 2025 03:40
…544)

This PR replaces the previous implementation of convert_decimals_in_arrow_table() with a more efficient approach that uses PyArrow's native casting operation instead of going through pandas conversion and array creation.

- Remove conversion to pandas DataFrame via to_pandas() and apply() methods
- Remove intermediate steps of creating array from decimal column and setting it back
- Replace with direct type casting using PyArrow's cast() method
- Build a new table with transformed columns rather than modifying the original table
- Create a new schema based on the modified fields

The new approach is more performant by avoiding pandas conversion overhead. The table below highlights substantial performance improvements when retrieving all rows from a table containing decimal columns, particularly when compression is disabled. Even greater gains were observed with compression enabled—showing approximately an 84% improvement (6 seconds compared to 39 seconds). Benchmarking was performed against e2-dogfood, with the client located in the us-west-2 region.
![image](https://github.com/user-attachments/assets/5407b651-8ab6-4c13-b525-cf912f503ba0)

Signed-off-by: Jayant Singh <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
ensure maintenance of current APIs of Connection while delegating
responsibility

Signed-off-by: varun-edachali-dbx <[email protected]>
… through Connection

Signed-off-by: varun-edachali-dbx <[email protected]>
as in CONTRIBUTING.md

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
in case the openSession takes long, the initialisation of the session
will not complete immediately. This could make the session attribute
inaccessible. If the Connection is deleted in this time, the open()
check will throw because the session attribute does not exist. Thus, we
default to the Connection being closed in this case. This was not an
issue before because open was a direct attribute of the Connection
class. Caught in the integration tests.

Signed-off-by: varun-edachali-dbx <[email protected]>
WARNING: incomplete - result set still aligned heavily with thrift and
the necessity of some defined functions is to be validated

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
…emany test

without this the call_args are not logged on the instance and the unit
test fails

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
new codeowners

Signed-off-by: varun-edachali-dbx <[email protected]>
…t to prevent server side resource leaks (#554)

* Enhance Cursor close handling and context manager exception management

* tests

* fmt

* Fix Cursor.close() to properly handle CursorAlreadyClosedError

* Remove specific test message from Cursor.close() error handling

* Improve error handling in connection and cursor context managers to ensure proper closure during exceptions, including KeyboardInterrupt. Add tests for nested cursor management and verify operation closure on server-side errors.

* add

* add

Signed-off-by: varun-edachali-dbx <[email protected]>
* PECOBLR-86 Improve logging for debug level

Signed-off-by: Sai Shree Pradhan <[email protected]>

* PECOBLR-86 Improve logging for debug level

Signed-off-by: Sai Shree Pradhan <[email protected]>

* fixed format

Signed-off-by: Sai Shree Pradhan <[email protected]>

* used lazy logging

Signed-off-by: Sai Shree Pradhan <[email protected]>

* changed debug to error logs

Signed-off-by: Sai Shree Pradhan <[email protected]>

* used lazy logging

Signed-off-by: Sai Shree Pradhan <[email protected]>

---------

Signed-off-by: Sai Shree Pradhan <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
More conditions to run github actions

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@varun-edachali-dbx varun-edachali-dbx changed the title Sess sea Introduce SEADatabricksClient May 29, 2025
Signed-off-by: varun-edachali-dbx <[email protected]>
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@varun-edachali-dbx varun-edachali-dbx changed the title Introduce SEADatabricksClient Introduce SeaDatabricksClient (Session Implementation) May 29, 2025
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Signed-off-by: varun-edachali-dbx <[email protected]>
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@varun-edachali-dbx
Copy link
Collaborator Author

superseded by #582 (cleaner commit history)

@varun-edachali-dbx varun-edachali-dbx deleted the sess-sea branch June 3, 2025 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.