Commit 01e998c
PySQL Connector split into connector and sqlalchemy (#444)

* Modified the gitignore file to not have .idea file
* [PECO-1803] Splitting the PySql connector into the core and the non-core part (#417)
* Implemented ColumnQueue to test fetchall without pyarrow
* Removed token
* Order of fields in row corrected
* Changed the folder structure and tested the basic setup to work
* Refactored the code to make the connector work
* Basic setup of connector, core, and sqlalchemy is working
* Basic integration of core, connect, and sqlalchemy is working
* Set up working dynamic change from ColumnQueue to ArrowQueue
* Refactored the test code and moved it to the respective folders
* Added the unit test for column_queue; fixed __version__
* venv_main added to gitignore
* Added code for merging columnar table
* Merging code for columnar
* Fixed the retry_close session test issue with logging
* Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing
* Added pyarrow_test mark on pytest
* Fixed databricks.sqlalchemy to databricks_sqlalchemy imports
* Added poetry.lock
* Added dist folder
* Changed the pyproject.toml
* Minor fix
* Added the pyarrow skip tag on unit tests and verified they work
* Fixed the Decimal and timestamp conversion issue in the non-arrow pipeline
* Removed files that were not required and reformatted
* Fixed test_retry error
* Changed the folder structure to src/databricks
* Moved the columnar non-arrow flow to another PR
* Moved the README to the root
* Removed columnQueue instance
* Removed databricks_sqlalchemy dependency in core
* Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml
* Ran the black formatter with the original version
* Extra .py removed from all the __init__.py file names
* Undo formatting check
* Check (repeated formatting-check commits)
* BIG UPDATE
* Refactor code
* Fixed versioning
* Minor refactoring
* Changed the folder structure such that sqlalchemy has no reference here
* Fixed README.md and CONTRIBUTING.md
* Added manual publish
* On-push trigger added
* Manually setting the publish step
* Changed versioning in pyproject.toml
* Bumped up the version to 4.0.0.b3 and also changed the structure to have pyarrow as optional
* Removed the sqlalchemy tests from the integration.yml file
* [PECO-1803] Print warning message if pyarrow is not installed (#468)
* [PECO-1803] Remove sqlalchemy and update README.md (#469)
* Removed all sqlalchemy-related code
* Generated the lock file
* Fixed failing tests
* Removed poetry.lock
* Updated the lock file
* Fixed poetry numpy 2.2.2 issue
* Workflow fixes

Signed-off-by: Jacky Hu <[email protected]>
Co-authored-by: Jacky Hu <[email protected]>
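The ColumnQueue mentioned in the commit message (a result queue that serves rows from columnar data without requiring pyarrow) is not shown in this diff. As a rough illustration of the idea only — the class name, method names, and behavior below are assumptions, not the connector's actual implementation — a minimal sketch might look like this:

```python
# Hypothetical sketch of a pyarrow-free columnar result queue, loosely modeled
# on the ColumnQueue idea described in the commit message. The real class in
# databricks-sql-connector may differ in names and behavior.

class ColumnQueue:
    """Serves rows from column-oriented data without requiring pyarrow."""

    def __init__(self, column_names, columns):
        self.column_names = column_names   # e.g. ["id", "name"]
        self.columns = columns             # list of equal-length value lists
        self.cur_row = 0
        self.num_rows = len(columns[0]) if columns else 0

    def next_n_rows(self, n):
        """Return up to n rows, each as a tuple in column order."""
        end = min(self.cur_row + n, self.num_rows)
        rows = [tuple(col[i] for col in self.columns)
                for i in range(self.cur_row, end)]
        self.cur_row = end
        return rows

    def remaining_rows(self):
        """Return all rows not yet consumed."""
        return self.next_n_rows(self.num_rows - self.cur_row)


queue = ColumnQueue(["id", "name"], [[1, 2, 3], ["a", "b", "c"]])
print(queue.next_n_rows(2))    # [(1, 'a'), (2, 'b')]
print(queue.remaining_rows())  # [(3, 'c')]
```

A queue with this shape makes the "dynamic change from ColumnQueue to ArrowQueue" mentioned above plausible: both expose the same row-fetching surface, so the cursor can pick one depending on whether pyarrow is importable.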
1 parent f9d6ef1 commit 01e998c


41 files changed: +467 −4911 lines

`.github/workflows/code-quality-checks.yml` (+51)

```diff
@@ -58,6 +58,57 @@ jobs:
       #----------------------------------------------
       - name: Run tests
         run: poetry run python -m pytest tests/unit
+  run-unit-tests-with-arrow:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [ 3.8, 3.9, "3.10", "3.11" ]
+    steps:
+      #----------------------------------------------
+      # check-out repo and set-up python
+      #----------------------------------------------
+      - name: Check out repository
+        uses: actions/checkout@v2
+      - name: Set up python ${{ matrix.python-version }}
+        id: setup-python
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+      #----------------------------------------------
+      # ----- install & configure poetry -----
+      #----------------------------------------------
+      - name: Install Poetry
+        uses: snok/install-poetry@v1
+        with:
+          virtualenvs-create: true
+          virtualenvs-in-project: true
+          installer-parallel: true
+
+      #----------------------------------------------
+      # load cached venv if cache exists
+      #----------------------------------------------
+      - name: Load cached venv
+        id: cached-poetry-dependencies
+        uses: actions/cache@v2
+        with:
+          path: .venv-pyarrow
+          key: venv-pyarrow-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
+      #----------------------------------------------
+      # install dependencies if cache does not exist
+      #----------------------------------------------
+      - name: Install dependencies
+        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
+        run: poetry install --no-interaction --no-root
+      #----------------------------------------------
+      # install your root project, if required
+      #----------------------------------------------
+      - name: Install library
+        run: poetry install --no-interaction --all-extras
+      #----------------------------------------------
+      # run test suite
+      #----------------------------------------------
+      - name: Run tests
+        run: poetry run python -m pytest tests/unit
   check-linting:
     runs-on: ubuntu-latest
     strategy:
```

`.github/workflows/integration.yml` (−2)

```diff
@@ -55,5 +55,3 @@ jobs:
       #----------------------------------------------
       - name: Run e2e tests
         run: poetry run python -m pytest tests/e2e
-      - name: Run SQL Alchemy tests
-        run: poetry run python -m pytest src/databricks/sqlalchemy/test_local
```

`.github/workflows/publish-manual.yml` (+78)

```diff
@@ -0,0 +1,78 @@
+name: Publish to PyPI Manual [Production]
+
+# Allow manual triggering of the workflow
+on:
+  workflow_dispatch: {}
+
+jobs:
+  publish:
+    name: Publish
+    runs-on: ubuntu-latest
+
+    steps:
+      #----------------------------------------------
+      # Step 1: Check out the repository code
+      #----------------------------------------------
+      - name: Check out repository
+        uses: actions/checkout@v2  # Check out the repository to access the code
+
+      #----------------------------------------------
+      # Step 2: Set up Python environment
+      #----------------------------------------------
+      - name: Set up python
+        id: setup-python
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.9  # Specify the Python version to be used
+
+      #----------------------------------------------
+      # Step 3: Install and configure Poetry
+      #----------------------------------------------
+      - name: Install Poetry
+        uses: snok/install-poetry@v1  # Install Poetry, the Python package manager
+        with:
+          virtualenvs-create: true
+          virtualenvs-in-project: true
+          installer-parallel: true
+
+      # #----------------------------------------------
+      # # Step 4: Load cached virtual environment (if available)
+      # #----------------------------------------------
+      # - name: Load cached venv
+      #   id: cached-poetry-dependencies
+      #   uses: actions/cache@v2
+      #   with:
+      #     path: .venv  # Path to the virtual environment
+      #     key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
+      #     # Cache key is generated based on OS, Python version, repo name, and the `poetry.lock` file hash
+
+      # #----------------------------------------------
+      # # Step 5: Install dependencies if the cache is not found
+      # #----------------------------------------------
+      # - name: Install dependencies
+      #   if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'  # Only run if the cache was not hit
+      #   run: poetry install --no-interaction --no-root  # Install dependencies without interaction
+
+      # #----------------------------------------------
+      # # Step 6: Update the version to the manually provided version
+      # #----------------------------------------------
+      # - name: Update pyproject.toml with the specified version
+      #   run: poetry version ${{ github.event.inputs.version }}  # Use the version provided by the user input
+
+      #----------------------------------------------
+      # Step 7: Build and publish the first package to PyPI
+      #----------------------------------------------
+      - name: Build and publish databricks sql connector to PyPI
+        working-directory: ./databricks_sql_connector
+        run: |
+          poetry build
+          poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }}  # Publish with PyPI token
+      #----------------------------------------------
+      # Step 8: Build and publish the second package to PyPI
+      #----------------------------------------------
+
+      - name: Build and publish databricks sql connector core to PyPI
+        working-directory: ./databricks_sql_connector_core
+        run: |
+          poetry build
+          poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }}  # Publish with PyPI token
```

`.gitignore` (+1 −1)

```diff
@@ -195,7 +195,7 @@ cython_debug/
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.idea/
 
 # End of https://www.toptal.com/developers/gitignore/api/python,macos
```

`CHANGELOG.md` (+5)

```diff
@@ -1,5 +1,10 @@
 # Release History
 
+# 4.0.0 (TBD)
+
+- Split the connector into two separate packages: `databricks-sql-connector` and `databricks-sqlalchemy`. The `databricks-sql-connector` package contains the core functionality of the connector, while the `databricks-sqlalchemy` package contains the SQLAlchemy dialect for the connector.
+- The PyArrow dependency is now optional in `databricks-sql-connector`. Users who need Arrow results must install `pyarrow` explicitly.
+
 # 3.7.0 (2024-12-23)
 
 - Fix: Incorrect number of rows fetched in inline results when fetching results with FETCH_NEXT orientation (databricks/databricks-sql-python#479 by @jprakash-db)
```
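The "pyarrow is now optional" change pairs with the warning added in PECO-1803 ("Print warning message if pyarrow is not installed"). A common shape for this optional-dependency pattern — shown here as a generic sketch, not the connector's actual source — is to probe the import once at module load and branch on the result:

```python
# Generic optional-dependency pattern, similar in spirit to warning when
# pyarrow is absent. This is an illustrative sketch, not actual
# databricks-sql-connector code.
import logging

logger = logging.getLogger(__name__)

try:
    import pyarrow
except ImportError:
    pyarrow = None
    logger.warning(
        "pyarrow is not installed; Arrow-based result fetching is disabled. "
        "Install it with `pip install databricks-sql-connector[pyarrow]`."
    )

def supports_arrow() -> bool:
    """True when Arrow-based result fetching is available."""
    return pyarrow is not None
```

Downstream code can then check `supports_arrow()` to decide between an Arrow-based and a column-based result path instead of failing on import.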

`CONTRIBUTING.md` (−3)

```diff
@@ -144,9 +144,6 @@ The `PySQLStagingIngestionTestSuite` namespace requires a cluster running DBR ve
 
 The suites marked `[not documented]` require additional configuration which will be documented at a later time.
 
-#### SQLAlchemy dialect tests
-
-See README.tests.md for details.
 
 ### Code formatting
```

`README.md` (+20 −3)

````diff
@@ -3,9 +3,9 @@
 [![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
 [![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)
 
-The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
+The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).
 
-This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
+This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to enable and use these APIs; install it via `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
 
 You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).
 
@@ -22,7 +22,12 @@ For the latest documentation, see
 
 ## Quickstart
 
-Install the library with `pip install databricks-sql-connector`
+### Installing the core library
+Install using `pip install databricks-sql-connector`
+
+### Installing the core library with PyArrow
+Install using `pip install databricks-sql-connector[pyarrow]`
+
 
 ```bash
 export DATABRICKS_HOST=********.databricks.com
@@ -60,6 +65,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
 > to authenticate the target Databricks user account and needs to open the browser for authentication. So it
 > can only run on the user's machine.
+## SQLAlchemy
+Starting with `databricks-sql-connector` version 4.0.0, SQLAlchemy support has been extracted to a new library, `databricks-sqlalchemy`.
+
+- GitHub repository: [databricks-sqlalchemy](https://github.com/databricks/databricks-sqlalchemy)
+- PyPI: [databricks-sqlalchemy](https://pypi.org/project/databricks-sqlalchemy/)
+
+### Quick SQLAlchemy guide
+Users can now choose between the SQLAlchemy v1 and SQLAlchemy v2 dialects with the connector core:
+
+- Install the latest SQLAlchemy v1 dialect using `pip install databricks-sqlalchemy~=1.0`
+- Install the SQLAlchemy v2 dialect using `pip install databricks-sqlalchemy`
+
 
 ## Contributing
````
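After the split, the core connector no longer ships a dialect, so SQLAlchemy users point `create_engine` at the separately installed `databricks-sqlalchemy` package via a `databricks://` URL. The sketch below only builds such a URL as a string; the exact URL shape and the placeholder values are assumptions for illustration — check the `databricks-sqlalchemy` README for the authoritative format:

```python
# Sketch of assembling a SQLAlchemy connection URL for the databricks dialect.
# The "databricks://token:<token>@<host>?http_path=..." form and all values
# here are illustrative assumptions, not verified against the dialect's docs.
from urllib.parse import quote_plus

def databricks_url(host, http_path, access_token, catalog, schema):
    # Percent-encode each component so slashes and special characters in the
    # token or http_path survive URL parsing.
    return (
        f"databricks://token:{quote_plus(access_token)}@{host}"
        f"?http_path={quote_plus(http_path)}"
        f"&catalog={quote_plus(catalog)}"
        f"&schema={quote_plus(schema)}"
    )

url = databricks_url(
    host="********.databricks.com",        # placeholder workspace host
    http_path="/sql/1.0/warehouses/abc123",  # placeholder warehouse path
    access_token="dapi-example-token",       # placeholder token
    catalog="main",
    schema="default",
)
# With databricks-sqlalchemy installed, this URL would be passed to
# sqlalchemy.create_engine(url).
```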

`examples/sqlalchemy.py` (−174)

This file was deleted.
