
Add SQLAlchemy Dialect #57

Merged · 3 commits · Feb 17, 2023

23 changes: 17 additions & 6 deletions README.md
@@ -3,7 +3,7 @@
[![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
[![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)

The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL.

This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.

@@ -24,16 +24,27 @@ For the latest documentation, see

Install the library with `pip install databricks-sql-connector`

Note: Don't hard-code authentication secrets into your Python code. Use environment variables instead:

```bash
export DATABRICKS_HOST=********.databricks.com
export DATABRICKS_HTTP_PATH=/sql/1.0/endpoints/****************
export DATABRICKS_TOKEN=dapi********************************
```

Example usage:
```python
import os
from databricks import sql

host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")

connection = sql.connect(
    server_hostname=host,
    http_path=http_path,
    access_token=access_token)

cursor = connection.cursor()
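# Run a query and fetch the results. `SELECT 1` is a stand-in here; any
# Databricks SQL statement works.
cursor.execute("SELECT 1")
print(cursor.fetchall())

cursor.close()
connection.close()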

```
3 changes: 2 additions & 1 deletion examples/README.md
@@ -36,4 +36,5 @@ To run all of these examples you can clone the entire repository to your disk.
- **`persistent_oauth.py`** shows a more advanced example of authenticating by OAuth while Bring Your Own IDP is in public preview. In this case, it shows how to use a subclass of `OAuthPersistence` to reuse an OAuth token across script executions.
- **`set_user_agent.py`** shows how to customize the user agent header used for Thrift commands. In
this example the string `ExamplePartnerTag` will be added to the user agent on every request.
- **`staging_ingestion.py`** shows how the connector handles Databricks' experimental staging ingestion commands `GET`, `PUT`, and `REMOVE`.
- **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy](https://www.sqlalchemy.org/).
92 changes: 92 additions & 0 deletions examples/sqlalchemy.py
@@ -0,0 +1,92 @@
"""
databricks-sql-connector includes a SQLAlchemy dialect compatible with Databricks SQL.
It aims to be a drop-in replacement for the crflynn/sqlalchemy-databricks project, implementing
more of the Databricks API, particularly around table reflection, Alembic usage, and data
ingestion with pandas.

Because of the extent of SQLAlchemy's capabilities it isn't feasible to provide examples of every
usage in a single script, so we only provide a basic one here. More examples are found in our test
suite at tests/e2e/sqlalchemy/test_basic.py and in the PR that implements this change:

https://github.com/databricks/databricks-sql-python/pull/57

# What's already supported

Most of the functionality is demonstrated in the e2e tests mentioned above. The list below is
derived from those test method names:

- Create and drop tables with SQLAlchemy Core
- Create and drop tables with SQLAlchemy ORM
- Read created tables via reflection
- Modify column nullability
- Insert records manually
- Insert records with pandas.to_sql (note that this does not work for DataFrames with indexes;
see the pandas sketch at the end of this script)

This connector also aims to support Alembic for programmatic Delta table schema maintenance. This
behaviour is not yet backed by integration tests, which will follow in a subsequent PR as we learn
more about customer use cases. That said, the following behaviours have been tested manually (see
the Alembic wiring sketch after this docstring):

- Autogenerate revisions with alembic revision --autogenerate
- Upgrade and downgrade between revisions with `alembic upgrade <revision hash>` and
`alembic downgrade <revision hash>`

# Known Gaps
- MAP, ARRAY, and STRUCT types: this dialect can read these types out as strings, but you cannot
define a SQLAlchemy model with, e.g., databricks.sqlalchemy.dialect.types.DatabricksMap because
these types are not implemented yet.
- Constraints: with the addition of information_schema to Unity Catalog, Databricks SQL supports
foreign key and primary key constraints. This dialect can write these constraints, but the ability
for Alembic to reflect and modify them programmatically has not been tested.
"""

import os
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy import Column, String, Integer, BOOLEAN, create_engine, select

host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")


# Extra arguments are passed untouched to the driver
# See thrift_backend.py for the complete list
extra_connect_args = {
"_tls_verify_hostname": True,
"_user_agent_entry": "PySQL Example Script",
}

engine = create_engine(
f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}",
connect_args=extra_connect_args,
)
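
# Optional sanity check (a sketch, not part of the original example): confirm
# the engine can reach Databricks before doing any ORM work.
from sqlalchemy import text

with engine.connect() as conn:
    assert conn.execute(text("SELECT 1")).scalar() == 1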
session = Session(bind=engine)
base = declarative_base(bind=engine)


class SampleObject(base):

__tablename__ = "mySampleTable"

name = Column(String(255), primary_key=True)
episodes = Column(Integer)
some_bool = Column(BOOLEAN)


base.metadata.create_all()

sample_object_1 = SampleObject(name="Bim Adewunmi", episodes=6, some_bool=True)
sample_object_2 = SampleObject(name="Miki Meek", episodes=12, some_bool=False)

session.add(sample_object_1)
session.add(sample_object_2)

session.commit()

stmt = select(SampleObject).where(SampleObject.name.in_(["Bim Adewunmi", "Miki Meek"]))

output = list(session.scalars(stmt))
assert len(output) == 2

base.metadata.drop_all()
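
# A minimal sketch of the pandas ingestion path mentioned in the docstring
# (assumes pandas is installed). to_sql does not work for DataFrames with
# indexes, so pass index=False. The table name here is hypothetical.
import pandas as pd

df = pd.DataFrame({"name": ["Ira Glass"], "episodes": [750], "some_bool": [True]})
df.to_sql("pysql_example_pandas_table", con=engine, if_exists="replace", index=False)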