
Commit 18f7daf

Add SQL Support for ADBC Drivers (pandas-dev#53869)
* close to complete implementation
* working implementation for postgres
* sqlite implementation
* Added ADBC to CI
* Doc updates
* Whatsnew update
* Better optional dependency import
* min versions fix
* import updates
* docstring fix
* doc fixup
* Updates for 0.6.0
* fix sqlite name escaping
* more cleanups
* more 0.6.0 updates
* typo
* remove warning
* test_sql expectations
* revert whatsnew issues
* pip deps
* Suppress pyarrow warning
* Updated docs
* mypy fixes
* Remove stacklevel check from test
* typo fix
* compat
* Joris feedback
* Better test coverage with ADBC
* cleanups
* feedback
* checkpoint
* more checkpoint
* more skips
* updates
* implement more
* bump to 0.7.0
* fixups
* cleanups
* sqlite fixups
* pyarrow compat
* revert to using pip instead of conda
* documentation cleanups
* compat fixups
* Fix stacklevel
* remove unneeded code
* commit after drop in fixtures
* close cursor
* fix table dropping
* Bumped ADBC min to 0.8.0
* documentation
* doc updates
* more fixups
* documentation fixups
* fixes
* more documentation
* doc spacing
* doc target fix
* pyarrow warning compat
* feedback
* updated io documentation
* install updates
1 parent 3f6b08b commit 18f7daf

15 files changed: +1152 -85 lines

ci/deps/actions-310.yaml (+2)

@@ -56,5 +56,7 @@ dependencies:
   - zstandard>=0.19.0
 
   - pip:
+    - adbc-driver-postgresql>=0.8.0
+    - adbc-driver-sqlite>=0.8.0
     - pyqt5>=5.15.8
     - tzdata>=2022.7

ci/deps/actions-311-downstream_compat.yaml (+2)

@@ -70,6 +70,8 @@ dependencies:
   - pyyaml
   - py
   - pip:
+    - adbc-driver-postgresql>=0.8.0
+    - adbc-driver-sqlite>=0.8.0
     - dataframe-api-compat>=0.1.7
     - pyqt5>=5.15.8
     - tzdata>=2022.7

ci/deps/actions-311.yaml (+2)

@@ -56,5 +56,7 @@ dependencies:
   - zstandard>=0.19.0
 
   - pip:
+    - adbc-driver-postgresql>=0.8.0
+    - adbc-driver-sqlite>=0.8.0
     - pyqt5>=5.15.8
     - tzdata>=2022.7

ci/deps/actions-39-minimum_versions.yaml (+2)

@@ -58,6 +58,8 @@ dependencies:
   - zstandard=0.19.0
 
   - pip:
+    - adbc-driver-postgresql==0.8.0
+    - adbc-driver-sqlite==0.8.0
     - dataframe-api-compat==0.1.7
     - pyqt5==5.15.8
     - tzdata==2022.7

ci/deps/actions-39.yaml (+2)

@@ -56,5 +56,7 @@ dependencies:
   - zstandard>=0.19.0
 
   - pip:
+    - adbc-driver-postgresql>=0.8.0
+    - adbc-driver-sqlite>=0.8.0
     - pyqt5>=5.15.8
     - tzdata>=2022.7

ci/deps/circle-310-arm64.yaml (+3)

@@ -54,3 +54,6 @@ dependencies:
   - xlrd>=2.0.1
   - xlsxwriter>=3.0.5
   - zstandard>=0.19.0
+  - pip:
+    - adbc-driver-postgresql>=0.8.0
+    - adbc-driver-sqlite>=0.8.0

doc/source/getting_started/install.rst (+3 -1)

@@ -335,7 +335,7 @@ lxml 4.9.2 xml XML parser for read
 SQL databases
 ^^^^^^^^^^^^^
 
-Installable with ``pip install "pandas[postgresql, mysql, sql-other]"``.
+Traditional drivers are installable with ``pip install "pandas[postgresql, mysql, sql-other]"``
 
 ========================= ================== =============== =============================================================
 Dependency                Minimum Version    pip extra       Notes
@@ -345,6 +345,8 @@ SQLAlchemy 2.0.0 postgresql, SQL support for dat
                                              sql-other
 psycopg2                  2.9.6              postgresql      PostgreSQL engine for sqlalchemy
 pymysql                   1.0.2              mysql           MySQL engine for sqlalchemy
+adbc-driver-postgresql    0.8.0              postgresql      ADBC Driver for PostgreSQL
+adbc-driver-sqlite        0.8.0              sql-other       ADBC Driver for SQLite
 ========================= ================== =============== =============================================================
 
 Other data sources

doc/source/user_guide/io.rst (+104 -10)

@@ -5565,9 +5565,23 @@ SQL queries
 -----------
 
 The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
-facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
-is provided by SQLAlchemy if installed. In addition you will need a driver library for
-your database. Examples of such drivers are `psycopg2 <https://www.psycopg.org/>`__
+facilitate data retrieval and to reduce dependency on DB-specific API.
+
+Where available, users may first want to opt for `Apache Arrow ADBC
+<https://arrow.apache.org/adbc/current/index.html>`_ drivers. These drivers
+should provide the best performance, null handling, and type detection.
+
+.. versionadded:: 2.2.0
+
+   Added native support for ADBC drivers
+
+For a full list of ADBC drivers and their development status, see the `ADBC Driver
+Implementation Status <https://arrow.apache.org/adbc/current/driver/status.html>`_
+documentation.
+
+Where an ADBC driver is not available or may be missing functionality,
+users should opt for installing SQLAlchemy alongside their database driver library.
+Examples of such drivers are `psycopg2 <https://www.psycopg.org/>`__
 for PostgreSQL or `pymysql <https://github.com/PyMySQL/PyMySQL>`__ for MySQL.
 For `SQLite <https://docs.python.org/3/library/sqlite3.html>`__ this is
 included in Python's standard library by default.
@@ -5600,6 +5614,18 @@ In the following example, we use the `SQlite <https://www.sqlite.org/index.html>
 engine. You can use a temporary SQLite database where data are stored in
 "memory".
 
+To connect using an ADBC driver you will want to install the ``adbc_driver_sqlite`` using your
+package manager. Once installed, you can use the DBAPI interface provided by the ADBC driver
+to connect to your database.
+
+.. code-block:: python
+
+   import adbc_driver_sqlite.dbapi as sqlite_dbapi
+
+   # Create the connection
+   with sqlite_dbapi.connect("sqlite:///:memory:") as conn:
+        df = pd.read_sql_table("data", conn)
+
 To connect with SQLAlchemy you use the :func:`create_engine` function to create an engine
 object from database URI. You only need to create the engine once per database you are
 connecting to.
@@ -5675,9 +5701,74 @@ writes ``data`` to the database in batches of 1000 rows at a time:
 SQL data types
 ++++++++++++++
 
-:func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
-SQL data type based on the dtype of the data. When you have columns of dtype
-``object``, pandas will try to infer the data type.
+Ensuring consistent data type management across SQL databases is challenging.
+Not every SQL database offers the same types, and even when they do the implementation
+of a given type can vary in ways that have subtle effects on how types can be
+preserved.
+
+For the best odds at preserving database types users are advised to use
+ADBC drivers when available. The Arrow type system offers a wider array of
+types that more closely match database types than the historical pandas/NumPy
+type system. To illustrate, note this (non-exhaustive) listing of types
+available in different databases and pandas backends:
+
++-----------------+-----------------------+----------------+---------+
+|numpy/pandas     |arrow                  |postgres        |sqlite   |
++=================+=======================+================+=========+
+|int16/Int16      |int16                  |SMALLINT        |INTEGER  |
++-----------------+-----------------------+----------------+---------+
+|int32/Int32      |int32                  |INTEGER         |INTEGER  |
++-----------------+-----------------------+----------------+---------+
+|int64/Int64      |int64                  |BIGINT          |INTEGER  |
++-----------------+-----------------------+----------------+---------+
+|float32          |float32                |REAL            |REAL     |
++-----------------+-----------------------+----------------+---------+
+|float64          |float64                |DOUBLE PRECISION|REAL     |
++-----------------+-----------------------+----------------+---------+
+|object           |string                 |TEXT            |TEXT     |
++-----------------+-----------------------+----------------+---------+
+|bool             |``bool_``              |BOOLEAN         |         |
++-----------------+-----------------------+----------------+---------+
+|datetime64[ns]   |timestamp(us)          |TIMESTAMP       |         |
++-----------------+-----------------------+----------------+---------+
+|datetime64[ns,tz]|timestamp(us,tz)       |TIMESTAMPTZ     |         |
++-----------------+-----------------------+----------------+---------+
+|                 |date32                 |DATE            |         |
++-----------------+-----------------------+----------------+---------+
+|                 |month_day_nano_interval|INTERVAL        |         |
++-----------------+-----------------------+----------------+---------+
+|                 |binary                 |BINARY          |BLOB     |
++-----------------+-----------------------+----------------+---------+
+|                 |decimal128             |DECIMAL [#f1]_  |         |
++-----------------+-----------------------+----------------+---------+
+|                 |list                   |ARRAY [#f1]_    |         |
++-----------------+-----------------------+----------------+---------+
+|                 |struct                 |COMPOSITE TYPE  |         |
+|                 |                       | [#f1]_         |         |
++-----------------+-----------------------+----------------+---------+
+
+.. rubric:: Footnotes
+
+.. [#f1] Not implemented as of writing, but theoretically possible
+
+If you are interested in preserving database types as best as possible
+throughout the lifecycle of your DataFrame, users are encouraged to
+leverage the ``dtype_backend="pyarrow"`` argument of :func:`~pandas.read_sql`
+
+.. code-block:: ipython
+
+   # for roundtripping
+   with pg_dbapi.connect(uri) as conn:
+       df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")
+
+This will prevent your data from being converted to the traditional pandas/NumPy
+type system, which often converts SQL types in ways that make them impossible to
+round-trip.
+
+In case an ADBC driver is not available, :func:`~pandas.DataFrame.to_sql`
+will try to map your data to an appropriate SQL data type based on the dtype of
+the data. When you have columns of dtype ``object``, pandas will try to infer
+the data type.
 
 You can always override the default type by specifying the desired SQL type of
 any of the columns by using the ``dtype`` argument. This argument needs a
@@ -5696,7 +5787,9 @@ default ``Text`` type for string columns:
 
 Due to the limited support for timedelta's in the different database
 flavors, columns with type ``timedelta64`` will be written as integer
-values as nanoseconds to the database and a warning will be raised.
+values as nanoseconds to the database and a warning will be raised. The only
+exception to this is when using the ADBC PostgreSQL driver in which case a
+timedelta will be written to the database as an ``INTERVAL``
 
 .. note::
 
@@ -5711,7 +5804,7 @@ default ``Text`` type for string columns:
 Datetime data types
 '''''''''''''''''''
 
-Using SQLAlchemy, :func:`~pandas.DataFrame.to_sql` is capable of writing
+Using ADBC or SQLAlchemy, :func:`~pandas.DataFrame.to_sql` is capable of writing
 datetime data that is timezone naive or timezone aware. However, the resulting
 data stored in the database ultimately depends on the supported data type
 for datetime data of the database system being used.
@@ -5802,15 +5895,16 @@ table name and optionally a subset of columns to read.
 .. note::
 
     In order to use :func:`~pandas.read_sql_table`, you **must** have the
-    SQLAlchemy optional dependency installed.
+    ADBC driver or SQLAlchemy optional dependency installed.
 
 .. ipython:: python
 
     pd.read_sql_table("data", engine)
 
 .. note::
 
-    Note that pandas infers column dtypes from query outputs, and not by looking
+    ADBC drivers will map database types directly back to arrow types. For other drivers
+    note that pandas infers column dtypes from query outputs, and not by looking
     up data types in the physical database schema. For example, assume ``userid``
    is an integer column in a table. Then, intuitively, ``select userid ...`` will
    return integer-valued series, while ``select cast(userid as text) ...`` will
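The ADBC drivers expose a PEP 249-style DBAPI, the same interface Python's built-in ``sqlite3`` module implements. As a minimal standard-library sketch of the connect/write/read flow the documentation above describes (table and column names are illustrative, not taken from the PR):

```python
import sqlite3

# sqlite3 follows the same PEP 249 DBAPI pattern that
# adbc_driver_sqlite.dbapi exposes: connect, execute, fetch.
with sqlite3.connect(":memory:") as conn:
    conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)]
    )
    rows = conn.execute("SELECT a, b, c FROM data ORDER BY a").fetchall()

print(rows)  # [(1, 2, 3), (4, 5, 6)]
```

With an ADBC driver installed, ``sqlite3`` would be replaced by ``adbc_driver_sqlite.dbapi`` and the fetch step by ``pd.read_sql``, as the documentation shows.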

doc/source/whatsnew/v2.2.0.rst (+91)

@@ -89,6 +89,97 @@ a Series. (:issue:`55323`)
     )
     series.list[0]
 
+.. _whatsnew_220.enhancements.adbc_support:
+
+ADBC Driver support in to_sql and read_sql
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`read_sql` and :meth:`~DataFrame.to_sql` now work with `Apache Arrow ADBC
+<https://arrow.apache.org/adbc/current/index.html>`_ drivers. Compared to
+traditional drivers used via SQLAlchemy, ADBC drivers should provide
+significant performance improvements, better type support and cleaner
+nullability handling.
+
+.. code-block:: ipython
+
+   import adbc_driver_postgresql.dbapi as pg_dbapi
+
+   df = pd.DataFrame(
+       [
+           [1, 2, 3],
+           [4, 5, 6],
+       ],
+       columns=['a', 'b', 'c']
+   )
+   uri = "postgresql://postgres:postgres@localhost/postgres"
+   with pg_dbapi.connect(uri) as conn:
+       df.to_sql("pandas_table", conn, index=False)
+
+   # for roundtripping
+   with pg_dbapi.connect(uri) as conn:
+       df2 = pd.read_sql("pandas_table", conn)
+
+The Arrow type system offers a wider array of types that can more closely match
+what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive)
+listing of types available in different databases and pandas backends:
+
++-----------------+-----------------------+----------------+---------+
+|numpy/pandas     |arrow                  |postgres        |sqlite   |
++=================+=======================+================+=========+
+|int16/Int16      |int16                  |SMALLINT        |INTEGER  |
++-----------------+-----------------------+----------------+---------+
+|int32/Int32      |int32                  |INTEGER         |INTEGER  |
++-----------------+-----------------------+----------------+---------+
+|int64/Int64      |int64                  |BIGINT          |INTEGER  |
++-----------------+-----------------------+----------------+---------+
+|float32          |float32                |REAL            |REAL     |
++-----------------+-----------------------+----------------+---------+
+|float64          |float64                |DOUBLE PRECISION|REAL     |
++-----------------+-----------------------+----------------+---------+
+|object           |string                 |TEXT            |TEXT     |
++-----------------+-----------------------+----------------+---------+
+|bool             |``bool_``              |BOOLEAN         |         |
++-----------------+-----------------------+----------------+---------+
+|datetime64[ns]   |timestamp(us)          |TIMESTAMP       |         |
++-----------------+-----------------------+----------------+---------+
+|datetime64[ns,tz]|timestamp(us,tz)       |TIMESTAMPTZ     |         |
++-----------------+-----------------------+----------------+---------+
+|                 |date32                 |DATE            |         |
++-----------------+-----------------------+----------------+---------+
+|                 |month_day_nano_interval|INTERVAL        |         |
++-----------------+-----------------------+----------------+---------+
+|                 |binary                 |BINARY          |BLOB     |
++-----------------+-----------------------+----------------+---------+
+|                 |decimal128             |DECIMAL [#f1]_  |         |
++-----------------+-----------------------+----------------+---------+
+|                 |list                   |ARRAY [#f1]_    |         |
++-----------------+-----------------------+----------------+---------+
+|                 |struct                 |COMPOSITE TYPE  |         |
+|                 |                       | [#f1]_         |         |
++-----------------+-----------------------+----------------+---------+
+
+.. rubric:: Footnotes
+
+.. [#f1] Not implemented as of writing, but theoretically possible
+
+If you are interested in preserving database types as best as possible
+throughout the lifecycle of your DataFrame, users are encouraged to
+leverage the ``dtype_backend="pyarrow"`` argument of :func:`~pandas.read_sql`
+
+.. code-block:: ipython
+
+   # for roundtripping
+   with pg_dbapi.connect(uri) as conn:
+       df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")
+
+This will prevent your data from being converted to the traditional pandas/NumPy
+type system, which often converts SQL types in ways that make them impossible to
+round-trip.
+
+For a full list of ADBC drivers and their development status, see the `ADBC Driver
+Implementation Status <https://arrow.apache.org/adbc/current/driver/status.html>`_
+documentation.
+
 .. _whatsnew_220.enhancements.other:
 
 Other enhancements
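The Arrow-to-PostgreSQL column of the type table above can be restated as a simple lookup. The mapping below is purely illustrative (it is not a pandas or ADBC API, and omits the rows marked as not yet implemented):

```python
# Illustrative only: restates the arrow -> postgres column of the
# (non-exhaustive) type table above; not an actual pandas/ADBC API.
ARROW_TO_POSTGRES = {
    "int16": "SMALLINT",
    "int32": "INTEGER",
    "int64": "BIGINT",
    "float32": "REAL",
    "float64": "DOUBLE PRECISION",
    "string": "TEXT",
    "bool_": "BOOLEAN",
    "timestamp(us)": "TIMESTAMP",
    "timestamp(us,tz)": "TIMESTAMPTZ",
    "date32": "DATE",
    "month_day_nano_interval": "INTERVAL",
    "binary": "BINARY",
}

print(ARROW_TO_POSTGRES["float64"])  # DOUBLE PRECISION
```

The point of the table is that each Arrow type has a close database counterpart, whereas the NumPy-backed types on the left collapse several database types into one (e.g. SMALLINT, INTEGER, and BIGINT all read back as int64 by default).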

environment.yml (+2)

@@ -113,6 +113,8 @@ dependencies:
   - pygments  # Code highlighting
 
   - pip:
+    - adbc-driver-postgresql>=0.8.0
+    - adbc-driver-sqlite>=0.8.0
     - dataframe-api-compat>=0.1.7
     - sphinx-toggleprompt  # conda-forge version has stricter pins on jinja2
     - typing_extensions; python_version<"3.11"

pandas/compat/_optional.py (+2)

@@ -15,6 +15,8 @@
 # Update install.rst & setup.cfg when updating versions!
 
 VERSIONS = {
+    "adbc-driver-postgresql": "0.8.0",
+    "adbc-driver-sqlite": "0.8.0",
     "bs4": "4.11.2",
     "blosc": "1.21.3",
     "bottleneck": "1.3.6",
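Note that the keys added above are the install (PyPI) names, while the importable module name swaps dashes for underscores. A simplified sketch of that lookup (hypothetical helpers, not the actual pandas ``import_optional_dependency`` implementation):

```python
import importlib

# Keys mirror the VERSIONS entries added above (install/PyPI names).
MINIMUM_VERSIONS = {
    "adbc-driver-postgresql": "0.8.0",
    "adbc-driver-sqlite": "0.8.0",
}


def module_name(install_name: str) -> str:
    # e.g. "adbc-driver-sqlite" -> "adbc_driver_sqlite"
    return install_name.replace("-", "_")


def try_import(install_name: str):
    """Return the imported module, or None when the driver is absent."""
    try:
        return importlib.import_module(module_name(install_name))
    except ImportError:
        return None
```

This is how pandas can degrade gracefully: when ``try_import``-style lookup finds no ADBC driver, the SQL code paths fall back to SQLAlchemy.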
