Add SQL Support for ADBC Drivers #53869
Changes from 72 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -335,7 +335,8 @@ lxml 4.9.2 xml XML parser for read | |
SQL databases | ||
^^^^^^^^^^^^^ | ||
|
||
Installable with ``pip install "pandas[postgresql, mysql, sql-other]"``. | ||
Traditional drivers are installable with ``pip install "pandas[postgresql, mysql, sql-other]"``. ADBC drivers | ||
must be installed separately. | ||
|
||
========================= ================== =============== ============================================================= | ||
Dependency Minimum Version pip extra Notes | ||
|
@@ -345,6 +346,8 @@ SQLAlchemy 2.0.0 postgresql, SQL support for dat | |
sql-other | ||
psycopg2 2.9.6 postgresql PostgreSQL engine for sqlalchemy | ||
pymysql 1.0.2 mysql MySQL engine for sqlalchemy | ||
adbc-driver-postgresql 0.8.0 ADBC Driver for PostgreSQL | ||
Review comment: this is in the |
||
adbc-driver-sqlite 0.8.0 ADBC Driver for SQLite | ||
Review comment: this is in the |
||
========================= ================== =============== ============================================================= | ||
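However the ADBC drivers end up installed (separately or through an extra), it can help to confirm that they are importable before handing a connection to pandas. The following is a minimal sketch using only the standard library; the helper name `driver_available` and the `ADBC_MODULES` tuple are hypothetical and not part of pandas or ADBC.

```python
from importlib.util import find_spec


def driver_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# Import names of the two drivers listed in the table above.
ADBC_MODULES = ("adbc_driver_postgresql", "adbc_driver_sqlite")

# Map each driver module to whether it is installed in this environment.
availability = {name: driver_available(name) for name in ADBC_MODULES}
print(availability)
```

The check only looks up the module spec, so it does not open a connection or load the driver's native library.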
|
||
Other data sources | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -89,6 +89,97 @@ a Series. (:issue:`55323`) | |
) | ||
series.list[0] | ||
|
||
.. _whatsnew_220.enhancements.adbc_support: | ||
|
||
ADBC Driver support in to_sql and read_sql | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
:func:`read_sql` and :meth:`~DataFrame.to_sql` now work with `Apache Arrow ADBC | ||
<https://arrow.apache.org/adbc/current/index.html>`_ drivers. Compared to | ||
traditional drivers used via SQLAlchemy, ADBC drivers should provide | ||
significant performance improvements, better type support and cleaner | ||
nullability handling. | ||
|
||
.. code-block:: ipython | ||
|
||
import adbc_driver_postgresql.dbapi as pg_dbapi | ||
|
||
df = pd.DataFrame( | ||
[ | ||
[1, 2, 3], | ||
[4, 5, 6], | ||
], | ||
columns=['a', 'b', 'c'] | ||
) | ||
uri = "postgresql://postgres:postgres@localhost/postgres" | ||
with pg_dbapi.connect(uri) as conn: | ||
df.to_sql("pandas_table", conn, index=False) | ||
|
||
# for roundtripping | ||
with pg_dbapi.connect(uri) as conn: | ||
df2 = pd.read_sql("pandas_table", conn) | ||
|
||
The Arrow type system offers a wider array of types that can more closely match | ||
Review comment: I think this is important enough of a point to make in the changelog, but it should probably also go in io.rst. I just wasn't sure how best to structure that yet.
Reply: +1 to include in io.rst. I think it might be good to add a separate section in io.rst to talk about type mapping (if there isn't one already) and also include sqlalchemy type mapping |
||
what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive) | ||
listing of types available in different databases and pandas backends: | ||
|
||
+-----------------+-----------------------+----------------+---------+ | ||
|numpy/pandas |arrow |postgres |sqlite | | ||
+=================+=======================+================+=========+ | ||
|int16/Int16 |int16 |SMALLINT |INTEGER | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|int32/Int32 |int32 |INTEGER |INTEGER | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|int64/Int64 |int64 |BIGINT |INTEGER | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|float32 |float32 |REAL |REAL | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|float64 |float64 |DOUBLE PRECISION|REAL | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|object |string |TEXT |TEXT | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|bool |``bool_`` |BOOLEAN | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|datetime64[ns] |timestamp(us) |TIMESTAMP | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|datetime64[ns,tz]|timestamp(us,tz) |TIMESTAMPTZ | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
| |date32 |DATE | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
| |month_day_nano_interval|INTERVAL | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
| |binary |BYTEA |BLOB | | ||
+-----------------+-----------------------+----------------+---------+ | ||
| |decimal128 |DECIMAL [#f1]_ | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
| |list |ARRAY [#f1]_ | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
| |struct |COMPOSITE TYPE | | | ||
| | | [#f1]_ | | | ||
+-----------------+-----------------------+----------------+---------+ | ||
|
||
.. rubric:: Footnotes | ||
|
||
.. [#f1] Not implemented as of writing, but theoretically possible | ||
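The SQLite column of the table collapses several integer and float widths into INTEGER and REAL. As a rough illustration, the sketch below uses Python's standard-library `sqlite3` module (not an ADBC driver, and not code from this PR) to ask SQLite which storage class it assigns to each value. The table name `t` and its column names are made up for the example.

```python
import sqlite3

# Illustrative only: plain sqlite3 from the standard library, not an ADBC driver.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "a SMALLINT, b INTEGER, c BIGINT, "
    "d REAL, e DOUBLE PRECISION, f TEXT, g BLOB)"
)
conn.execute(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, 2, 3, 1.5, 2.5, "x", b"\x00"),
)

# typeof() reports the storage class SQLite actually used for each value.
storage_classes = conn.execute(
    "SELECT typeof(a), typeof(b), typeof(c), typeof(d), typeof(e), "
    "typeof(f), typeof(g) FROM t"
).fetchone()
conn.close()

print(storage_classes)
```

The three integer declarations all report the same storage class, as do the two floating-point declarations, which is why the distinct widths in the first three columns of the table map to a single SQLite entry.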
|
||
To preserve database types as faithfully as possible throughout the lifecycle of | ||
your DataFrame, pass the ``dtype_backend="pyarrow"`` argument to | ||
:func:`~pandas.read_sql`: | ||
|
||
.. code-block:: ipython | ||
|
||
# for roundtripping | ||
with pg_dbapi.connect(uri) as conn: | ||
df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow") | ||
|
||
This will prevent your data from being converted to the traditional pandas/NumPy | ||
type system, which often converts SQL types in ways that make them impossible to | ||
round-trip. | ||
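One way such a conversion can lose information: with NumPy-backed dtypes, an integer column that contains NULLs is commonly represented as float64, and float64 cannot represent every 64-bit integer exactly. The sketch below simulates that promotion in plain Python, whose `float` is an IEEE 754 double. It is an illustration under that assumption and does not call pandas; the `bigint_column` data is hypothetical.

```python
# Hypothetical BIGINT column containing a NULL, as it might arrive from a database.
bigint_column = [2**53 + 1, None]

# Simulate promotion to float64: NULL becomes NaN, integers become doubles.
as_float = [float("nan") if value is None else float(value) for value in bigint_column]

# Converting back to an integer does not recover the original value.
restored = int(as_float[0])
round_trips = restored == bigint_column[0]

print(bigint_column[0], restored, round_trips)
```

Keeping the column in an Arrow-backed integer type avoids the detour through floating point, since Arrow tracks missing values separately from the data.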
|
||
For a full list of ADBC drivers and their development status, see the `ADBC Driver | ||
Implementation Status <https://arrow.apache.org/adbc/current/driver/status.html>`_ | ||
documentation. | ||
|
||
.. _whatsnew_220.enhancements.other: | ||
|
||
Other enhancements | ||
|
Review comment: I think this is no longer the case, since you added them to the `postgresql` and `sql-other` extras in the pyproject.toml.