@@ -5565,9 +5565,23 @@ SQL queries
-----------

The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
- facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
- is provided by SQLAlchemy if installed. In addition you will need a driver library for
- your database. Examples of such drivers are `psycopg2 <https://www.psycopg.org/>`__
+ facilitate data retrieval and to reduce dependency on DB-specific APIs.
+
+ Where available, users may first want to opt for `Apache Arrow ADBC
+ <https://arrow.apache.org/adbc/current/index.html>`_ drivers. These drivers
+ should provide the best performance, null handling, and type detection.
+
+ .. versionadded:: 2.2.0
+
+    Added native support for ADBC drivers
+
+ For a full list of ADBC drivers and their development status, see the `ADBC Driver
+ Implementation Status <https://arrow.apache.org/adbc/current/driver/status.html>`_
+ documentation.
+
+ Where an ADBC driver is not available or may be missing functionality,
+ users should opt for installing SQLAlchemy alongside their database driver library.
+ Examples of such drivers are `psycopg2 <https://www.psycopg.org/>`__
for PostgreSQL or `pymysql <https://github.com/PyMySQL/PyMySQL>`__ for MySQL.
For `SQLite <https://docs.python.org/3/library/sqlite3.html>`__ this is
included in Python's standard library by default.
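Since SQLite's driver ships with the standard library, the DBAPI layer that pandas builds on can be sketched without any extra installs. The snippet below is an illustration using only the built-in ``sqlite3`` module; the ``data`` table and its columns are hypothetical:

```python
import sqlite3

# SQLite's DBAPI driver is bundled with Python; pandas uses a connection
# object like this one under the hood when no ADBC/SQLAlchemy layer is used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, name TEXT)")
conn.execute("INSERT INTO data VALUES (1, 'a'), (2, 'b')")
rows = conn.execute("SELECT id, name FROM data ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
conn.close()
```

For other databases, a third-party driver package plays the same role as ``sqlite3`` does here.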
@@ -5600,6 +5614,18 @@ In the following example, we use the `SQlite <https://www.sqlite.org/index.html>
engine. You can use a temporary SQLite database where data are stored in
"memory".

+ To connect using an ADBC driver you will want to install the ``adbc_driver_sqlite`` package using your
+ package manager. Once installed, you can use the DBAPI interface provided by the ADBC driver
+ to connect to your database.
+
+ .. code-block:: python
+
+    import adbc_driver_sqlite.dbapi as sqlite_dbapi
+
+    # Create the connection
+    with sqlite_dbapi.connect("sqlite:///:memory:") as conn:
+        df = pd.read_sql_table("data", conn)
+
To connect with SQLAlchemy you use the :func:`create_engine` function to create an engine
object from a database URI. You only need to create the engine once per database you are
connecting to.
@@ -5675,9 +5701,74 @@ writes ``data`` to the database in batches of 1000 rows at a time:
SQL data types
++++++++++++++

- :func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
- SQL data type based on the dtype of the data. When you have columns of dtype
- ``object``, pandas will try to infer the data type.
+ Ensuring consistent data type management across SQL databases is challenging.
+ Not every SQL database offers the same types, and even when they do, the
+ implementation of a given type can vary in ways that have subtle effects on how
+ types can be preserved.
+
+ For the best odds at preserving database types, users are advised to use
+ ADBC drivers when available. The Arrow type system offers a wider array of
+ types that more closely match database types than the historical pandas/NumPy
+ type system. To illustrate, note this (non-exhaustive) listing of types
+ available in different databases and pandas backends:
+
+ +-----------------+-----------------------+----------------+---------+
+ |numpy/pandas     |arrow                  |postgres        |sqlite   |
+ +=================+=======================+================+=========+
+ |int16/Int16      |int16                  |SMALLINT        |INTEGER  |
+ +-----------------+-----------------------+----------------+---------+
+ |int32/Int32      |int32                  |INTEGER         |INTEGER  |
+ +-----------------+-----------------------+----------------+---------+
+ |int64/Int64      |int64                  |BIGINT          |INTEGER  |
+ +-----------------+-----------------------+----------------+---------+
+ |float32          |float32                |REAL            |REAL     |
+ +-----------------+-----------------------+----------------+---------+
+ |float64          |float64                |DOUBLE PRECISION|REAL     |
+ +-----------------+-----------------------+----------------+---------+
+ |object           |string                 |TEXT            |TEXT     |
+ +-----------------+-----------------------+----------------+---------+
+ |bool             |``bool_``              |BOOLEAN         |         |
+ +-----------------+-----------------------+----------------+---------+
+ |datetime64[ns]   |timestamp(us)          |TIMESTAMP       |         |
+ +-----------------+-----------------------+----------------+---------+
+ |datetime64[ns,tz]|timestamp(us,tz)       |TIMESTAMPTZ     |         |
+ +-----------------+-----------------------+----------------+---------+
+ |                 |date32                 |DATE            |         |
+ +-----------------+-----------------------+----------------+---------+
+ |                 |month_day_nano_interval|INTERVAL        |         |
+ +-----------------+-----------------------+----------------+---------+
+ |                 |binary                 |BINARY          |BLOB     |
+ +-----------------+-----------------------+----------------+---------+
+ |                 |decimal128             |DECIMAL [#f1]_  |         |
+ +-----------------+-----------------------+----------------+---------+
+ |                 |list                   |ARRAY [#f1]_    |         |
+ +-----------------+-----------------------+----------------+---------+
+ |                 |struct                 |COMPOSITE TYPE  |         |
+ |                 |                       |[#f1]_          |         |
+ +-----------------+-----------------------+----------------+---------+
+
+ .. rubric:: Footnotes
+
+ .. [#f1] Not implemented as of writing, but theoretically possible
+
+ If you are interested in preserving database types as best as possible
+ throughout the lifecycle of your DataFrame, you are encouraged to
+ leverage the ``dtype_backend="pyarrow"`` argument of :func:`~pandas.read_sql`:
+
+ .. code-block:: ipython
+
+    # for roundtripping
+    with pg_dbapi.connect(uri) as conn:
+        df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")
+
+ This will prevent your data from being converted to the traditional pandas/NumPy
+ type system, which often converts SQL types in ways that make them impossible to
+ round-trip.
+
+ In case an ADBC driver is not available, :func:`~pandas.DataFrame.to_sql`
+ will try to map your data to an appropriate SQL data type based on the dtype of
+ the data. When you have columns of dtype ``object``, pandas will try to infer
+ the data type.
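As a small illustration of how the target database constrains what can be preserved, SQLite's ``typeof()`` shows richer declared column types collapsing to its few storage classes, matching the ``sqlite`` column of the listing. The table and column names here are hypothetical, using only the built-in ``sqlite3`` module:

```python
import sqlite3

# SQLite maps SMALLINT/BIGINT declarations to a single INTEGER storage
# class, so the original width cannot round-trip through the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a SMALLINT, b BIGINT, c REAL, d TEXT)")
conn.execute("INSERT INTO t VALUES (1, 2, 1.5, 'x')")
types = conn.execute(
    "SELECT typeof(a), typeof(b), typeof(c), typeof(d) FROM t"
).fetchone()
print(types)  # ('integer', 'integer', 'real', 'text')
```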

You can always override the default type by specifying the desired SQL type of
any of the columns by using the ``dtype`` argument. This argument needs a
@@ -5696,7 +5787,9 @@ default ``Text`` type for string columns:

Due to the limited support for timedeltas in the different database
flavors, columns with type ``timedelta64`` will be written as integer
- values as nanoseconds to the database and a warning will be raised.
+ values as nanoseconds to the database and a warning will be raised. The only
+ exception to this is when using the ADBC PostgreSQL driver, in which case a
+ timedelta will be written to the database as an ``INTERVAL``.
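The nanosecond conversion described here can be sketched with the standard library's ``datetime.timedelta``; this illustrates the arithmetic only, not the pandas implementation:

```python
from datetime import timedelta

# One day plus 30 seconds, expressed as the integer nanosecond value
# that would be written to the database.
td = timedelta(days=1, seconds=30)
nanoseconds = int(td.total_seconds() * 1_000_000_000)
print(nanoseconds)  # 86430000000000
```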

.. note::
@@ -5711,7 +5804,7 @@ default ``Text`` type for string columns:
Datetime data types
'''''''''''''''''''

- Using SQLAlchemy, :func:`~pandas.DataFrame.to_sql` is capable of writing
+ Using ADBC or SQLAlchemy, :func:`~pandas.DataFrame.to_sql` is capable of writing
datetime data that is timezone naive or timezone aware. However, the resulting
data stored in the database ultimately depends on the supported data type
for datetime data of the database system being used.
@@ -5802,15 +5895,16 @@ table name and optionally a subset of columns to read.
.. note::

   In order to use :func:`~pandas.read_sql_table`, you **must** have the
-    SQLAlchemy optional dependency installed.
+    ADBC driver or SQLAlchemy optional dependency installed.

.. ipython:: python

   pd.read_sql_table("data", engine)

.. note::

-    Note that pandas infers column dtypes from query outputs, and not by looking
+    ADBC drivers will map database types directly back to Arrow types. For other
+    drivers, note that pandas infers column dtypes from query outputs, and not by looking
   up data types in the physical database schema. For example, assume ``userid``
   is an integer column in a table. Then, intuitively, ``select userid ...`` will
   return integer-valued series, while ``select cast(userid as text) ...`` will
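The effect of such a cast can be sketched with the standard library's ``sqlite3`` driver (the ``users`` table here is hypothetical): the cast changes the Python type the driver hands back, which is what dtype inference sees for non-ADBC drivers.

```python
import sqlite3

# The same column queried two ways: the cast determines the Python type
# the driver returns, not the column's declared type in the schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER)")
conn.execute("INSERT INTO users VALUES (42)")
as_int = conn.execute("SELECT userid FROM users").fetchone()[0]
as_text = conn.execute("SELECT CAST(userid AS TEXT) FROM users").fetchone()[0]
print(type(as_int), type(as_text))  # <class 'int'> <class 'str'>
```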