BUG: Wrong column types when query has no rows in read_sql #57037

Closed
2 of 3 tasks
latot opened this issue Jan 23, 2024 · 3 comments
Assignees
Labels
IO SQL to_sql, read_sql, read_sql_query Usage Question

Comments


latot commented Jan 23, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
conn = engine.connect()
conn.execute(sqlalchemy.text("CREATE TABLE sample (col REAL);"))
conn.commit()

import pandas
ret = pandas.read_sql("SELECT * FROM sample;", conn)
ret["col"].dtype

# This should be float64!
dtype('O')

Issue Description

When we query a table that has no rows, none of the resulting column dtypes match the original SQL column types.

Expected Behavior

Keep the original SQL column types in the resulting dtypes (here, `float64` for a `REAL` column).

Installed Versions

INSTALLED VERSIONS
------------------
commit : fd3f571
python : 3.11.7.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.67-gentoo-x86_64
Version : #1 SMP PREEMPT_DYNAMIC Wed Dec 13 09:49:24 -03 2023
machine : x86_64
processor : AMD Ryzen 7 5800H with Radeon Graphics
byteorder : little
LC_ALL : None
LANG : es_CL.utf8
LOCALE : es_CL.UTF-8

pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.8.0
pip : 23.3.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.7
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.1
sqlalchemy : 2.0.23
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : 2.4.0
pyqt5 : None

@latot latot added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 23, 2024
@regieleki

take


WillAyd commented Apr 15, 2024

Thanks for opening the issue. Unfortunately this is not possible with the combination of ODBC / sqlalchemy that you have in your example. Both of those libraries work by creating Python objects from database objects, so it is easy for data types to be lost in translation, especially when you have missing data.

In pandas 2.2 we introduced ADBC drivers, which solve this issue and also give better performance. I would generally advise using ADBC drivers whenever you get the chance: https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#adbc-driver-support-in-to-sql-and-read-sql

import pandas
from adbc_driver_sqlite import dbapi

uri = "file::memory:"
with dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS sample (col REAL);")
    conn.commit()
    ret = pandas.read_sql("SELECT * FROM sample;", conn, dtype_backend="pyarrow")
ret["col"].dtype

Unfortunately the above code returns int64 with adbc_driver_sqlite version 0.11. I've opened an issue for that upstream apache/arrow-adbc#1724

Whenever that gets fixed upstream, it will be the best path forward.


WillAyd commented Apr 15, 2024

Actually, if you read through the linked issue above, it seems this is also just a SQLite limitation. So I'm not sure anything can be done to get exactly what you want from the OP, though if you use ADBC in combination with a more strongly typed database, your odds of preserving types improve.
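In the meantime, a pragmatic workaround is to pass an explicit dtype mapping to `read_sql` (the `dtype` parameter has been available since pandas 2.0). A minimal sketch, reusing the `sample`/`col` names from the example above with the stdlib `sqlite3` driver:

```python
import sqlite3
import pandas

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (col REAL);")
conn.commit()

# With zero rows, the driver reports no type information, so we state
# the expected dtype for each column explicitly.
ret = pandas.read_sql("SELECT * FROM sample;", conn, dtype={"col": "float64"})
```

This keeps the schema stable regardless of row count, at the cost of maintaining the mapping by hand.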

@WillAyd WillAyd closed this as completed Apr 15, 2024
@WillAyd WillAyd added Usage Question IO SQL to_sql, read_sql, read_sql_query and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 15, 2024