BUG: Wrong column types when query has no rows in read_sql #57037

Closed
2 of 3 tasks
latot opened this issue Jan 23, 2024 · 3 comments
Assignees
Labels
IO SQL to_sql, read_sql, read_sql_query Usage Question

Comments


latot commented Jan 23, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
conn = engine.connect()
conn.execute(sqlalchemy.text("CREATE TABLE sample (col REAL);"))
conn.commit()

import pandas
ret = pandas.read_sql("SELECT * FROM sample;", conn)
ret["col"].dtype

# This should be float64!
dtype('O')

Issue Description

When we query a table that has no rows, none of the resulting column dtypes match the original SQL column types.

Expected Behavior

Keep the original SQL column types in the resulting dtypes (here, `float64` for a `REAL` column).

Installed Versions

INSTALLED VERSIONS
------------------
commit : fd3f571
python : 3.11.7.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.67-gentoo-x86_64
Version : #1 SMP PREEMPT_DYNAMIC Wed Dec 13 09:49:24 -03 2023
machine : x86_64
processor : AMD Ryzen 7 5800H with Radeon Graphics
byteorder : little
LC_ALL : None
LANG : es_CL.utf8
LOCALE : es_CL.UTF-8

pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.8.0
pip : 23.3.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.7
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.1
sqlalchemy : 2.0.23
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : 2.4.0
pyqt5 : None

@latot latot added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 23, 2024
@regieleki

take


WillAyd commented Apr 15, 2024

Thanks for opening the issue. Unfortunately this is not possible with the combination of ODBC / sqlalchemy that you have in your example. Both of those libraries work by creating Python objects from database objects, so it is easy for data types to be lost in translation, especially when you have missing data.

In pandas 2.2 we introduced ADBC drivers, which solve this issue and also give better performance. I would generally advise using ADBC drivers whenever you get the chance: https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#adbc-driver-support-in-to-sql-and-read-sql

import pandas
from adbc_driver_sqlite import dbapi

uri = "file::memory:"
with dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS sample (col REAL);")
    conn.commit()
    ret = pandas.read_sql("SELECT * FROM sample;", conn, dtype_backend="pyarrow")
ret["col"].dtype

Unfortunately the above code returns int64 with adbc_driver_sqlite version 0.11. I've opened an issue for that upstream apache/arrow-adbc#1724

Whenever that gets fixed upstream, it will be the best path forward.


WillAyd commented Apr 15, 2024

Actually, if you read through the linked issue above, it seems this is also just a SQLite limitation. So I'm not sure anything can be done to get exactly what you want from the OP, though if you use ADBC in combination with a more strongly typed database, your odds of preserving types improve.
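In the meantime, a pragmatic workaround is to pass an explicit dtype mapping to `read_sql` (the `dtype` parameter has been available since pandas 2.0). A minimal sketch, reusing the `sample`/`col` names from the example above with the stdlib `sqlite3` driver:

```python
import sqlite3
import pandas

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (col REAL);")
conn.commit()

# With zero rows, the driver reports no type information, so we state
# the expected dtype for each column explicitly.
ret = pandas.read_sql("SELECT * FROM sample;", conn, dtype={"col": "float64"})
```

This keeps the schema stable regardless of row count, at the cost of maintaining the mapping by hand.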

@WillAyd WillAyd closed this as completed Apr 15, 2024
@WillAyd WillAyd added Usage Question IO SQL to_sql, read_sql, read_sql_query and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 15, 2024