BUG: custom callable in to_sql stopped working with pandas 1.4.0 #46596

Closed
2 of 3 tasks
frantakalab opened this issue Apr 1, 2022 · 7 comments
Labels
Bug · IO SQL (to_sql, read_sql, read_sql_query) · Needs Info (clarification about behavior needed to assess issue) · Regression (functionality that used to work in a prior pandas version)

Comments

@frantakalab

Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import csv

import pandas as pd

from io import StringIO
from sqlalchemy import create_engine

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# define custom insert method for to_sql as written in documentation https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?highlight=synchronize#insertion-method

def psql_insert_copy(table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join(['"{}"'.format(k) for k in keys])
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

# you need to have PostgreSQL running and give correct credentials here to reproduce the issue
engine = create_engine('postgresql://user:password@host/db')

with engine.connect() as conn:
     df.to_sql('target_table', 
               con=conn,
               schema='public',
               if_exists='append',
               index=False,
               method=psql_insert_copy)

Issue Description

When I use the custom insertion method for PostgreSQL in DataFrame.to_sql according to the documentation, the table is created with the correct column names and numeric types (if it doesn't exist), but no values are imported (the table is empty), and no warning or error is raised in pandas or in the database.

It works in pandas~=1.3.5 and breaks when 1.4.0 or 1.4.1 is installed (keeping sqlalchemy and psycopg2 on the same versions). I suspect something related to the change that to_sql can now return an int instead of None, but I couldn't find how that is related or how it can be fixed.
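
One thing worth ruling out (not confirmed as the cause in this thread): when to_sql is handed an already-open connection, committing the transaction is up to the caller, and depending on the SQLAlchemy version and engine settings, a connection from engine.connect() may roll back its implicit transaction when it closes. A minimal sketch that commits explicitly via engine.begin(), reusing engine, df, and psql_insert_copy from the example above:

# A minimal sketch, assuming the missing rows come from a transaction that
# is never committed: engine.begin() opens a transaction that is committed
# on successful exit from the with-block.
with engine.begin() as conn:
    df.to_sql('target_table',
              con=conn,
              schema='public',
              if_exists='append',
              index=False,
              method=psql_insert_copy)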

Expected Behavior

The full DataFrame is written to the table, including its values.

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-39-generic
Version : #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 7.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.4.3
numba : None
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : 1.0.8
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.34
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
zstandard : None

@frantakalab frantakalab added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 1, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone Apr 3, 2022
@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version IO SQL to_sql, read_sql, read_sql_query labels Apr 3, 2022
@simonjayhawkins
Member

moving to 1.4.4

@simonjayhawkins simonjayhawkins modified the milestones: 1.4.3, 1.4.4 Jun 22, 2022
@mroeschke
Member

Sorry for the slow reply, but could you provide more details regarding your testing setup with Postgres?

Even with the introduction of to_sql returning row counts in 1.4, this specific example is being tested in our CI using a Postgres container in GitHub Actions:

def test_copy_from_callable_insertion_method(conn, request):
    # GH 8953
    # Example in io.rst found under _io.sql.method
    # not available in sqlite, mysql
    def psql_insert_copy(table, conn, keys, data_iter):
        # gets a DBAPI connection that can provide a cursor
        dbapi_conn = conn.connection
        with dbapi_conn.cursor() as cur:
            s_buf = StringIO()
            writer = csv.writer(s_buf)
            writer.writerows(data_iter)
            s_buf.seek(0)
            columns = ", ".join([f'"{k}"' for k in keys])
            if table.schema:
                table_name = f"{table.schema}.{table.name}"
            else:
                table_name = table.name
            sql_query = f"COPY {table_name} ({columns}) FROM STDIN WITH CSV"
            cur.copy_expert(sql=sql_query, file=s_buf)

    conn = request.getfixturevalue(conn)
    expected = DataFrame({"col1": [1, 2], "col2": [0.1, 0.2], "col3": ["a", "n"]})
    expected.to_sql("test_frame", conn, index=False, method=psql_insert_copy)
    result = sql.read_sql_table("test_frame", conn)
    tm.assert_frame_equal(result, expected)

@mroeschke mroeschke added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 22, 2022
@leyan2015

leyan2015 commented Jun 24, 2022

I had the same issue after I upgraded to the latest Anaconda distribution (Python 3.9, Linux 64-bit (x86)).
DataFrame.to_sql() and pd.read_sql() would both fail.

I did some digging and discovered that the special character "@" in my password breaks the login. It looks like the URL parser in the library has a bug, possibly on the SQLAlchemy side.

The same username and password work fine using sqlcmd:
./sqlcmd -S serverName -U yanni2006 -P 'XXX@ABC' -d dbName -Q "select * from db.schema.test"

Yet it fails in python:

import sqlalchemy
import pandas as pd

uid = 'yanni2006'
password = 'XXX@ABC'
server = 'serverName'
db = 'dbName'
conn_str = 'mssql+pyodbc://{}:{}@{}/{}?driver=ODBC+Driver+17+for+SQL+Server'.format(uid, password, server, db)
sqlEngine = sqlalchemy.create_engine(conn_str)
conn = sqlEngine.connect()

The connect call fails here:
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')

Obviously, the following fails too:

with sqlEngine.begin() as trans:
    df = pd.read_sql(con=trans, sql="select * from db.schema.test")

After removing the "@" from my password, everything worked fine. Until a fix comes out, you can work around this by changing your password.
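
For reference, SQLAlchemy's documented way to handle special characters in connection credentials is to percent-encode them before building the URL. A minimal sketch using the placeholder credentials above:

from urllib.parse import quote_plus

import sqlalchemy

# percent-encode the password so the '@' inside it is not mistaken for
# the user/host separator of the URL ('XXX@ABC' becomes 'XXX%40ABC')
password = quote_plus('XXX@ABC')
conn_str = 'mssql+pyodbc://yanni2006:{}@serverName/dbName?driver=ODBC+Driver+17+for+SQL+Server'.format(password)
sqlEngine = sqlalchemy.create_engine(conn_str)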

Also, the to_sql() function returns -1 even though the operation is successful; the number of rows affected is obviously not -1.

@mroeschke
Member

mroeschke commented Jun 24, 2022

Also, the to_sql() function returns -1 even though the operation is successful; the number of rows affected is obviously not -1.

The returned number of rows is the sum of the rows reported as affected by whatever underlying ORM library is used.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

The number of returned rows affected is the sum of the rowcount attribute of sqlite3.Cursor or SQLAlchemy connectable which may not reflect the exact number of written rows as stipulated in the sqlite3 or SQLAlchemy.
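
As a concrete illustration (whether a driver reports -1 for a given statement such as COPY is a property of that driver, not something pandas specifies), the return value can be inspected but should not be treated as an exact row count:

# Reusing df, engine, and psql_insert_copy from the report above.
rows = df.to_sql('test_frame', con=engine, index=False, method=psql_insert_copy)
# `rows` sums the reported rowcounts across chunks; a driver may report -1,
# so -1 here does not by itself mean the write failed.
print(rows)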

@mroeschke
Copy link
Member

Thanks for the report, but we haven't received more context around the failure scenario, and this code is being tested in our CI as mentioned in #46596 (comment). Closing, but we can reopen if we get more information.

@mroeschke mroeschke removed this from the 1.4.4 milestone Aug 19, 2022
@mroeschke mroeschke added Regression Functionality that used to work in a prior pandas version and removed Regression Functionality that used to work in a prior pandas version labels Aug 19, 2022
@utsavjha

utsavjha commented Nov 7, 2022

I can confirm that I am experiencing the same issue, using Postgres.
Here is the debugging performed on my end.

Environment: Docker 4.12, Python v3.10-bullseye source image
Database: Postgres (v15)
My venv packages: pandas, psycopg2==2.9.5 and SQLAlchemy==1.4.43

Affected pandas versions: >=v1.3.6; all distributions after v1.3.5 are unable to write. I tested each version from v1.3.5 to v1.5.1, with the same behavior.

My invocation looks like this, where the callable is taken from the official pandas docs:

some_pandas_dataframe.to_sql(
                name=pg_table_name,
                schema=pg_schema_name,
                if_exists="append",
                index=False,
                chunksize=pg_export_chunk_size,
                con=pg_conn,
                method=PostgresConnector.psql_insert_copy,
            )

Odd behaviour I experienced during my testing (see the sketch after this list):

  • There is no error thrown in the code; tests succeed too.
  • Since Postgres v15, one can track the last run COPY queries using pg_stat_progress_copy. This view was empty and did not contain the insertion query like it should have. pandas v1.3.5 successfully logs the query there; later versions do NOT.
  • I was tracking the DB file stats: there was an uptick in the temp DB files during the export, and I saw my WAL expanding viciously only DURING the export. Yet after the exports completed in code, the expected tables were still empty; no data could be recovered from the database.
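
The pattern in the last two bullets (WAL growth during the export, empty tables afterwards) is consistent with a transaction that is rolled back instead of committed. A hedged way to check, reusing the names from the invocation above (they are placeholders, not a confirmed diagnosis):

from sqlalchemy import create_engine, text

pg_engine = create_engine('postgresql://user:password@host/db')  # placeholder DSN

with pg_engine.connect() as conn:
    some_pandas_dataframe.to_sql(
        name=pg_table_name,
        schema=pg_schema_name,
        if_exists='append',
        index=False,
        con=conn,
        method=PostgresConnector.psql_insert_copy,
    )
    # rows are visible inside the still-open transaction...
    inside = conn.execute(
        text(f'SELECT count(*) FROM {pg_schema_name}.{pg_table_name}')
    ).scalar()

# ...but are gone afterwards if the transaction was rolled back on close
with pg_engine.connect() as conn:
    outside = conn.execute(
        text(f'SELECT count(*) FROM {pg_schema_name}.{pg_table_name}')
    ).scalar()

print(inside, outside)  # inside > 0 with outside == 0 points at a missing commit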

@vroomzel

I'm having the same issue with pandas==1.5.0 and sqlalchemy==1.4.36. Our service accounts that use pandas 1.3.4 and sqlalchemy 1.4.22 work just fine.

Projects
None yet
Development

No branches or pull requests

6 participants