ENH: Support multi row inserts in to_sql when using the sqlite fallback #30743

simongibbons · 2020-01-06T16:17:46Z

Currently we do not support multi row inserts into sqlite databases
when to_sql is passed method="multi" - despite the documentation
suggesting that this is supported.

Adding support for this is straightforward - it only needs us
to implement a single method on the SQLiteTable class and so
this PR does just that.
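
To illustrate the idea, here is a minimal standalone sketch of a multi-row insert against a plain `sqlite3` connection. It is not the code from this PR: the helper names `multi_row_insert_sql` and `insert_multi` are hypothetical, and it assumes a SQLite build that accepts multi-row `VALUES` lists.

```python
import sqlite3


def multi_row_insert_sql(table, columns, num_rows):
    """Build one INSERT statement with `num_rows` groups of placeholders."""
    col_list = ", ".join(f'"{c}"' for c in columns)
    row = "(" + ", ".join("?" for _ in columns) + ")"
    rows = ", ".join(row for _ in range(num_rows))
    return f'INSERT INTO "{table}" ({col_list}) VALUES {rows}'


def insert_multi(conn, table, columns, rows):
    """Insert all rows with a single statement and a flat parameter list."""
    rows = list(rows)
    # sqlite3 expects one flat sequence of parameters for the whole statement.
    flat = [value for r in rows for value in r]
    conn.execute(multi_row_insert_sql(table, columns, len(rows)), flat)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "df" ("x" INTEGER, "y" INTEGER)')
insert_multi(conn, "df", ["x", "y"], [(0, 10), (1, 11), (2, 12)])
result = conn.execute('SELECT x, y FROM "df" ORDER BY x').fetchall()
print(result)  # [(0, 10), (1, 11), (2, 12)]
```

The single-row path would instead pass one `(?, ?)` statement and a list of row tuples to `executemany`; the multi-row path sends one statement whose parameters are flattened.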

closes Bug when using multi-row inserts with SQLite database #29921
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

WillAyd · 2020-01-08T22:44:52Z

@jorisvandenbossche

WillAyd · 2020-01-16T20:29:42Z

@simongibbons does this improve performance?

jorisvandenbossche

Looks good to me, thanks for the contribution!

jorisvandenbossche · 2020-01-16T20:40:37Z

doc/source/whatsnew/v1.0.0.rst

@@ -946,6 +946,7 @@ I/O
 - Bug in :func:`pandas.io.json.json_normalize` where a missing value in the location specified by `record_path` would raise a ``TypeError`` (:issue:`30148`)
 - :func:`read_excel` now accepts binary data (:issue:`15914`)
 - Bug in :meth:`read_csv` in which encoding handling was limited to just the string `utf-16` for the C engine (:issue:`24130`)
+- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)


@simongibbons Can you move this to the v1.1 whatsnew file? (The 1.0.0 rc has been released in the meantime and no longer takes general bug fixes, sorry.)

simongibbons · 2020-01-16T21:08:29Z

@WillAyd

> @simongibbons does this improve performance?

It can in some cases, e.g.:

In [1]: import sqlite3

In [2]: import pandas as pd

In [3]: db = sqlite3.connect(":memory:")

In [4]: df = pd.DataFrame({"x": range(10000), "y": range(10000)})

In [5]: %timeit df.to_sql("df", db, if_exists="replace")
15 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit df.to_sql("df", db, if_exists="replace", method="multi", chunksize=100)
11.6 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

tbh I would have been happy with closing the issue with a more informative exception. But given how easy it is to support I went with that approach.
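
For anyone who wants to try a similar comparison without pandas, the following is a rough sketch using only the standard library. It assumes the single-row path behaves like `executemany`, and `insert_chunked` and `fresh_db` are hypothetical helpers, not pandas functions. The chunk size matters because SQLite caps the number of bound parameters per statement (999 by default in older builds), so 100 rows of 2 columns stays well under that limit. Timings will vary by machine, so none are quoted here.

```python
import sqlite3
import time


def insert_chunked(conn, rows, chunksize):
    """Insert (x, y) rows using one multi-row INSERT per chunk."""
    for start in range(0, len(rows), chunksize):
        chunk = rows[start:start + chunksize]
        placeholders = ", ".join("(?, ?)" for _ in chunk)
        # Flatten the chunk into a single parameter list for the statement.
        flat = [v for row in chunk for v in row]
        conn.execute(f"INSERT INTO df (x, y) VALUES {placeholders}", flat)


def fresh_db():
    """Create an in-memory database with an empty two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE df (x INTEGER, y INTEGER)")
    return conn


rows = [(i, i) for i in range(10000)]

# One row per statement execution.
db_many = fresh_db()
t0 = time.perf_counter()
db_many.executemany("INSERT INTO df (x, y) VALUES (?, ?)", rows)
t_many = time.perf_counter() - t0

# 100 rows per statement execution.
db_multi = fresh_db()
t0 = time.perf_counter()
insert_chunked(db_multi, rows, chunksize=100)
t_multi = time.perf_counter() - t0

count_many = db_many.execute("SELECT COUNT(*) FROM df").fetchone()[0]
count_multi = db_multi.execute("SELECT COUNT(*) FROM df").fetchone()[0]
print(f"executemany: {t_many * 1e3:.1f} ms, multi-row: {t_multi * 1e3:.1f} ms")
```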

jorisvandenbossche · 2020-01-21T10:49:25Z

doc/source/whatsnew/v1.1.0.rst

@@ -120,7 +120,7 @@ MultiIndex

 I/O
 ^^^
-
+- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)


Final comment: can you move this bullet point to the "Other Enhancements" section?


simongibbons · 2020-02-10T10:00:55Z

Ping, would be nice to get this in before there is another merge conflict with whatsnew.

jorisvandenbossche · 2020-02-11T23:01:15Z

@simongibbons Thanks a lot! (and sorry it took so long)

mxblaise · 2020-04-18T18:00:34Z

I just upgraded from 0.25.3 to 1.0.3 thinking the method keyword would solve my problem, but I still get the same error message when I set the method keyword to "multi":

df.to_sql("myTable", con, if_exists='append', method="multi")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Appl\Python38-32\lib\site-packages\pandas\core\generic.py", line 2653, in to_sql
    sql.to_sql(
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 512, in to_sql
    pandas_sql.to_sql(
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 1734, in to_sql
    table.insert(chunksize, method)
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 755, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 679, in _execute_insert_multi
    conn.execute(self.table.insert(data))
TypeError: insert expected 2 arguments, got 1

Instead, I now do it manually and it's a bit faster than trying to play with the chunksize parameter of .to_sql:

con.executemany('INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)', df.to_records().tolist())
con.commit()

simongibbons force-pushed the multi-sqlite-support branch 3 times, most recently from 47bb1d7 to 9421d9f Compare January 7, 2020 22:33

WillAyd added the IO SQL to_sql, read_sql, read_sql_query label Jan 8, 2020

jorisvandenbossche reviewed Jan 16, 2020


simongibbons force-pushed the multi-sqlite-support branch from 9421d9f to 608239c Compare January 16, 2020 20:58

simongibbons requested a review from jorisvandenbossche January 21, 2020 10:14

jorisvandenbossche reviewed Jan 21, 2020


simongibbons added 3 commits January 21, 2020 10:56

Move whatsnew message to v1.1.0

c11398f

Move around whatsnew

c228128

simongibbons force-pushed the multi-sqlite-support branch from 1c18972 to c228128 Compare January 21, 2020 10:57

simongibbons requested a review from jorisvandenbossche January 21, 2020 10:57

jorisvandenbossche approved these changes Feb 11, 2020


jorisvandenbossche merged commit bec7378 into pandas-dev:master Feb 11, 2020

jorisvandenbossche added this to the 1.1 milestone Feb 11, 2020
