ENH: Support multi row inserts in to_sql when using the sqlite fallback #30743

Merged

simongibbons
Contributor

Currently we do not support multi-row inserts into sqlite databases
when `to_sql` is passed `method="multi"` - despite the documentation
suggesting that this is supported.

Adding support for this is straightforward - it only needs us
to implement a single method on the SQLiteTable class and so
this PR does just that.
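
As a rough illustration of what a multi-row insert means at the sqlite level, here is a minimal sketch using only the standard-library `sqlite3` module. It is not the pandas implementation: the helper name `multi_row_insert` and the table layout are assumptions made for the example. The idea is to build one `INSERT ... VALUES (?, ?), (?, ?), ...` statement per chunk and pass the row values as a single flat parameter list.

```python
import sqlite3
from itertools import chain


def multi_row_insert(conn, table, columns, rows):
    """Hypothetical helper: insert all rows with one multi-row INSERT statement."""
    rows = list(rows)
    if not rows:
        return
    # One "(?, ?, ...)" group per row; values are bound as parameters, never interpolated.
    row_placeholder = "(" + ", ".join("?" * len(columns)) + ")"
    placeholders = ", ".join([row_placeholder] * len(rows))
    col_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES {placeholders}'
    # sqlite3 expects one flat sequence of parameters for the whole statement.
    conn.execute(sql, list(chain.from_iterable(rows)))


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE df ("x" INTEGER, "y" INTEGER)')
multi_row_insert(conn, "df", ["x", "y"], [(i, i * 2) for i in range(100)])
conn.commit()
```

SQLite caps the number of bound parameters per statement, so in practice the rows have to be sent in chunks, which is presumably why `method="multi"` is usually combined with `chunksize`.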

@simongibbons simongibbons force-pushed the multi-sqlite-support branch 3 times, most recently from 47bb1d7 to 9421d9f Compare January 7, 2020 22:33
@WillAyd
Member

WillAyd commented Jan 8, 2020

@jorisvandenbossche

@WillAyd WillAyd added the IO SQL to_sql, read_sql, read_sql_query label Jan 8, 2020
@WillAyd
Member

WillAyd commented Jan 16, 2020

@simongibbons does this improve performance?

Member

@jorisvandenbossche jorisvandenbossche left a comment

Looks good to me, thanks for the contribution!

@@ -946,6 +946,7 @@ I/O
- Bug in :func:`pandas.io.json.json_normalize` where a missing value in the location specified by `record_path` would raise a ``TypeError`` (:issue:`30148`)
- :func:`read_excel` now accepts binary data (:issue:`15914`)
- Bug in :meth:`read_csv` in which encoding handling was limited to just the string `utf-16` for the C engine (:issue:`24130`)
- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
Member

@simongibbons Can you move this to the v1.1 whatsnew file? (The 1.0.0 rc has been released in the meantime and no longer takes general bug fixes, sorry.)

Contributor Author

Done

@simongibbons
Contributor Author

simongibbons commented Jan 16, 2020

@WillAyd

@simongibbons does this improve performance?

It can in some cases, e.g.:

In [1]: import sqlite3

In [2]: import pandas as pd

In [3]: db = sqlite3.connect(":memory:")

In [4]: df = pd.DataFrame({"x": range(10000), "y": range(10000)})

In [5]: %timeit df.to_sql("df", db, if_exists="replace")
15 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit df.to_sql("df", db, if_exists="replace", method="multi", chunksize=100)
11.6 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Tbh, I would have been happy to close the issue by raising a more informative exception, but given how easy it is to support multi-row inserts, I went with adding support instead.

@@ -120,7 +120,7 @@ MultiIndex

I/O
^^^

- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
Member

Final comment: can you move this bullet point to the "Other Enhancements" section?

Contributor Author

Done

@simongibbons
Contributor Author

Ping, would be nice to get this in before there is another merge conflict with whatsnew.

@jorisvandenbossche jorisvandenbossche merged commit bec7378 into pandas-dev:master Feb 11, 2020
@jorisvandenbossche
Member

@simongibbons Thanks a lot! (and sorry it took so long)

@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Feb 11, 2020
@mxblaise

I just upgraded from 0.25.3 to 1.0.3 thinking the method keyword would solve my problem, but I still get the same error message when I set the method keyword to "multi":

df.to_sql("myTable", con, if_exists='append', method="multi")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Appl\Python38-32\lib\site-packages\pandas\core\generic.py", line 2653, in to_sql
    sql.to_sql(
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 512, in to_sql
    pandas_sql.to_sql(
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 1734, in to_sql
    table.insert(chunksize, method)
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 755, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Appl\Python38-32\lib\site-packages\pandas\io\sql.py", line 679, in _execute_insert_multi
    conn.execute(self.table.insert(data))
TypeError: insert expected 2 arguments, got 1

Instead, I now do it manually and it's a bit faster than trying to play with the chunksize parameter of .to_sql:

con.executemany('INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)', df.to_records().tolist())
con.commit()
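
For readers who want to try the manual route without pandas, the following is a minimal, self-contained sketch of the same `executemany` pattern. The table layout and the `records` list are hypothetical stand-ins for `df.to_records().tolist()`, which by default yields one tuple per row with the index as the first element. Naming the columns explicitly in the `INSERT` is an extra precaution added here, not something the comment above does.

```python
import sqlite3

# Hypothetical stand-in for df.to_records().tolist(): (index, a, b) tuples.
records = [(i, i * 10, f"row{i}") for i in range(1000)]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE myTable ("index" INTEGER, a INTEGER, b TEXT)')

# Using the connection as a context manager commits on success
# and rolls back if an exception is raised inside the block.
with con:
    con.executemany(
        'INSERT INTO myTable ("index", a, b) VALUES (?, ?, ?)',
        records,
    )

inserted = con.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
```

Here `executemany` prepares the statement once and reuses it for every tuple, so no single statement carries more than one row's worth of parameters.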

Labels
IO SQL to_sql, read_sql, read_sql_query
Development

Successfully merging this pull request may close these issues.

Bug when using multi-row inserts with SQLite database
4 participants