ENH: Support multi row inserts in to_sql when using the sqlite fallback #30743
@simongibbons does this improve performance?
Looks good to me, thanks for the contribution!
doc/source/whatsnew/v1.0.0.rst
@@ -946,6 +946,7 @@ I/O
- Bug in :func:`pandas.io.json.json_normalize` where a missing value in the location specified by `record_path` would raise a ``TypeError`` (:issue:`30148`)
- :func:`read_excel` now accepts binary data (:issue:`15914`)
- Bug in :meth:`read_csv` in which encoding handling was limited to just the string `utf-16` for the C engine (:issue:`24130`)
- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
@simongibbons Can you move this to the v1.1 whatsnew file? (the 1.0.0 rc is released in the meantime, and doesn't take general bug fixes anymore, sorry)
Done
It can do in some cases, e.g.

In [1]: import sqlite3
In [2]: import pandas as pd
In [3]: db = sqlite3.connect(":memory:")
In [4]: df = pd.DataFrame({"x": range(10000), "y": range(10000)})
In [5]: %timeit df.to_sql("df", db, if_exists="replace")
15 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: %timeit df.to_sql("df", db, if_exists="replace", method="multi", chunksize=100)
11.6 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

tbh I would have been happy with closing the issue with a more informative exception. But given how easy it is to support I went with that approach.
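The `chunksize=100` in the timing above matters because each multi-row statement binds one parameter per cell, and SQLite caps the number of bound parameters per statement (the limit depends on the SQLite build). A minimal standard-library sketch of chunked multi-row inserts, assuming a simplified two-column table without the index column that `to_sql` would also write; the `chunks` helper is hypothetical and this is not the pandas code path:

```python
import sqlite3


def chunks(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` rows (hypothetical helper)."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (x INTEGER, y INTEGER)")
rows = [(i, i) for i in range(10000)]

for chunk in chunks(rows, 100):
    # One "(?, ?)" group per row: 100 rows x 2 columns = 200 bound parameters.
    placeholders = ", ".join(["(?, ?)"] * len(chunk))
    # Flatten the row tuples to match the flat placeholder list.
    params = [value for row in chunk for value in row]
    conn.execute(f"INSERT INTO df (x, y) VALUES {placeholders}", params)

row_count = conn.execute("SELECT COUNT(*) FROM df").fetchone()[0]
```

Whether this beats `executemany` depends on the data and chunk size, which is consistent with the "in some cases" caveat above.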
doc/source/whatsnew/v1.1.0.rst
@@ -120,7 +120,7 @@ MultiIndex

I/O
^^^

- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
Final comment: can you move this bullet point to the "Other Enhancements" section?
Done
Currently we do not support multi row inserts into sqlite databases when `to_sql` is passed `method="multi"` - despite the documentation suggesting that this is supported. Adding support for this is straightforward - it only needs us to implement a single method on the SQLiteTable class and so this PR does just that.
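As a rough illustration of what a multi-row insert amounts to at the SQL level, the sketch below builds a single `INSERT ... VALUES (...), (...), ...` statement and executes it with a flattened parameter list. The `insert_multi` helper and the `demo` table are hypothetical names for this example; this is not the method added to `SQLiteTable`, only the general technique using the standard `sqlite3` module.

```python
import sqlite3


def insert_multi(conn, table, columns, rows):
    """Hypothetical helper: insert all rows with one multi-row INSERT statement."""
    rows = [tuple(row) for row in rows]
    if not rows:
        return
    col_list = ", ".join(f'"{col}"' for col in columns)
    # One parenthesised placeholder group per row, e.g. "(?, ?)".
    row_placeholder = "(" + ", ".join(["?"] * len(columns)) + ")"
    all_placeholders = ", ".join([row_placeholder] * len(rows))
    sql = f"INSERT INTO {table} ({col_list}) VALUES {all_placeholders}"
    # Parameters are passed as one flat sequence, in row-major order.
    flat_params = [value for row in rows for value in row]
    conn.execute(sql, flat_params)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE demo ("x" INTEGER, "y" INTEGER)')
insert_multi(conn, "demo", ["x", "y"], [(1, 10), (2, 20), (3, 30)])
result = conn.execute("SELECT x, y FROM demo ORDER BY x").fetchall()
```

Table and column names are interpolated directly here for brevity; real code would need to validate or escape them.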
Ping, would be nice to get this in before there is another merge conflict with whatsnew.
@simongibbons Thanks a lot! (and sorry it took so long)
I just upgraded from 0.25.3 to 1.0.3 thinking the
Instead, I now do it manually and it's a bit faster than trying to play with the
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff