ENH: to_sql() add parameter "method" to control insertions method (#8… #21401
@@ -4989,6 +4989,54 @@ with respect to the timezone.
timezone aware or naive. When reading ``TIMESTAMP WITH TIME ZONE`` types, pandas
will convert the data to UTC.

.. _io.sql.method:

Insertion Method
++++++++++++++++

.. versionadded:: 0.24.0

The parameter ``method`` controls the SQL insertion clause used.
Possible values are:

- ``'default'``: Uses standard SQL ``INSERT`` clause (one per row).
- ``'multi'``: Pass multiple values in a single ``INSERT`` clause.
  It uses a *special* SQL syntax not supported by all backends.
  This usually provides better performance for analytic databases
  like *Presto* and *Redshift*, but has worse performance for
  traditional SQL backends if the table contains many columns.
  For more information check the SQLAlchemy `documentation
  <http://docs.sqlalchemy.org/en/latest/core/dml.html#sqlalchemy.sql.expression.Insert.values.params.*args>`__.
- callable with signature ``(pd_table, conn, keys, data_iter)``:
  This can be used to implement a more performant insertion method based on
  specific backend dialect features.
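As a rough illustration of the first two options (a hedged sketch: the table names are made up here, and an in-memory SQLite database stands in for an analytic backend), both can be exercised side by side:

```python
import sqlite3

import pandas as pd

# Hypothetical demo frame; any small DataFrame works.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

conn = sqlite3.connect(":memory:")

# Default method: a parameterised INSERT executed per chunk of rows.
df.to_sql("t_default", conn, index=False)

# 'multi': the values for a chunk are passed in a single INSERT clause.
df.to_sql("t_multi", conn, index=False, method="multi")

n_default = conn.execute("SELECT COUNT(*) FROM t_default").fetchone()[0]
n_multi = conn.execute("SELECT COUNT(*) FROM t_multi").fetchone()[0]
print(n_default, n_multi)  # 3 3
conn.close()
```

Both tables end up with the same contents; the difference is only in how many statements and parameter sets are sent to the database.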

Example of a callable using PostgreSQL `COPY clause
<https://www.postgresql.org/docs/current/static/sql-copy.html>`__::

    # Alternative to_sql() *method* for DBs that support COPY FROM
    import csv
    from io import StringIO

    def psql_insert_copy(table, conn, keys, data_iter):
        # gets a DBAPI connection that can provide a cursor
        dbapi_conn = conn.connection
        with dbapi_conn.cursor() as cur:
            s_buf = StringIO()
            writer = csv.writer(s_buf)
            writer.writerows(data_iter)
            s_buf.seek(0)
[Review thread]

- Is there a reason you are not using … Further, I think it would be good to actually run the code of this example, to make sure it works. We could either add it as a test case, or actually run it in the documentation build (by applying it to a small example dataframe).
- My understanding is that …
- I agree this needs testing.
- Are we fixing this before merging? I can push a commit if we want.
- @TomAugspurger fix as in use …
- Started out as that, but this is kind of a can of worms to actually run this example. We would need to actually create a Postgres database. I'll see if something similar exists for sqlite.
- Actually nevermind, this is tested below, and we don't actually have a DataFrame here, we have a …
- So +1 to merging as is.
|
||
columns = ', '.join('"{}"'.format(k) for k in keys) | ||
if table.schema: | ||
table_name = '{}.{}'.format(table.schema, table.name) | ||
else: | ||
table_name = table.name | ||
|
||
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format( | ||
table_name, columns) | ||
[Review thread]

- I realise this is just an example, but isn't this piece vulnerable to SQL injection? The …
- @dahlbaek no expert on this, but if you could provide the code that you would replace it with to make it more robust, that would certainly be welcome!
            cur.copy_expert(sql=sql, file=s_buf)
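The PostgreSQL example above needs a running Postgres server to try out. As a self-contained sketch of the same callable signature (the function and table names below are invented for illustration), SQLite can stand in: pandas passes the callable the table wrapper, a connection or cursor object, the list of column names, and an iterator of row tuples. With a plain ``sqlite3`` connection the second argument supports ``executemany`` directly, which is an assumption this sketch relies on:

```python
import sqlite3

import pandas as pd

def sqlite_insert(table, conn, keys, data_iter):
    # Same (table, conn, keys, data_iter) signature as psql_insert_copy.
    # With a raw sqlite3 connection, ``conn`` exposes executemany
    # (assumption of this sketch; under SQLAlchemy it is a connection).
    placeholders = ", ".join(["?"] * len(keys))
    columns = ", ".join('"{}"'.format(k) for k in keys)
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table.name, columns, placeholders)
    conn.executemany(sql, list(data_iter))

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
conn = sqlite3.connect(":memory:")
df.to_sql("demo", conn, index=False, method=sqlite_insert)
n = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(n)  # 2
conn.close()
```

pandas still creates the target table itself; the callable is only responsible for getting the rows of each chunk into it.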
Reading Tables
''''''''''''''
[Review thread]

- The "one per row", is that correct? Because as far as I understand, the default behaviour will be to do a single insert statement with multiple rows (an "executemany"), where the number of rows is determined by the chunksize (and which by default is all rows).
- Maybe that is why everybody is complaining that it is so slow 😁
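The chunking behaviour discussed in this thread can be observed directly: a counting callable (a hypothetical helper for this sketch, again using SQLite and assuming the ``conn`` argument supports ``executemany``) records how many rows pandas hands over per call when ``chunksize`` is set:

```python
import sqlite3

import pandas as pd

batch_sizes = []

def counting_insert(table, conn, keys, data_iter):
    # Record how many rows pandas passes per call, then insert them.
    rows = list(data_iter)
    batch_sizes.append(len(rows))
    placeholders = ", ".join(["?"] * len(keys))
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table.name, ", ".join(keys), placeholders)
    conn.executemany(sql, rows)

df = pd.DataFrame({"a": range(5)})
conn = sqlite3.connect(":memory:")
df.to_sql("t", conn, index=False, chunksize=2, method=counting_insert)
print(batch_sizes)  # [2, 2, 1] -- five rows split into chunks of two
conn.close()
```

With no ``chunksize``, the list would contain a single entry equal to the row count, matching the reviewer's description of one multi-row ``executemany`` per chunk.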