ENH: support both SQLAlchemy engines and connections #7877

maxgrenderjones · 2014-07-30T12:24:35Z

Feature request for supporting also Connection objects instead of only Engines.

Also reported in #8533, #10104

My use case - I often add a week's worth of data (stored in a pandas dataframe) to an existing table. Before I add the data, I want to be able to delete any existing data for this week from the table, but I want the whole thing to be wrapped in a transaction so I'm not left in a mess halfway through.

Here's what I therefore want to be able to do

# runs a transaction
with engine.begin() as connection:
    r1 = connection.execute(table.delete().where(table.c[WEEK]==week))
    data.to_sql(table.name, connection, if_exists='append')

i.e. my change is that I want to be able to pass a SQLAlchemy Connection object into DataFrame.to_sql instead of an Engine. (The underlying Engine of the Connection object can be accessed as .engine, so we should still be able to pull metadata etc)

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2014-07-30T19:27:38Z

Is it possible to provide this engine yourself by doing the following in the with context: data.to_sql(table.name, connection.engine, if_exists='append')?

Because then it is the question: do we ask the user to do this themselves, or do we provide the convenience (but also clutter in interface) to also accept a connection?

mmautner · 2014-09-10T15:39:16Z

Can this issue be closed? I agree w/ jorisvandenbossche that a change to pandas isn't necessary--you can easily access the .engine property from a SQLAlchemy connection object:

from sqlalchemy import create_engine
engine = create_engine('sqlite:///file.db')
connection = engine.connect()
# 'connection.engine' now refers to a readily usable engine object

jorisvandenbossche · 2014-09-10T22:12:22Z

You can indeed easily access the engine, but I don't really know how then the atomicity is handled (the reason to use a transaction in this case). Accessing the engine and using that, within the with engine.begin() as con context, does this defeat the purpose of that, or does this keep it atomic?
Anybody who knows this?

mmautner · 2014-09-11T13:10:20Z

It's already done as a transaction:

https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L633-L644

jorisvandenbossche · 2014-09-11T13:14:27Z

yes, but that is for one to_sql call, while here @maxgrenderjones wants to do multiple such calls in one transaction

artemyk · 2014-09-19T00:22:00Z

I tested it and running a statement as

conn.engine.execute(sql)

does not run it within the transaction (in other words, the statement would not be rolledback as needed)

One possibility would be to a connection kwarg to to_sql. So the call would be:

data.to_sql(table.name, engine, connection=connection, if_exists='append')

connection could be saved in the appropriate object and, if not None, used in run_transaction instead of creating a new connection. Thoughts?

maxgrenderjones · 2014-09-19T08:34:46Z

That would work. Still feels like it might be easier to pass either a connection or an engine, but I guess the connection option perhaps should be a little bit awkward, as it requires you to understand that you need to do your own transaction wrapping.

jorisvandenbossche · 2014-09-19T10:55:23Z

@artemyk Are you sure that providing a connection does run it within a transaction? As connection.begin() instead of engine.begin() does also run a transaction on its own.
Or does it work if the to_sql(..., connection, ...) is nested in another connection begin? (http://docs.sqlalchemy.org/en/latest/core/connections.html#nesting-of-transaction-blocks)

artemyk · 2014-09-19T20:25:29Z

@jorisvandenbossche The connection contextmanager (i.e. with engine.begin() as connection:) does run things within a transaction --- that's what we use now in run_transaction, and it passes tests for properly commiting/rollingback.
You are right that connection.begin() also is a way to run a transaction, that might be the right way to do nested transactions. Is that something we'd want to support?
BTW, there already seems to be a way to pass in a cursor --- see cur and is_cursor parameters some functions take. I'm not sure if that is deprecated or not, but seems clearly related to the functionality being requested here. to_sql doesn't support it though, nor does run_transaction check to see if the connection is a cursor.

tr11 · 2015-03-30T17:45:02Z

@artemyk, what is the advantage of using a sqlalchemy engine instead of a connection? Going through the io/sql.py code, we really only use has_tables and table_names functions of the engine, which is not that much of an issue since we can recover the engine from the connection (using the engine attribute of the connection).

In fact, it seems to make more sense for the scope of pandas to operate at the connection level only. That way the user can have full control over sqlalchemy connection issues such as isolation levels, nested transactions/savepoints, etc (see sqlalchemy engines and connections).

Maybe we could default to use connections and create our own connection in case the user passes an engine? That way this would backwards compatible, but still allow for more advanced usage.

artemyk · 2015-03-30T18:11:30Z

@tr11 that sounds like a good idea to me. @jorisvandenbossche thoughts?

tr11 · 2015-03-30T19:06:35Z

OK, I really needed this right now, so I just hacked a quick version (https://github.com/tr11/pandas/commit/fe6fee394f78a908e74cf3675668a4475db3c616) that passes all the tests with engines or connections. I will add standard tests that use connection parameters and a couple other ones that show the new things that can be done with transaction rollbacks, using temporary tables, etc. If there is any interest in merging this, I can also update the documentation before submitting a pull request.

jorisvandenbossche · 2015-05-12T08:50:16Z

@tr11 Sorry for the slow reaction here!
Indeed, that looks like a good idea. If you would still want to make a PR for this (with tests and examples/docs), or give some comments on the newly opened PR #10105, that would be really great

graingert · 2015-05-12T12:44:15Z

I've updated the PR with these changes, and added tests

update pymysql to 0.6.3 to avoid cursor bugs Add documentation and tests for SQLAlchemy connectables explicit reference to connection/engine in docs Temporary table test pass compile the connectable

maxgrenderjones changed the title ~~[ENH] Support transactions by allowing a connection to be passed in instead of an Engine~~ [ENH] Support transactions by allowing a SQLAlchemy Connection to be passed to .to_sql as well as Engine Jul 30, 2014

jorisvandenbossche added the SQL label Jul 30, 2014

artemyk mentioned this issue May 12, 2015

support both sqlalchemy engines and connections #10105

Merged

This was referenced May 12, 2015

support sqlalchemy connections #10104

Closed

Accept SQLAlchemy Connection as con param in pd.read_sql_query() #8533

Closed

jorisvandenbossche changed the title ~~[ENH] Support transactions by allowing a SQLAlchemy Connection to be passed to .to_sql as well as Engine~~ ENH: support both SQLAlchemy engines and connections May 12, 2015

graingert pushed a commit to graingert/pandas that referenced this issue May 12, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

d25a262

graingert pushed a commit to graingert/pandas that referenced this issue May 12, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

e85087f

graingert pushed a commit to graingert/pandas that referenced this issue May 12, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

ef122d6

graingert pushed a commit to graingert/pandas that referenced this issue May 13, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

5c7d352

graingert pushed a commit to graingert/pandas that referenced this issue May 22, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

6468871

graingert pushed a commit to graingert/pandas that referenced this issue May 26, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

75c3676

graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

1281690

graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

5fd3d9a

graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

e164845

graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

71fd12e

graingert pushed a commit to graingert/pandas that referenced this issue May 29, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

e3c37e2

graingert pushed a commit to graingert/pandas that referenced this issue May 31, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

0a8e2a5

graingert pushed a commit to graingert/pandas that referenced this issue Jun 3, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

2db0179

graingert pushed a commit to graingert/pandas that referenced this issue Jun 4, 2015

support both sqlalchemy engines and connections Fixes pandas-dev#7877

30d46fc

jorisvandenbossche added this to the 0.17.0 milestone Jul 3, 2015

jorisvandenbossche closed this as completed in #10105 Jul 3, 2015

cdcadman mentioned this issue Dec 3, 2022

BUG: Allow read_sql to work with chunksize. #49967

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: support both SQLAlchemy engines and connections #7877

ENH: support both SQLAlchemy engines and connections #7877

maxgrenderjones commented Jul 30, 2014

jorisvandenbossche commented Jul 30, 2014

mmautner commented Sep 10, 2014

jorisvandenbossche commented Sep 10, 2014

mmautner commented Sep 11, 2014

jorisvandenbossche commented Sep 11, 2014

artemyk commented Sep 19, 2014

maxgrenderjones commented Sep 19, 2014

jorisvandenbossche commented Sep 19, 2014

artemyk commented Sep 19, 2014

tr11 commented Mar 30, 2015

artemyk commented Mar 30, 2015

tr11 commented Mar 30, 2015

jorisvandenbossche commented May 12, 2015

graingert commented May 12, 2015

ENH: support both SQLAlchemy engines and connections #7877

ENH: support both SQLAlchemy engines and connections #7877

Comments

maxgrenderjones commented Jul 30, 2014

jorisvandenbossche commented Jul 30, 2014

mmautner commented Sep 10, 2014

jorisvandenbossche commented Sep 10, 2014

mmautner commented Sep 11, 2014

jorisvandenbossche commented Sep 11, 2014

artemyk commented Sep 19, 2014

maxgrenderjones commented Sep 19, 2014

jorisvandenbossche commented Sep 19, 2014

artemyk commented Sep 19, 2014

tr11 commented Mar 30, 2015

artemyk commented Mar 30, 2015

tr11 commented Mar 30, 2015

jorisvandenbossche commented May 12, 2015

graingert commented May 12, 2015