Skip to content

ENH: support both SQLAlchemy engines and connections #7877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
maxgrenderjones opened this issue Jul 30, 2014 · 14 comments · Fixed by #10105
Closed

ENH: support both SQLAlchemy engines and connections #7877

maxgrenderjones opened this issue Jul 30, 2014 · 14 comments · Fixed by #10105
Labels
IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@maxgrenderjones
Copy link
Contributor

Feature request for supporting also Connection objects instead of only Engines.

Also reported in #8533, #10104


My use case - I often add a week's worth of data (stored in a pandas dataframe) to an existing table. Before I add the data, I want to be able to delete any existing data for this week from the table, but I want the whole thing to be wrapped in a transaction so I'm not left in a mess halfway through.

Here's what I therefore want to be able to do

# runs a transaction
with engine.begin() as connection:
    r1 = connection.execute(table.delete().where(table.c[WEEK]==week))
    data.to_sql(table.name, connection, if_exists='append')

i.e. my change is that I want to be able to pass a SQLAlchemy Connection object into DataFrame.to_sql instead of an Engine. (The underlying Engine of the Connection object can be accessed as .engine, so we should still be able to pull metadata etc)

@maxgrenderjones maxgrenderjones changed the title [ENH] Support transactions by allowing a connection to be passed in instead of an Engine [ENH] Support transactions by allowing a SQLAlchemy Connection to be passed to .to_sql as well as Engine Jul 30, 2014
@jorisvandenbossche
Copy link
Member

Is it possible to provide this engine yourself by doing the following in the with context: data.to_sql(table.name, connection.engine, if_exists='append')?

Because then it is the question: do we ask the user to do this themselves, or do we provide the convenience (but also clutter in interface) to also accept a connection?

@mmautner
Copy link

Can this issue be closed? I agree w/ jorisvandenbossche that a change to pandas isn't necessary--you can easily access the .engine property from a SQLAlchemy connection object:

from sqlalchemy import create_engine
engine = create_engine('sqlite:///file.db')
connection = engine.connect()
# 'connection.engine' now refers to a readily usable engine object

@jorisvandenbossche
Copy link
Member

You can indeed easily access the engine, but I don't really know how then the atomicity is handled (the reason to use a transaction in this case). Accessing the engine and using that, within the with engine.begin() as con context, does this defeat the purpose of that, or does this keep it atomic?
Anybody who knows this?

@mmautner
Copy link

@jorisvandenbossche
Copy link
Member

yes, but that is for one to_sql call, while here @maxgrenderjones wants to do multiple such calls in one transaction

@artemyk
Copy link
Contributor

artemyk commented Sep 19, 2014

I tested it and running a statement as

conn.engine.execute(sql)

does not run it within the transaction (in other words, the statement would not be rolledback as needed)

One possibility would be to a connection kwarg to to_sql. So the call would be:

data.to_sql(table.name, engine, connection=connection, if_exists='append')

connection could be saved in the appropriate object and, if not None, used in run_transaction instead of creating a new connection. Thoughts?

@maxgrenderjones
Copy link
Contributor Author

That would work. Still feels like it might be easier to pass either a connection or an engine, but I guess the connection option perhaps should be a little bit awkward, as it requires you to understand that you need to do your own transaction wrapping.

@jorisvandenbossche
Copy link
Member

@artemyk Are you sure that providing a connection does run it within a transaction? As connection.begin() instead of engine.begin() does also run a transaction on its own.
Or does it work if the to_sql(..., connection, ...) is nested in another connection begin? (http://docs.sqlalchemy.org/en/latest/core/connections.html#nesting-of-transaction-blocks)

@artemyk
Copy link
Contributor

artemyk commented Sep 19, 2014

@jorisvandenbossche The connection contextmanager (i.e. with engine.begin() as connection:) does run things within a transaction --- that's what we use now in run_transaction, and it passes tests for properly commiting/rollingback.
You are right that connection.begin() also is a way to run a transaction, that might be the right way to do nested transactions. Is that something we'd want to support?
BTW, there already seems to be a way to pass in a cursor --- see cur and is_cursor parameters some functions take. I'm not sure if that is deprecated or not, but seems clearly related to the functionality being requested here. to_sql doesn't support it though, nor does run_transaction check to see if the connection is a cursor.

@tr11
Copy link
Contributor

tr11 commented Mar 30, 2015

@artemyk, what is the advantage of using a sqlalchemy engine instead of a connection? Going through the io/sql.py code, we really only use has_tables and table_names functions of the engine, which is not that much of an issue since we can recover the engine from the connection (using the engine attribute of the connection).

In fact, it seems to make more sense for the scope of pandas to operate at the connection level only. That way the user can have full control over sqlalchemy connection issues such as isolation levels, nested transactions/savepoints, etc (see sqlalchemy engines and connections).

Maybe we could default to use connections and create our own connection in case the user passes an engine? That way this would backwards compatible, but still allow for more advanced usage.

@artemyk
Copy link
Contributor

artemyk commented Mar 30, 2015

@tr11 that sounds like a good idea to me. @jorisvandenbossche thoughts?

@tr11
Copy link
Contributor

tr11 commented Mar 30, 2015

OK, I really needed this right now, so I just hacked a quick version (https://github.com/tr11/pandas/commit/fe6fee394f78a908e74cf3675668a4475db3c616) that passes all the tests with engines or connections. I will add standard tests that use connection parameters and a couple other ones that show the new things that can be done with transaction rollbacks, using temporary tables, etc. If there is any interest in merging this, I can also update the documentation before submitting a pull request.

@jorisvandenbossche jorisvandenbossche changed the title [ENH] Support transactions by allowing a SQLAlchemy Connection to be passed to .to_sql as well as Engine ENH: support both SQLAlchemy engines and connections May 12, 2015
@jorisvandenbossche
Copy link
Member

@tr11 Sorry for the slow reaction here!
Indeed, that looks like a good idea. If you would still want to make a PR for this (with tests and examples/docs), or give some comments on the newly opened PR #10105, that would be really great

graingert pushed a commit to graingert/pandas that referenced this issue May 12, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 12, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 12, 2015
@graingert
Copy link
Contributor

I've updated the PR with these changes, and added tests

graingert pushed a commit to graingert/pandas that referenced this issue May 13, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 22, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 26, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 27, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 29, 2015
graingert pushed a commit to graingert/pandas that referenced this issue May 31, 2015
graingert pushed a commit to graingert/pandas that referenced this issue Jun 3, 2015
graingert pushed a commit to graingert/pandas that referenced this issue Jun 4, 2015
@jorisvandenbossche jorisvandenbossche added this to the 0.17.0 milestone Jul 3, 2015
graingert pushed a commit to graingert/pandas that referenced this issue Jul 3, 2015
update pymysql to 0.6.3 to avoid cursor bugs

Add documentation and tests for SQLAlchemy connectables

explicit reference to connection/engine in docs

Temporary table test

pass compile the connectable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants