ENH pandas-dev#4163 Use SQLAlchemy for DB abstraction

danielballan · jreback · commit f47f9fe1cc4c · 2014-02-06T15:23:39.000-05:00
* TST Import sqlalchemy on Travis.
* DOC add docstrings to read sql
* ENH read_sql connects via Connection, Engine, file path, or :memory: string
* CLN Separate legacy code into new file, and fallback so that all old tests pass.
* TST to use sqlalchemy syntax in tests
* CLN sql into classes, legacy passes
* FIX few engine vs con calls
* CLN pep8 cleanup
* add postgres support for pandas.io.sql.get_schema
* WIP: cleanup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon
* TODO: renamed _engine_read_table, need to think of a better name.
* TODO: clean up get_connection function
* ENH: cleanup of SQL io
* TODO: check that legacy mode works
* TODO: run tests correctly
* enabled coerce_float option
* Cleanup and bug-fixing mainly on legacy mode sql.
* IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized.
* TODO: tests and doc
* Added Test coverage for basic functionality using in-memory SQLite database
* Simplified API by automatically distinguishing between engine and connection.
* Added warnings
* ENH pandas-dev#4163 Added tests and documentation
* Initial draft of doc updates
* minor doc updates
* Added tests and reduced code repetition.
* Updated Docs.
* Added test coverage for legacy names
* Documentation updates, more tests
* Added deprecation warnings for legacy names.
* Updated docs and test doc build
* ENH pandas-dev#4163 - finalized tests and docs, ready for wider use…
* TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3
* ENH pandas-dev#4163 added versionadded comment
* ENH pandas-dev#4163 added deprecation warning for tquery and uquery
* ENH pandas-dev#4163 Documentation and tests
* ENH pandas-dev#4163 Added more robust type coercion, datetime parsing, and parse date options.
* Updated optional dependencies
* Added columns optional arg to read_table, removed failing legacy tests.
* Added columns to doc
* ENH pandas-dev#4163 Fixed class renaming, expanded docs
* ENH pandas-dev#4163 Fixed tests in legacy mode
* ENH pandas-dev#4163 Tweaks to docs, avoid mutable default args, mysql tests
* ENH pandas-dev#4163 Introduce DataFrame Index support. Refactor to introduce PandasSQLTable for cleaner OOP design
* ENH pandas-dev#4163 Fix bug in index + parse date interaction, added test case for problem
* ENH pandas-dev#4163 Fixed missing basestring import for py3.3 compat
* ENH pandas-dev#4163 Fixed missing string_types import for py3.3 compat
diff --git a/README.md b/README.md
@@ -106,6 +106,7 @@ pip install pandas
 - [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher.
 - [SciPy](http://www.scipy.org): miscellaneous statistical functions
 - [PyTables](http://www.pytables.org): necessary for HDF5-based storage
+- [SQLAlchemy](http://www.sqlalchemy.org): for SQL database support. Version 0.8.1 or higher recommended.
 - [matplotlib](http://matplotlib.sourceforge.net/): for plotting
 - [statsmodels](http://statsmodels.sourceforge.net/)
    - Needed for parts of `pandas.stats`
diff --git a/ci/requirements-2.6.txt b/ci/requirements-2.6.txt
@@ -6,3 +6,4 @@ http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2
 html5lib==1.0b2
 bigquery==2.0.17
 numexpr==1.4.2
+sqlalchemy==0.8.1
diff --git a/ci/requirements-2.7.txt b/ci/requirements-2.7.txt
@@ -19,3 +19,4 @@ scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
 bigquery==2.0.17
+sqlalchemy==0.8.1
diff --git a/ci/requirements-2.7_LOCALE.txt b/ci/requirements-2.7_LOCALE.txt
@@ -15,3 +15,4 @@ scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
 bigquery==2.0.17
+sqlalchemy==0.8.1
diff --git a/ci/requirements-3.3.txt b/ci/requirements-3.3.txt
@@ -14,3 +14,4 @@ lxml==3.2.1
 scipy==0.12.0
 beautifulsoup4==4.2.1
 statsmodels==0.4.3
+sqlalchemy==0.9.1
diff --git a/doc/source/install.rst b/doc/source/install.rst
@@ -95,6 +95,7 @@ Optional Dependencies
     version. Version 0.17.1 or higher.
   * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
   * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
+  * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended.
   * `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
   * `statsmodels <http://statsmodels.sourceforge.net/>`__
      * Needed for parts of :mod:`pandas.stats`
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -1823,7 +1823,7 @@ class. The following two command are equivalent:
     read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
 
 The class based approach can be used to read multiple sheets or to introspect
-the sheet names using the ``sheet_names`` attribute. 
+the sheet names using the ``sheet_names`` attribute.
 
 .. note::
 
@@ -3068,13 +3068,48 @@ SQL Queries
 -----------
 
 The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
-facilitate data retrieval and to reduce dependency on DB-specific API. These
-wrappers only support the Python database adapters which respect the `Python
-DB-API <http://www.python.org/dev/peps/pep-0249/>`__. See some
-:ref:`cookbook examples <cookbook.sql>` for some advanced strategies
+facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
+is provided by SQLAlchemy, if installed. In addition, you will need a driver library for
+your database.
 
-For example, suppose you want to query some data with different types from a
-table such as:
+.. versionadded:: 0.14.0
+
+
+If SQLAlchemy is not installed, a legacy fallback is provided for SQLite and MySQL.
+These legacy modes require Python database adapters which respect the `Python
+DB-API <http://www.python.org/dev/peps/pep-0249/>`__.
+
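The legacy mode depends on the driver following PEP 249. As a stdlib-only illustration, Python's own ``sqlite3`` module is such a driver: it exposes the module-level ``apilevel`` and ``paramstyle`` attributes the spec defines and the usual connection, cursor, execute, fetch workflow. The table ``t`` below is hypothetical and only used for the demonstration.

```python
import sqlite3

# PEP 249 requires these module-level attributes.
print(sqlite3.apilevel)    # -> 2.0
print(sqlite3.paramstyle)  # -> qmark, i.e. '?' placeholders

# Standard DB-API workflow: connection -> cursor -> execute -> fetch.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE t (id INTEGER, name TEXT)')
cur.execute('INSERT INTO t VALUES (?, ?)', (1, 'a'))
conn.commit()
cur.execute('SELECT id, name FROM t')
rows = cur.fetchall()
print(rows)  # -> [(1, 'a')]

cur.close()
conn.close()
```

The ``paramstyle`` value is why the SQLite examples in this section use ``?`` placeholders; other drivers may use a different style.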
+See also some :ref:`cookbook examples <cookbook.sql>` for advanced strategies.
+
+The key functions are:
+
+* :func:`~pandas.io.sql.to_sql`
+* :func:`~pandas.io.sql.read_sql`
+* :func:`~pandas.io.sql.read_table`
+
+
+In the following example, we use the `SQLite <http://www.sqlite.org/>`__ SQL database
+engine. You can use a temporary SQLite database where data are stored in
+"memory".
+
+To connect with SQLAlchemy, use the :func:`create_engine` function to create an engine
+object from a database URI. You only need to create the engine once per database you are
+connecting to.
+
+For more information on :func:`create_engine` and the URI formatting, see the examples
+below and the SQLAlchemy `documentation <http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html>`__.
+
+.. code-block:: python
+
+   from sqlalchemy import create_engine
+   from pandas.io import sql
+   # Create your connection.
+   engine = create_engine('sqlite:///:memory:')
+
+Writing DataFrames
+~~~~~~~~~~~~~~~~~~
+
+Assuming the following data is in a DataFrame ``data``, we can insert it into
+the database using :func:`~pandas.io.sql.to_sql`.
 
 
 +-----+------------+-------+-------+-------+
@@ -3088,81 +3123,141 @@ table such as:
 +-----+------------+-------+-------+-------+
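To make concrete what storing such a table involves, here is a stdlib-only sketch that creates a matching SQLite table and bulk-inserts the example rows by hand. This is an illustration of the equivalent manual work under the assumption of a SQLite backend, not how pandas implements ``to_sql``; the column types chosen are assumptions.

```python
import datetime
import sqlite3

# Same sample rows as the example table (id, Date, Col_1, Col_2, Col_3).
columns = ['id', 'Date', 'Col_1', 'Col_2', 'Col_3']
rows = [(26, datetime.datetime(2010, 10, 18), 'X', 27.5, True),
        (42, datetime.datetime(2010, 10, 19), 'Y', -12.5, False),
        (63, datetime.datetime(2010, 10, 20), 'Z', 5.73, True)]

conn = sqlite3.connect(':memory:')
# SQLite has no native date or boolean types: dates are stored as ISO text
# and booleans end up as the integers 1/0.
conn.execute('CREATE TABLE data (id INTEGER, Date TEXT, Col_1 TEXT, '
             'Col_2 REAL, Col_3 INTEGER)')

# One '?' placeholder per column: '?, ?, ?, ?, ?'.
placeholders = ', '.join('?' * len(columns))
conn.executemany('INSERT INTO data VALUES (%s)' % placeholders,
                 [(i, d.isoformat(' '), c1, c2, c3)
                  for i, d, c1, c2, c3 in rows])
conn.commit()

print(conn.execute('SELECT * FROM data WHERE id = 42').fetchall())
# -> [(42, '2010-10-19 00:00:00', 'Y', -12.5, 0)]
```

The round trip shows the type information SQLite discards (dates become strings, booleans become integers), which is why the reading functions offer options such as ``parse_dates``.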
 
 
-Functions from :mod:`pandas.io.sql` can extract some data into a DataFrame. In
-the following example, we use the `SQlite <http://www.sqlite.org/>`__ SQL database
-engine. You can use a temporary SQLite database where data are stored in
-"memory". Just do:
-
-.. code-block:: python
+.. ipython:: python
+   :suppress:
 
-   import sqlite3
+   from sqlalchemy import create_engine
    from pandas.io import sql
-   # Create your connection.
-   cnx = sqlite3.connect(':memory:')
+   engine = create_engine('sqlite:///:memory:')
 
 .. ipython:: python
    :suppress:
 
-   import sqlite3
-   from pandas.io import sql
-   cnx = sqlite3.connect(':memory:')
+   c = ['id', 'Date', 'Col_1', 'Col_2', 'Col_3']
+   d = [(26, datetime.datetime(2010, 10, 18), 'X', 27.5, True),
+        (42, datetime.datetime(2010, 10, 19), 'Y', -12.5, False),
+        (63, datetime.datetime(2010, 10, 20), 'Z', 5.73, True)]
+
+   data = DataFrame(d, columns=c)
 
 .. ipython:: python
-   :suppress:
 
-   cu = cnx.cursor()
-   # Create a table named 'data'.
-   cu.execute("""CREATE TABLE data(id integer,
-                                   date date,
-                                   Col_1 string,
-                                   Col_2 float,
-                                   Col_3 bool);""")
-   cu.executemany('INSERT INTO data VALUES (?,?,?,?,?)',
-                  [(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
-                   (42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
-                   (63, datetime.datetime(2010,10,20), 'Z', 5.73, True)])
+   sql.to_sql(data, 'data', engine)
+
+Reading Tables
+~~~~~~~~~~~~~~
+
+:func:`~pandas.io.sql.read_table` will read a database table given the
+table name and optionally a subset of columns to read.
 
+.. note::
 
-Let ``data`` be the name of your SQL table. With a query and your database
-connection, just use the :func:`~pandas.io.sql.read_sql` function to get the
-query results into a DataFrame:
+    In order to use :func:`~pandas.io.sql.read_table`, you **must** have the
+    SQLAlchemy optional dependency installed.
 
 .. ipython:: python
 
-   sql.read_sql("SELECT * FROM data;", cnx)
+   sql.read_table('data', engine)
 
-You can also specify the name of the column as the DataFrame index:
+You can also specify the name of the column as the DataFrame index,
+and specify a subset of columns to be read.
 
 .. ipython:: python
 
-   sql.read_sql("SELECT * FROM data;", cnx, index_col='id')
-   sql.read_sql("SELECT * FROM data;", cnx, index_col='date')
+   sql.read_table('data', engine, index_col='id')
+   sql.read_table('data', engine, columns=['Col_1', 'Col_2'])
 
-Of course, you can specify a more "complex" query.
+And you can explicitly force columns to be parsed as dates:
+
+.. ipython:: python
+
+   sql.read_table('data', engine, parse_dates=['Date'])
+
+If needed, you can explicitly specify a format string, or a dict of arguments
+to pass to :func:`pandas.tseries.tools.to_datetime`.
+
+.. code-block:: python
+
+   sql.read_table('data', engine, parse_dates={'Date': '%Y-%m-%d'})
+   sql.read_table('data', engine, parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}})
+
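Explicit date parsing matters most for backends such as SQLite that have no native datetime storage class. A stdlib-only illustration (not pandas' implementation; the one-row table is hypothetical): the value comes back as plain text, and the ``'%Y-%m-%d %H:%M:%S'`` format used above turns it into a ``datetime``.

```python
import datetime
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (id INTEGER, Date TEXT)')
conn.execute('INSERT INTO data VALUES (?, ?)', (26, '2010-10-18 00:00:00'))

# Without any conversion the value comes back as a plain string.
raw = conn.execute('SELECT Date FROM data').fetchone()[0]
print(type(raw).__name__)  # -> str

# Parse it with the same format string as in the parse_dates example.
parsed = datetime.datetime.strptime(raw, '%Y-%m-%d %H:%M:%S')
print(parsed)  # -> 2010-10-18 00:00:00
```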
+
+You can check if a table exists using :func:`~pandas.io.sql.has_table`.
+
+In addition, the class :class:`~pandas.io.sql.PandasSQLWithEngine` can be
+instantiated directly for more manual control over the SQL interaction.
+
+Querying
+~~~~~~~~
+
+You can query using raw SQL in the :func:`~pandas.io.sql.read_sql` function.
+In this case you must use the SQL variant appropriate for your database.
+When using SQLAlchemy, you can also pass SQLAlchemy Expression language constructs,
+which are database-agnostic.
 
 .. ipython:: python
 
-   sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", cnx)
+   sql.read_sql('SELECT * FROM data', engine)
+
+Of course, you can specify a more "complex" query.
 
 .. ipython:: python
-   :suppress:
 
-   cu.close()
-   cnx.close()
+   sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
 
 
-There are a few other available functions:
+You can also run a plain query without creating a DataFrame with
+:func:`~pandas.io.sql.execute`. This is useful for queries that don't return values,
+such as INSERT. This is functionally equivalent to calling ``execute`` on the
+SQLAlchemy engine or DB connection object. Again, you must use the SQL syntax
+variant appropriate for your database.
 
-  - ``tquery`` returns a list of tuples corresponding to each row.
-  - ``uquery`` does the same thing as tquery, but instead of returning results
-    it returns the number of related rows.
-  - ``write_frame`` writes records stored in a DataFrame into the SQL table.
-  - ``has_table`` checks if a given SQLite table exists.
+.. code-block:: python
 
-.. note::
+   sql.execute('SELECT * FROM table_name', engine)
+
+   sql.execute('INSERT INTO table_name VALUES(?, ?, ?)', engine, params=[(1, 12.2, True)])
+
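For comparison, the same kind of statement issued directly through the stdlib ``sqlite3`` driver, which uses the ``?`` placeholder style. The table ``table_name`` and its three columns are hypothetical. The sketch also shows that the number of parameters must match the number of placeholders; the driver rejects a mismatch rather than guessing.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE table_name (id INTEGER, value REAL, flag INTEGER)')

# Three '?' placeholders, so each parameter tuple needs exactly three values.
cur = conn.execute('INSERT INTO table_name VALUES (?, ?, ?)', (1, 12.2, True))
print(cur.rowcount)    # -> 1
print(cur.fetchall())  # -> [] (an INSERT returns no result rows)

# A mismatched parameter count is rejected by the driver.
rejected = False
try:
    conn.execute('INSERT INTO table_name VALUES (?, ?, ?)',
                 ('id', 1, 12.2, True))
except sqlite3.ProgrammingError as exc:
    rejected = True
    print('rejected:', exc)
```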
+Engine connection examples
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+  from sqlalchemy import create_engine
+
+  engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
+
+  engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
+
+  engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
+
+  engine = create_engine('mssql+pyodbc://mydsn')
+
+  # sqlite://<nohostname>/<path>
+  # where <path> is relative:
+  engine = create_engine('sqlite:///foo.db')
+
+  # or absolute, starting with a slash:
+  engine = create_engine('sqlite:////absolute/path/to/foo.db')
+
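The slash counting in the SQLite URLs above is easy to get wrong. The hypothetical helper below (not part of pandas or SQLAlchemy) sketches the rule as SQLAlchemy's documentation describes it: after ``sqlite://`` comes an empty host and a separating slash, so a third slash introduces a relative path, a fourth makes it absolute, and an empty path means an in-memory database.

```python
def sqlite_url_to_path(url):
    """Hypothetical helper: map a SQLAlchemy-style sqlite URL to the
    database argument that sqlite3.connect() would receive."""
    prefix = 'sqlite://'
    if not url.startswith(prefix):
        raise ValueError('not a sqlite URL: %r' % url)
    rest = url[len(prefix):]
    if rest == '':
        # 'sqlite://' with no path means an in-memory database.
        return ':memory:'
    if not rest.startswith('/'):
        raise ValueError('sqlite URLs take no host: %r' % url)
    # Drop the separator slash; a remaining leading '/' marks an absolute path.
    return rest[1:]


for url in ['sqlite://', 'sqlite:///:memory:', 'sqlite:///foo.db',
            'sqlite:////absolute/path/to/foo.db']:
    print(url, '->', sqlite_url_to_path(url))
```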
+
+Legacy
+~~~~~~
+
+To use the SQLite support without SQLAlchemy, you can create connections like so:
+
+.. code-block:: python
+
+   import sqlite3
+   from pandas.io import sql
+   cnx = sqlite3.connect(':memory:')
+
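When using an in-memory SQLite database in legacy mode, the data lives only as long as the connection that created it: each ``sqlite3.connect(':memory:')`` call opens a separate, empty database. Create ``cnx`` once and pass the same object to every call. A stdlib-only check, where ``table_names`` is a hypothetical helper:

```python
import sqlite3


def table_names(conn):
    """Hypothetical helper: list user tables via SQLite's catalog table."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [name for (name,) in rows]


first = sqlite3.connect(':memory:')
first.execute('CREATE TABLE data (id INTEGER)')

# A second connect(':memory:') opens a brand-new, empty database.
second = sqlite3.connect(':memory:')

first_tables = table_names(first)
second_tables = table_names(second)
print(first_tables)   # -> ['data']
print(second_tables)  # -> []

# Closing a connection discards its in-memory database entirely.
first.close()
second.close()
```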
+And then issue the following queries, remembering to also specify the flavor of SQL
+you are using.
+
+.. code-block:: python
+
+   sql.to_sql(data, 'data', cnx, flavor='sqlite')
+
+   sql.read_sql("SELECT * FROM data", cnx, flavor='sqlite')
 
-   For now, writing your DataFrame into a database works only with
-   **SQLite**. Moreover, the **index** will currently be **dropped**.
 
 .. _io.bigquery:
 
diff --git a/pandas/io/sql.py b/pandas/io/sql.py
diff --git a/pandas/io/tests/data/iris.csv b/pandas/io/tests/data/iris.csv
diff --git a/pandas/io/tests/test_sql.py b/pandas/io/tests/test_sql.py