
Commit f47f9fe

danielballan authored and jreback committed
ENH pandas-dev#4163 Use SQLAlchemy for DB abstraction
TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. TST to use sqlachemy syntax in tests CLN sql into classes, legacy passes FIX few engine vs con calls CLN pep8 cleanup add postgres support for pandas.io.sql.get_schema WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon TODO: renamed _engine_read_table, need to think of a better name. TODO: clean up get_conneciton function ENH: cleanup of SQL io TODO: check that legacy mode works TODO: run tests correctly enabled coerce_float option Cleanup and bug-fixing mainly on legacy mode sql. IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized. TODO: tests and doc Added Test coverage for basic functionality using in-memory SQLite database Simplified API by automatically distinguishing between engine and connection. Added warnings ENH pandas-dev#4163 Added tests and documentation Initial draft of doc updates minor doc updates Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names Documentation updates, more tests Added depreciation warnings for legacy names. Updated docs and test doc build ENH pandas-dev#4163 - finalized tests and docs, ready for wider use… TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3 TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. ENH pandas-dev#4163 added version added coment ENH pandas-dev#4163 added depreciation warning for tquery and uquery ENH pandas-dev#4163 Documentation and tests ENH pandas-dev#4163 Added more robust type coertion, datetime parsing, and parse date options. 
Updated optional dependancies Added columns optional arg to read_table, removed failing legacy tests. Added columns to doc ENH pandas-dev#4163 Fixed class renaming, expanded docs ENH pandas-dev#4163 Fixed tests in legacy mode ENH pandas-dev#4163 Use SQLAlchemy for DB abstraction TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. TST to use sqlachemy syntax in tests CLN sql into classes, legacy passes FIX few engine vs con calls CLN pep8 cleanup add postgres support for pandas.io.sql.get_schema WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon TODO: renamed _engine_read_table, need to think of a better name. TODO: clean up get_conneciton function ENH: cleanup of SQL io TODO: check that legacy mode works TODO: run tests correctly enabled coerce_float option Cleanup and bug-fixing mainly on legacy mode sql. IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized. TODO: tests and doc Added Test coverage for basic functionality using in-memory SQLite database Simplified API by automatically distinguishing between engine and connection. Added warnings ENH pandas-dev#4163 Added tests and documentation Initial draft of doc updates minor doc updates Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names Documentation updates, more tests Added depreciation warnings for legacy names. Updated docs and test doc build ENH pandas-dev#4163 - finalized tests and docs, ready for wider use… TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3 TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. 
ENH pandas-dev#4163 added version added coment ENH pandas-dev#4163 added depreciation warning for tquery and uquery ENH pandas-dev#4163 Documentation and tests ENH pandas-dev#4163 Added more robust type coertion, datetime parsing, and parse date options. Updated optional dependancies Added columns optional arg to read_table, removed failing legacy tests. Added columns to doc ENH pandas-dev#4163 Fixed class renaming, expanded docs ENH pandas-dev#4163 Fixed tests in legacy mode ENH pandas-dev#4163 Tweaks to docs, avoid mutable default args, mysql tests ENH pandas-dev#4163 Introduce DataFrame Index support. Refactor to introduce PandasSQLTable for cleaner OOP design ENH pandas-dev#4163 Fix bug in index + parse date interaction, added test case for problem ENH pandas-dev#4163 Fixed missing basestring import for py3.3 compat ENH pandas-dev#4163 Fixed missing string_types import for py3.3 compat
1 parent cc6ee40 commit f47f9fe

10 files changed

+1800
-881
lines changed

README.md

+1
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,7 @@ pip install pandas
106106
- [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher.
107107
- [SciPy](http://www.scipy.org): miscellaneous statistical functions
108108
- [PyTables](http://www.pytables.org): necessary for HDF5-based storage
109+
- [SQLAlchemy](http://www.sqlalchemy.org): for SQL database support. Version 0.8.1 or higher recommended.
109110
- [matplotlib](http://matplotlib.sourceforge.net/): for plotting
110111
- [statsmodels](http://statsmodels.sourceforge.net/)
111112
- Needed for parts of `pandas.stats`

ci/requirements-2.6.txt

+1
@@ -6,3 +6,4 @@ http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2
66
html5lib==1.0b2
77
bigquery==2.0.17
88
numexpr==1.4.2
9+
sqlalchemy==0.8.1

ci/requirements-2.7.txt

+1
@@ -19,3 +19,4 @@ scipy==0.10.0
1919
beautifulsoup4==4.2.1
2020
statsmodels==0.5.0
2121
bigquery==2.0.17
22+
sqlalchemy==0.8.1

ci/requirements-2.7_LOCALE.txt

+1
@@ -15,3 +15,4 @@ scipy==0.10.0
1515
beautifulsoup4==4.2.1
1616
statsmodels==0.5.0
1717
bigquery==2.0.17
18+
sqlalchemy==0.8.1

ci/requirements-3.3.txt

+1
@@ -14,3 +14,4 @@ lxml==3.2.1
1414
scipy==0.12.0
1515
beautifulsoup4==4.2.1
1616
statsmodels==0.4.3
17+
sqlalchemy==0.9.1

doc/source/install.rst

+1
@@ -95,6 +95,7 @@ Optional Dependencies
9595
version. Version 0.17.1 or higher.
9696
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
9797
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
98+
* `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended.
9899
* `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
99100
* `statsmodels <http://statsmodels.sourceforge.net/>`__
100101
* Needed for parts of :mod:`pandas.stats`

doc/source/io.rst

+147-52
@@ -1823,7 +1823,7 @@ class. The following two command are equivalent:
18231823
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
18241824
18251825
The class based approach can be used to read multiple sheets or to introspect
1826-
the sheet names using the ``sheet_names`` attribute.
1826+
the sheet names using the ``sheet_names`` attribute.
18271827

18281828
.. note::
18291829

@@ -3068,13 +3068,48 @@ SQL Queries
30683068
-----------
30693069

30703070
The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
3071-
facilitate data retrieval and to reduce dependency on DB-specific API. These
3072-
wrappers only support the Python database adapters which respect the `Python
3073-
DB-API <http://www.python.org/dev/peps/pep-0249/>`__. See some
3074-
:ref:`cookbook examples <cookbook.sql>` for some advanced strategies
3071+
facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
3072+
is provided by SQLAlchemy if installed. In addition, you will need a driver library for
3073+
your database.
30753074

3076-
For example, suppose you want to query some data with different types from a
3077-
table such as:
3075+
.. versionadded:: 0.14.0
3076+
3077+
3078+
If SQLAlchemy is not installed a legacy fallback is provided for sqlite and mysql.
3079+
These legacy modes require Python database adapters which respect the `Python
3080+
DB-API <http://www.python.org/dev/peps/pep-0249/>`__.
3081+
3082+
See also the :ref:`cookbook examples <cookbook.sql>` for some advanced strategies.
3083+
3084+
The key functions are:
3085+
:func:`~pandas.io.sql.to_sql`
3086+
:func:`~pandas.io.sql.read_sql`
3087+
:func:`~pandas.io.sql.read_table`
3088+
3089+
3090+
In the following example, we use the `SQLite <http://www.sqlite.org/>`__ SQL database
3091+
engine. You can use a temporary SQLite database where data are stored in
3092+
"memory".
3093+
3094+
To connect with SQLAlchemy you use the :func:`create_engine` function to create an engine
3095+
object from a database URI. You only need to create the engine once per database you are
3096+
connecting to.
3097+
3098+
For more information on :func:`create_engine` and the URI formatting, see the examples
3099+
below and the SQLAlchemy `documentation <http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html>`__.
3100+
3101+
.. code-block:: python
3102+
3103+
from sqlalchemy import create_engine
3104+
from pandas.io import sql
3105+
# Create your connection.
3106+
engine = create_engine('sqlite:///:memory:')
3107+
3108+
Writing DataFrames
3109+
~~~~~~~~~~~~~~~~~~
3110+
3111+
Assuming the following data is in a DataFrame ``data``, we can insert it into
3112+
the database using :func:`~pandas.io.sql.to_sql`.
30783113

30793114

30803115
+-----+------------+-------+-------+-------+
@@ -3088,81 +3123,141 @@ table such as:
30883123
+-----+------------+-------+-------+-------+
30893124

30903125

3091-
Functions from :mod:`pandas.io.sql` can extract some data into a DataFrame. In
3092-
the following example, we use the `SQlite <http://www.sqlite.org/>`__ SQL database
3093-
engine. You can use a temporary SQLite database where data are stored in
3094-
"memory". Just do:
3095-
3096-
.. code-block:: python
3126+
.. ipython:: python
3127+
:suppress:
30973128
3098-
import sqlite3
3129+
from sqlalchemy import create_engine
30993130
from pandas.io import sql
3100-
# Create your connection.
3101-
cnx = sqlite3.connect(':memory:')
3131+
engine = create_engine('sqlite:///:memory:')
31023132
31033133
.. ipython:: python
31043134
:suppress:
31053135
3106-
import sqlite3
3107-
from pandas.io import sql
3108-
cnx = sqlite3.connect(':memory:')
3136+
c = ['id', 'Date', 'Col_1', 'Col_2', 'Col_3']
3137+
d = [(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
3138+
(42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
3139+
(63, datetime.datetime(2010,10,20), 'Z', 5.73, True)]
3140+
3141+
data = DataFrame(d, columns=c)
31093142
31103143
.. ipython:: python
3111-
:suppress:
31123144
3113-
cu = cnx.cursor()
3114-
# Create a table named 'data'.
3115-
cu.execute("""CREATE TABLE data(id integer,
3116-
date date,
3117-
Col_1 string,
3118-
Col_2 float,
3119-
Col_3 bool);""")
3120-
cu.executemany('INSERT INTO data VALUES (?,?,?,?,?)',
3121-
[(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
3122-
(42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
3123-
(63, datetime.datetime(2010,10,20), 'Z', 5.73, True)])
3145+
Reading Tables
3146+
~~~~~~~~~~~~~~
3147+
3148+
:func:`~pandas.io.sql.read_table` will read a database table given the
3149+
table name and optionally a subset of columns to read.
31243150

3151+
.. note::
31253152

3126-
Let ``data`` be the name of your SQL table. With a query and your database
3127-
connection, just use the :func:`~pandas.io.sql.read_sql` function to get the
3128-
query results into a DataFrame:
3153+
In order to use :func:`~pandas.io.sql.read_table`, you **must** have the
3154+
SQLAlchemy optional dependency installed.
31293155

31303156
.. ipython:: python
31313157
3132-
sql.read_sql("SELECT * FROM data;", cnx)
3158+
sql.read_table('data', engine)
31333159
3134-
You can also specify the name of the column as the DataFrame index:
3160+
You can also specify the name of the column as the DataFrame index,
3161+
and specify a subset of columns to be read.
31353162

31363163
.. ipython:: python
31373164
3138-
sql.read_sql("SELECT * FROM data;", cnx, index_col='id')
3139-
sql.read_sql("SELECT * FROM data;", cnx, index_col='date')
3165+
sql.read_table('data', engine, index_col='id')
3166+
sql.read_table('data', engine, columns=['Col_1', 'Col_2'])
31403167
3141-
Of course, you can specify a more "complex" query.
3168+
And you can explicitly force columns to be parsed as dates:
3169+
3170+
.. ipython:: python
3171+
3172+
sql.read_table('data', engine, parse_dates=['Date'])
3173+
3174+
If needed, you can explicitly specify a format string, or a dict of arguments
3175+
to pass to :func:`pandas.tseries.tools.to_datetime`.
3176+
3177+
.. code-block:: python
3178+
3179+
sql.read_table('data', engine, parse_dates={'Date': '%Y-%m-%d'})
3180+
sql.read_table('data', engine, parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}})
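The two format strings above use the standard ``strftime``/``strptime`` directives. As a minimal sketch of what each format accepts, the following uses only the standard library's ``datetime.strptime``; it is an illustration of the format strings and makes no claim about how pandas parses dates internally.

```python
from datetime import datetime

# Date-only format, as in parse_dates={'Date': '%Y-%m-%d'}
date_only = datetime.strptime("2010-10-18", "%Y-%m-%d")

# Full timestamp format, as in parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}}
full_stamp = datetime.strptime("2010-10-18 00:00:00", "%Y-%m-%d %H:%M:%S")

# A timestamp string does not match the date-only format and is rejected.
try:
    datetime.strptime("2010-10-18 00:00:00", "%Y-%m-%d")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

The practical point is that the format has to describe the stored strings exactly, so it is worth checking how your database returns the column before choosing one.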
3181+
3182+
3183+
You can check if a table exists using :func:`~pandas.io.sql.has_table`.
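For intuition, a table-existence check on SQLite can be written as a lookup in the ``sqlite_master`` catalog. The sketch below uses the standard library ``sqlite3`` module; ``table_exists`` is a hypothetical helper written for this illustration and is not presented as the implementation behind ``has_table``.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table called `name` exists in this SQLite database."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, Col_1 TEXT)")

exists_data = table_exists(conn, "data")      # table created above
exists_other = table_exists(conn, "missing")  # never created
conn.close()
```

Other databases expose their catalogs differently, which is one reason to delegate this kind of check to an abstraction layer.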
3184+
3185+
In addition, the class :class:`~pandas.io.sql.PandasSQLWithEngine` can be
3186+
instantiated directly for more manual control over the SQL interaction.
3187+
3188+
Querying
3189+
~~~~~~~~
3190+
3191+
You can query using raw SQL in the :func:`~pandas.io.sql.read_sql` function.
3192+
In this case you must use the SQL variant appropriate for your database.
3193+
When using SQLAlchemy, you can also pass SQLAlchemy Expression language constructs,
3194+
which are database-agnostic.
31423195

31433196
.. ipython:: python
31443197
3145-
sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", cnx)
3198+
sql.read_sql('SELECT * FROM data', engine)
3199+
3200+
Of course, you can specify a more "complex" query.
31463201

31473202
.. ipython:: python
3148-
:suppress:
31493203
3150-
cu.close()
3151-
cnx.close()
3204+
sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
31523205
31533206
3154-
There are a few other available functions:
3207+
You can also run a plain query without creating a dataframe with
3208+
:func:`~pandas.io.sql.execute`. This is useful for queries that don't return values,
3209+
such as INSERT. This is functionally equivalent to calling ``execute`` on the
3210+
SQLAlchemy engine or db connection object. Again, you must use the SQL syntax
3211+
variant appropriate for your database.
31553212

3156-
- ``tquery`` returns a list of tuples corresponding to each row.
3157-
- ``uquery`` does the same thing as tquery, but instead of returning results
3158-
it returns the number of related rows.
3159-
- ``write_frame`` writes records stored in a DataFrame into the SQL table.
3160-
- ``has_table`` checks if a given SQLite table exists.
3213+
.. code-block:: python
31613214
3162-
.. note::
3215+
sql.execute('SELECT * FROM table_name', engine)
3216+
3217+
sql.execute('INSERT INTO table_name VALUES(?, ?, ?, ?)', engine, params=[('id', 1, 12.2, True)])
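The ``params`` argument supplies the values bound to the ``?`` placeholders, so the number of placeholders has to match the number of values in each tuple. Here is a minimal sketch of the same idea with the standard library ``sqlite3`` module; the column layout of ``table_name`` is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical four-column table, chosen to match the four values below.
conn.execute(
    "CREATE TABLE table_name (label TEXT, n INTEGER, x REAL, flag INTEGER)"
)

# One "?" per column; the parameter tuple has the same length.
conn.execute("INSERT INTO table_name VALUES (?, ?, ?, ?)", ("id", 1, 12.2, True))
conn.commit()

rows = conn.execute("SELECT * FROM table_name").fetchall()

# A mismatch between placeholders and supplied values is rejected.
try:
    conn.execute("INSERT INTO table_name VALUES (?, ?, ?)", ("id", 1, 12.2, True))
    count_mismatch_raises = False
except sqlite3.Error:
    count_mismatch_raises = True

conn.close()
```

The ``?`` style is SQLite's placeholder syntax. Other drivers use different markers (for example ``%s``), which is part of what "the SQL syntax variant appropriate for your database" means in practice.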
3218+
3219+
Engine connection examples
3220+
~~~~~~~~~~~~~~~~~~~~~~~~~~
3221+
3222+
.. code-block:: python
3223+
3224+
from sqlalchemy import create_engine
3225+
3226+
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
3227+
3228+
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
3229+
3230+
engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
3231+
3232+
engine = create_engine('mssql+pyodbc://mydsn')
3233+
3234+
# sqlite://<nohostname>/<path>
3235+
# where <path> is relative:
3236+
engine = create_engine('sqlite:///foo.db')
3237+
3238+
# or absolute, starting with a slash:
3239+
engine = create_engine('sqlite:////absolute/path/to/foo.db')
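The connection strings above follow the general shape ``dialect+driver://user:password@host:port/database``. To make the parts visible, the sketch below splits a few of them with the standard library's ``urllib.parse``. This is only an illustration of the URI anatomy; SQLAlchemy has its own URL parsing, which this does not reproduce.

```python
from urllib.parse import urlsplit

# Network database: credentials, host, port and database name.
pg = urlsplit("postgresql://scott:tiger@localhost:5432/mydatabase")
pg_parts = (pg.scheme, pg.username, pg.password, pg.hostname, pg.port, pg.path)

# "dialect+driver" is carried in the scheme.
mysql = urlsplit("mysql+mysqldb://scott:tiger@localhost/foo")
dialect, _, driver = mysql.scheme.partition("+")

# SQLite has no host, so the host part between "//" and "/" is empty.
rel = urlsplit("sqlite:///foo.db")
abs_ = urlsplit("sqlite:////absolute/path/to/foo.db")

# Dropping the separator slash leaves the file path itself.
rel_path = rel.path[1:]   # relative path
abs_path = abs_.path[1:]  # absolute path keeps its own leading slash
```

This is why a relative SQLite path has three slashes and an absolute one has four: the fourth slash belongs to the path.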
3240+
3241+
3242+
Legacy
3243+
~~~~~~
3244+
To use the sqlite support without SQLAlchemy, you can create connections like so:
3245+
3246+
.. code-block:: python
3247+
3248+
import sqlite3
3249+
from pandas.io import sql
3250+
cnx = sqlite3.connect(':memory:')
3251+
3252+
And then issue the following queries, remembering to also specify the flavor of SQL
3253+
you are using.
3254+
3255+
.. code-block:: python
3256+
3257+
sql.to_sql(data, 'data', cnx, flavor='sqlite')
3258+
3259+
sql.read_sql("SELECT * FROM data", cnx, flavor='sqlite')
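For context on what the legacy mode relies on, a DB-API connection provides query execution, result rows, and column names through ``cursor.description``. The sketch below goes through that round trip with the standard library ``sqlite3`` module, using the sample rows from this section. The column types in ``CREATE TABLE`` are assumptions, and the code shows the raw DB-API calls rather than pandas internals.

```python
import datetime
import sqlite3

cnx = sqlite3.connect(":memory:")
cnx.execute(
    "CREATE TABLE data (id INTEGER, Date TEXT, Col_1 TEXT, Col_2 REAL, Col_3 INTEGER)"
)

# Sample rows from the documentation table; dates stored as ISO strings.
rows = [
    (26, datetime.date(2010, 10, 18).isoformat(), "X", 27.5, True),
    (42, datetime.date(2010, 10, 19).isoformat(), "Y", -12.5, False),
    (63, datetime.date(2010, 10, 20).isoformat(), "Z", 5.73, True),
]
cnx.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
cnx.commit()

# Run a query and collect column names plus result rows.
cur = cnx.execute("SELECT id, Col_1, Col_2 FROM data WHERE id = 42")
columns = [d[0] for d in cur.description]
result = cur.fetchall()
cnx.close()
```

Column names plus a list of row tuples are the raw material a DataFrame can be built from.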
31633260
3164-
For now, writing your DataFrame into a database works only with
3165-
**SQLite**. Moreover, the **index** will currently be **dropped**.
31663261
31673262
.. _io.bigquery:
31683263
