Updated All Documentation and CI Requirements

Jacob Schaer · jreback · commit 2c60400399c5 · 2013-10-08T20:23:48.000-04:00
diff --git a/ci/requirements-2.6.txt b/ci/requirements-2.6.txt
@@ -4,3 +4,4 @@ python-dateutil==1.5
 pytz==2013b
 http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz
 html5lib==1.0b2
+bigquery==2.0.15
diff --git a/ci/requirements-2.7.txt b/ci/requirements-2.7.txt
@@ -18,3 +18,4 @@ MySQL-python==1.2.4
 scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
+bigquery==2.0.15
diff --git a/ci/requirements-2.7_LOCALE.txt b/ci/requirements-2.7_LOCALE.txt
@@ -16,3 +16,4 @@ lxml==3.2.1
 scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
+bigquery==2.0.15
diff --git a/doc/source/install.rst b/doc/source/install.rst
@@ -114,6 +114,7 @@ Optional Dependencies
     :func:`~pandas.io.clipboard.read_clipboard`. Most package managers on Linux
     distributions will have xclip and/or xsel immediately available for
     installation.
+  * `Google bq Command Line Tool <https://developers.google.com/bigquery/bq-command-line-tool/>`__: Needed for :mod:`pandas.io.gbq` 
   * One of the following combinations of libraries is needed to use the
     top-level :func:`~pandas.io.html.read_html` function:
 
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -38,6 +38,7 @@ object.
     * ``read_json``
     * ``read_msgpack`` (experimental)
     * ``read_html``
+    * ``read_gbq`` (experimental)
     * ``read_stata``
     * ``read_clipboard``
     * ``read_pickle``
@@ -51,6 +52,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
     * ``to_json``
     * ``to_msgpack`` (experimental)
     * ``to_html``
+    * ``to_gbq`` (experimental)
     * ``to_stata``
     * ``to_clipboard``
     * ``to_pickle``
@@ -2905,7 +2907,70 @@ There are a few other available functions:
    For now, writing your DataFrame into a database works only with
    **SQLite**. Moreover, the **index** will currently be **dropped**.
 
+Google BigQuery (Experimental)
+------------------------------
 
+The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
+analytics web service to simplify retrieving results from BigQuery tables 
+using SQL-like queries. Result sets are parsed into a pandas 
+DataFrame with a shape derived from the source table. Additionally, 
+DataFrames can be uploaded into BigQuery datasets as tables
+if the source datatypes are compatible with BigQuery ones. The general
+structure of this module and its functions is loosely based on that of
+:mod:`pandas.io.sql`.
+
+For specifics on the service itself, see https://developers.google.com/bigquery/
+
+As an example, suppose you want to load all data from an existing
+BigQuery table, ``test_dataset.test_table``, and pull it into a
+DataFrame.
+
+.. code-block:: python
+
+   from pandas.io import gbq
+   data_frame = gbq.read_gbq('SELECT * FROM test_dataset.test_table')
+
+The user will then be authenticated by the ``bq`` command line client;
+this usually involves the default browser opening to a login page,
+though the process can be done entirely from the command line if necessary.
+Datasets and additional parameters can be configured with ``bq``,
+passed in as options to ``read_gbq``, or set using Google's gflags (this
+is not officially supported by this module, though care was taken
+to ensure that they are respected regardless of how you call the
+function).
+
+Additionally, you can define which column to use as an index as well as a preferred column order as follows:
+
+.. code-block:: python
+
+   data_frame = gbq.read_gbq('SELECT * FROM test_dataset.test_table', index_col='index_column_name', col_order=['col1', 'col2', 'col3'])
+
+Finally, to create a BigQuery table, ``my_dataset.my_table``, from the rows of a DataFrame, ``df``:
+
+.. code-block:: python
+
+   import pandas
+   df = pandas.DataFrame({'string_col_name': ['hello'],
+                          'integer_col_name': [1], 'boolean_col_name': [True]})
+   schema = ['STRING', 'INTEGER', 'BOOLEAN']
+   data_frame = gbq.to_gbq(df, 'my_dataset.my_table', if_exists='fail', schema=schema)
+
+To append more rows to this table:
+
+.. code-block:: python
+
+   df2 = pandas.DataFrame({'string_col_name' : ['hello2'], 
+         'integer_col_name' : [2],
+         'boolean_col_name' : [False]})
+   data_frame = gbq.to_gbq(df2, 'my_dataset.my_table', if_exists='append')
+
+
+
+.. note::
+
+   There is a hard cap of 128 MB (compressed) on BigQuery result sets. Also, the BigQuery SQL query language has
+   some quirks; see https://developers.google.com/bigquery/query-reference
+   
 STATA Format
 ------------
 
diff --git a/doc/source/release.rst b/doc/source/release.rst
@@ -78,6 +78,7 @@ Experimental Features
   - Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()`` / ``df.to_msgpack()`` for serialization
     of arbitrary pandas (and python objects) in a lightweight portable binary format (:issue:`686`)
   - Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
+  - Added :mod:`pandas.io.gbq` for reading Google BigQuery query results into a DataFrame and writing DataFrames to BigQuery tables (:issue:`4140`)
 
 Improvements to existing features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~