
Commit 2c60400

Jacob Schaer authored and jreback committed
Updated All Documentation and CI Requirements
1 parent 390a2d6 commit 2c60400

6 files changed, +70 -0 lines changed

ci/requirements-2.6.txt

+1
@@ -4,3 +4,4 @@ python-dateutil==1.5
 pytz==2013b
 http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz
 html5lib==1.0b2
+bigquery==2.0.15

ci/requirements-2.7.txt

+1
@@ -18,3 +18,4 @@ MySQL-python==1.2.4
 scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
+bigquery==2.0.15

ci/requirements-2.7_LOCALE.txt

+1
@@ -16,3 +16,4 @@ lxml==3.2.1
 scipy==0.10.0
 beautifulsoup4==4.2.1
 statsmodels==0.5.0
+bigquery==2.0.15

doc/source/install.rst

+1
@@ -114,6 +114,7 @@ Optional Dependencies
 :func:`~pandas.io.clipboard.read_clipboard`. Most package managers on Linux
 distributions will have xclip and/or xsel immediately available for
 installation.
+* `Google bq Command Line Tool <https://developers.google.com/bigquery/bq-command-line-tool/>`__: Needed for :mod:`pandas.io.gbq`
 * One of the following combinations of libraries is needed to use the
 top-level :func:`~pandas.io.html.read_html` function:

doc/source/io.rst

+65
@@ -38,6 +38,7 @@ object.
 * ``read_json``
 * ``read_msgpack`` (experimental)
 * ``read_html``
+* ``read_gbq`` (experimental)
 * ``read_stata``
 * ``read_clipboard``
 * ``read_pickle``
@@ -51,6 +52,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
 * ``to_json``
 * ``to_msgpack`` (experimental)
 * ``to_html``
+* ``to_gbq`` (experimental)
 * ``to_stata``
 * ``to_clipboard``
 * ``to_pickle``
@@ -2905,7 +2907,70 @@ There are a few other available functions:
 For now, writing your DataFrame into a database works only with
 **SQLite**. Moreover, the **index** will currently be **dropped**.

+Google BigQuery (Experimental)
+------------------------------

+The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
+analytics web service to simplify retrieving results from BigQuery tables
+using SQL-like queries. Result sets are parsed into a pandas
+DataFrame with a shape derived from the source table. Additionally,
+DataFrames can be uploaded into BigQuery datasets as tables
+if the source datatypes are compatible with BigQuery ones. The general
+structure of this module and its provided functions are based loosely on those in
+:mod:`pandas.io.sql`.
+
+For specifics on the service itself, see: <https://developers.google.com/bigquery/>
+
+As an example, suppose you want to load all data from an existing table
+: `test_dataset.test_table`
+into BigQuery and pull it into a DataFrame.
+
+.. code-block:: python
+
+   from pandas.io import gbq
+   data_frame = gbq.read_gbq('SELECT * FROM test_dataset.test_table')
+
+The user will then be authenticated by the `bq` command line client -
+this usually involves the default browser opening to a login page,
+though the process can be done entirely from command line if necessary.
+Datasets and additional parameters can be either configured with `bq`,
+passed in as options to `read_gbq`, or set using Google's gflags (this
+is not officially supported by this module, though care was taken
+to ensure that they should be followed regardless of how you call the
+method).
+
+Additionally, you can define which column to use as an index as well as a preferred column order as follows:
+
+.. code-block:: python
+
+   data_frame = gbq.read_gbq('SELECT * FROM test_dataset.test_table', index_col='index_column_name', col_order='[col1, col2, col3,...]')
+
+Finally, if you would like to create a BigQuery table, `my_dataset.my_table`, from the rows of DataFrame, `df`:
+
+.. code-block:: python
+
+   df = pandas.DataFrame({'string_col_name' : ['hello'],
+                          'integer_col_name' : [1],
+                          'boolean_col_name' : [True]})
+   schema = ['STRING', 'INTEGER', 'BOOLEAN']
+   data_frame = gbq.to_gbq(df, 'my_dataset.my_table', if_exists='fail', schema = schema)
+
+To add more rows to this, simply:
+
+.. code-block:: python
+
+   df2 = pandas.DataFrame({'string_col_name' : ['hello2'],
+                           'integer_col_name' : [2],
+                           'boolean_col_name' : [False]})
+   data_frame = gbq.to_gbq(df2, 'my_dataset.my_table', if_exists='append')
+
+
+
+.. note::
+
+   * There is a hard cap on BigQuery result sets, at 128MB compressed. Also, the BigQuery SQL query language has some oddities,
+     see: <https://developers.google.com/bigquery/query-reference>
+
 STATA Format
 ------------

doc/source/release.rst

+1
@@ -78,6 +78,7 @@ Experimental Features
 - Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()`` / ``df.to_msgpack()`` for serialization
 of arbitrary pandas (and python objects) in a lightweight portable binary format (:issue:`686`)
 - Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
+- Added :mod:`pandas.io.gbq` for reading from (and writing to) Google BigQuery into a DataFrame. (:issue:`4140`)

 Improvements to existing features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
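
The ``to_gbq`` example added to doc/source/io.rst passes a hand-written ``schema`` list (``['STRING', 'INTEGER', 'BOOLEAN']``) next to the DataFrame. The sketch below shows one way such a list could be derived from the column values rather than typed by hand. It is a minimal illustration only: ``infer_bq_schema`` and its type mapping are hypothetical and not part of :mod:`pandas.io.gbq`, it uses plain Python containers in place of a DataFrame, and it assumes the schema list is matched to columns by position.

```python
from collections import OrderedDict

# Assumed mapping from Python value types to BigQuery type names.
# bool comes before int because bool is a subclass of int in Python.
_BQ_TYPES = [
    (bool, 'BOOLEAN'),
    (int, 'INTEGER'),
    (float, 'FLOAT'),
    (str, 'STRING'),
]


def infer_bq_schema(columns):
    """Return one BigQuery type name per column, in column order.

    Hypothetical helper: `columns` maps column names to lists of values,
    and the type is inferred from the first value of each column.
    """
    schema = []
    for name, values in columns.items():
        if not values:
            raise ValueError("column %r is empty" % name)
        sample = values[0]
        for py_type, bq_type in _BQ_TYPES:
            if isinstance(sample, py_type):
                schema.append(bq_type)
                break
        else:
            # No entry in the mapping matched the sample value.
            raise TypeError("column %r has unsupported type %s"
                            % (name, type(sample).__name__))
    return schema


# Same columns as the to_gbq example, with an explicit column order.
columns = OrderedDict([
    ('string_col_name', ['hello']),
    ('integer_col_name', [1]),
    ('boolean_col_name', [True]),
])
schema = infer_bq_schema(columns)
print(schema)  # ['STRING', 'INTEGER', 'BOOLEAN']
```

If the schema really is positional, its order has to follow the DataFrame's actual column order, which may differ from the order in which the keys of a plain ``dict`` were written. Deriving the list from the columns avoids that mismatch.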
