@@ -38,6 +38,7 @@ object.
* ``read_json``
* ``read_msgpack`` (experimental)
* ``read_html``
+ * ``read_gbq`` (experimental)
* ``read_stata``
* ``read_clipboard``
* ``read_pickle``
@@ -51,6 +52,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
* ``to_json``
* ``to_msgpack`` (experimental)
* ``to_html``
+ * ``to_gbq`` (experimental)
* ``to_stata``
* ``to_clipboard``
* ``to_pickle``
@@ -2905,7 +2907,70 @@ There are a few other available functions:
For now, writing your DataFrame into a database works only with
**SQLite**. Moreover, the **index** will currently be **dropped**.

+ Google BigQuery (Experimental)
+ ------------------------------

+ The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
+ analytics web service to simplify retrieving results from BigQuery tables
+ using SQL-like queries. Result sets are parsed into a pandas
+ DataFrame with a shape derived from the source table. Additionally,
+ DataFrames can be uploaded to BigQuery datasets as tables
+ if the source datatypes are compatible with BigQuery types. The general
+ structure of this module and its provided functions are based loosely on
+ those in :mod:`pandas.io.sql`.
+
+ For specifics on the service itself, see <https://developers.google.com/bigquery/>.
+
+ As an example, suppose you want to load all the data from the existing
+ BigQuery table `test_dataset.test_table` into a DataFrame.
+
+ .. code-block:: python
+
+     from pandas.io import gbq
+
+     # Parse the query's result set into a DataFrame
+     data_frame = gbq.read_gbq('SELECT * FROM test_dataset.test_table')
+
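+ The returned object is an ordinary DataFrame, so the usual inspection
+ tools apply; a quick usage sketch (nothing here is BigQuery-specific):
+
+ .. code-block:: python
+
+     data_frame.head()     # first few rows of the result set
+     data_frame.dtypes     # dtype inferred for each result column
+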
+ The user will then be authenticated by the `bq` command line client;
+ this usually involves the default browser opening to a login page,
+ though the process can be done entirely from the command line if
+ necessary. Datasets and additional parameters can be configured either
+ with `bq`, passed in as options to `read_gbq` (as in the example below),
+ or set using Google's gflags (the last is not officially supported by
+ this module, though care was taken to ensure that such settings should
+ be respected regardless of how you call the method).
+
+ Additionally, you can define which column to use as an index as well as
+ a preferred column order as follows:
+
+ .. code-block:: python
+
+     data_frame = gbq.read_gbq('SELECT * FROM test_dataset.test_table',
+                               index_col='index_column_name',
+                               col_order=['col1', 'col2', 'col3'])
+
+ Finally, if you would like to create a BigQuery table, `my_dataset.my_table`,
+ from the rows of a DataFrame, `df`:
+
+ .. code-block:: python
+
+     import pandas
+
+     df = pandas.DataFrame({'string_col_name': ['hello'],
+                            'integer_col_name': [1],
+                            'boolean_col_name': [True]})
+     schema = ['STRING', 'INTEGER', 'BOOLEAN']
+     gbq.to_gbq(df, 'my_dataset.my_table', if_exists='fail', schema=schema)
+
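+ Note that the `schema` list is presumably matched up positionally with
+ the DataFrame's columns, and a plain Python ``dict`` does not guarantee
+ key order, so double-check the resulting table's column types if the
+ exact mapping matters.
+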
+ To append more rows to this table, simply:
+
+ .. code-block:: python
+
+     df2 = pandas.DataFrame({'string_col_name': ['hello2'],
+                             'integer_col_name': [2],
+                             'boolean_col_name': [False]})
+     gbq.to_gbq(df2, 'my_dataset.my_table', if_exists='append')
+
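+ With ``if_exists='fail'`` (as in the table-creation example above), a
+ second write to an existing table is expected to refuse rather than
+ overwrite or append; a minimal sketch, assuming a generic exception is
+ raised in that case:
+
+ .. code-block:: python
+
+     # Hypothetical: my_dataset.my_table already exists at this point,
+     # so the default if_exists='fail' should raise rather than write.
+     try:
+         gbq.to_gbq(df2, 'my_dataset.my_table', if_exists='fail')
+     except Exception as exc:
+         print('write refused:', exc)
+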
+
+ .. note::
+
+    There is a hard cap on BigQuery result sets of 128 MB compressed.
+    Also, the BigQuery SQL query language has some oddities; see
+    <https://developers.google.com/bigquery/query-reference>.
+
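+ For example, one such quirk is that BigQuery SQL brackets
+ fully-qualified table names; an illustrative sketch (the project and
+ table names are hypothetical):
+
+ .. code-block:: python
+
+     # Hypothetical: [project:dataset.table] is how BigQuery SQL refers
+     # to a table in an explicitly named project.
+     row_count = gbq.read_gbq(
+         'SELECT COUNT(*) FROM [my_project:test_dataset.test_table]')
+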
STATA Format
------------