
ENH/DOC: update pandas-gbq signature and docstring #20564


Merged: 13 commits merged on Apr 9, 2018
1 change: 1 addition & 0 deletions doc/source/conf.py
@@ -350,6 +350,7 @@
intersphinx_mapping = {
'statsmodels': ('http://www.statsmodels.org/devel/', None),
'matplotlib': ('http://matplotlib.org/', None),
'pandas-gbq': ('https://pandas-gbq.readthedocs.io/en/latest/', None),
'python': ('https://docs.python.org/3/', None),
'numpy': ('https://docs.scipy.org/doc/numpy/', None),
'scipy': ('https://docs.scipy.org/doc/scipy/reference/', None),
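To make the conf.py change above concrete, here is a small sketch of what the new mapping entry means. The URL is the one added in the diff; the note about `objects.inv` reflects how Sphinx's intersphinx extension resolves external references (e.g. the `pandas_gbq.to_gbq` entry in the See Also sections below).

```python
# The new intersphinx entry from the diff above.  The second tuple element is
# an optional local path to an inventory file; None tells Sphinx to download
# <URL>objects.inv at build time, which is what lets cross-references such as
# :func:`pandas_gbq.to_gbq` resolve against the external pandas-gbq docs.
intersphinx_mapping = {
    'pandas-gbq': ('https://pandas-gbq.readthedocs.io/en/latest/', None),
}

url, inv_path = intersphinx_mapping['pandas-gbq']
inventory_url = url + 'objects.inv'
print(inventory_url)
```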
59 changes: 42 additions & 17 deletions pandas/core/frame.py
@@ -1116,16 +1116,15 @@ def to_dict(self, orient='dict', into=dict):
else:
raise ValueError("orient '%s' not understood" % orient)

def to_gbq(self, destination_table, project_id, chunksize=10000,
verbose=True, reauth=False, if_exists='fail', private_key=None):
"""Write a DataFrame to a Google BigQuery table.

The main method a user calls to export pandas DataFrame contents to
Google BigQuery table.
def to_gbq(
self, destination_table, project_id, chunksize=10000,
Review comment (Member): Can you keep this on the previous line as it was? (To keep consistency, we don't use this style anywhere else in the file for def lines.)

verbose=True, reauth=False, if_exists='fail', private_key=None,
auth_local_webserver=False, table_schema=None):
"""
Write a DataFrame to a Google BigQuery table.

Google BigQuery API Client Library v2 for Python is used.
Documentation is available `here
<https://developers.google.com/api-client-library/python/apis/bigquery/v2>`__
This function requires the `pandas-gbq package
<https://pandas-gbq.readthedocs.io>`__.

Authentication to the Google BigQuery service is via OAuth 2.0.

@@ -1144,32 +1143,58 @@ def to_gbq(self, destination_table, project_id, chunksize=10000,
Parameters
----------
dataframe : DataFrame
DataFrame to be written
Review comment (Contributor): this needs to stay

Review comment (Contributor Author): The validation script was complaining about this one. I think it's because this is a method of DataFrame, so there is no dataframe argument. (I also tried self and it complained about that, too.)

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
    Errors in parameters section
        Unknown parameters {'dataframe'}
        Parameter "kwargs" description should finish with "."
    No returns section found
    Missing description for See Also "pandas_gbq.to_gbq" reference
    No examples section found

Review comment (Contributor): oh, don't worry about that too much

Review comment (Member): I think @tswast is correct that this should be removed. The method does not take a dataframe as input (it is writing self to gbq).
DataFrame to be written to Google BigQuery.
destination_table : string
Review comment (Member): string -> str
Name of table to be written, in the form 'dataset.tablename'
Name of table to be written, in the form 'dataset.tablename'.
project_id : str
Google BigQuery Account project ID.
chunksize : int (default 10000)
Review comment (Member): "int (default 10000)" -> "int, default 10000" (and the same comment for other similar cases below).
Number of rows to be inserted in each chunk from the dataframe.
Set to ``None`` to load the whole dataframe at once.
verbose : boolean (default True)
Show percentage complete
Show percentage complete.
reauth : boolean (default False)
Force Google BigQuery to reauthenticate the user. This is useful
if multiple accounts are used.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
Behavior when the destination table exists.
'fail': If table exists, do nothing.
'replace': If table exists, drop it, recreate it, and insert data.
'append': If table exists, insert data. Create if does not exist.
private_key : str (optional)
Service account private key in JSON format. Can be file path
or string contents. This is useful for remote server
authentication (eg. Jupyter/IPython notebook on remote host)
"""
authentication (eg. Jupyter/IPython notebook on remote host).
auth_local_webserver : boolean (default False)
Use the [local webserver flow] instead of the [console flow]
Review comment (Member): I think those [..] need a trailing underscore to make them into actual links? (You can test with python doc/make.py --single DataFrame.to_gbq.)

Review comment (Contributor Author): Done in 37b2a08. Also, thanks for the test command; much faster than trying to build the whole docs set!

when getting user credentials.

.. [local webserver flow]
http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
.. [console flow]
http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console

*New in version 0.2.0 of pandas-gbq*.
table_schema : list of dicts (optional)
List of BigQuery table fields to which according DataFrame
columns conform to, e.g. `[{'name': 'col1', 'type':
'STRING'},...]`. If schema is not provided, it will be
generated according to dtypes of DataFrame columns. See
BigQuery API documentation on available names of a field.

*New in version 0.3.1 of pandas-gbq*.

See Also
--------
pandas_gbq.to_gbq : This function in the pandas-gbq library.
pandas.read_gbq : Read a DataFrame from Google BigQuery.
"""
from pandas.io import gbq
return gbq.to_gbq(self, destination_table, project_id=project_id,
chunksize=chunksize, verbose=verbose, reauth=reauth,
if_exists=if_exists, private_key=private_key)
return gbq.to_gbq(
self, destination_table, project_id, chunksize=chunksize,
verbose=verbose, reauth=reauth, if_exists=if_exists,
private_key=private_key, auth_local_webserver=auth_local_webserver,
table_schema=table_schema)
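To make the new table_schema argument concrete, here is a hedged, pandas-free sketch of the list-of-dicts format it expects. The dtype mapping (_BQ_TYPES) and helper (sketch_table_schema) below are illustrative names only, not the actual schema-generation logic inside pandas-gbq.

```python
# Illustrative sketch (not the real pandas-gbq logic): given the dtypes of a
# DataFrame, build the list of {'name': ..., 'type': ...} dicts that the new
# ``table_schema`` argument expects.
_BQ_TYPES = {'int64': 'INTEGER', 'float64': 'FLOAT',
             'bool': 'BOOLEAN', 'object': 'STRING'}


def sketch_table_schema(dtypes):
    """Map {column name: dtype string} to a table_schema-style list."""
    return [{'name': name, 'type': _BQ_TYPES.get(dtype, 'STRING')}
            for name, dtype in dtypes.items()]


schema = sketch_table_schema({'col1': 'object', 'col2': 'int64'})
print(schema)

# An explicit schema could then be passed through (requires BigQuery
# credentials, so shown commented out):
# df.to_gbq('dataset.tablename', 'my-project-id', table_schema=schema)
```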

@classmethod
def from_records(cls, data, index=None, exclude=None, columns=None,
51 changes: 28 additions & 23 deletions pandas/io/gbq.py
@@ -21,13 +21,11 @@ def _try_import():
return pandas_gbq


def read_gbq(query, project_id=None, index_col=None, col_order=None,
reauth=False, verbose=True, private_key=None, dialect='legacy',
**kwargs):
r"""Load data from Google BigQuery.

The main method a user calls to execute a Query in Google BigQuery
and read results into a pandas DataFrame.
def read_gbq(
query, project_id=None, index_col=None, col_order=None, reauth=False,
verbose=True, private_key=None, dialect='legacy', **kwargs):
"""
Load data from Google BigQuery.

This function requires the `pandas-gbq package
<https://pandas-gbq.readthedocs.io>`__.
@@ -49,32 +47,32 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,
Parameters
----------
query : str
SQL-Like Query to return data values
SQL-Like Query to return data values.
project_id : str
Google BigQuery Account project ID.
index_col : str (optional)
Name of result column to use for index in results DataFrame
Name of result column to use for index in results DataFrame.
col_order : list(str) (optional)
List of BigQuery column names in the desired order for results
DataFrame
DataFrame.
reauth : boolean (default False)
Force Google BigQuery to reauthenticate the user. This is useful
if multiple accounts are used.
verbose : boolean (default True)
Verbose output
Verbose output.
private_key : str (optional)
Service account private key in JSON format. Can be file path
or string contents. This is useful for remote server
authentication (eg. Jupyter/IPython notebook on remote host)

authentication (eg. Jupyter/IPython notebook on remote host).
dialect : {'legacy', 'standard'}, default 'legacy'
SQL syntax dialect to use.
'legacy' : Use BigQuery's legacy SQL dialect.
'standard' : Use BigQuery's standard SQL, which is
compliant with the SQL 2011 standard. For more information
see `BigQuery SQL Reference
<https://cloud.google.com/bigquery/sql-reference/>`__

`**kwargs` : Arbitrary keyword arguments
<https://cloud.google.com/bigquery/sql-reference/>`__.
kwargs : dict
Arbitrary keyword arguments.
configuration (dict): query config parameters for job processing.
For example:

@@ -86,8 +84,12 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,
Returns
-------
df: DataFrame
DataFrame representing results of query
DataFrame representing results of query.

See Also
--------
pandas_gbq.read_gbq : This function in the pandas-gbq library.
pandas.DataFrame.to_gbq : Write a DataFrame to Google BigQuery.
"""
pandas_gbq = _try_import()
return pandas_gbq.read_gbq(
@@ -99,10 +101,13 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,
**kwargs)
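The docstring above mentions a configuration dict passed through **kwargs for query job processing. A short sketch of what such a dict could look like; the 'query'/'useQueryCache' keys follow the BigQuery jobs REST API, and the commented-out call is illustrative only since a real call needs credentials.

```python
# Hedged sketch of a BigQuery job configuration passed through **kwargs.
# This example disables the query cache; other keys under 'query' follow
# the BigQuery jobs API resource shape.
configuration = {
    'query': {
        'useQueryCache': False,
    }
}
print(configuration['query']['useQueryCache'])

# An actual call would then look like (requires BigQuery credentials):
# df = pd.read_gbq('SELECT 1', project_id='my-project-id',
#                  configuration=configuration)
```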


def to_gbq(dataframe, destination_table, project_id, chunksize=10000,
verbose=True, reauth=False, if_exists='fail', private_key=None):
def to_gbq(
dataframe, destination_table, project_id, chunksize=10000,
verbose=True, reauth=False, if_exists='fail', private_key=None,
auth_local_webserver=False, table_schema=None):
pandas_gbq = _try_import()
pandas_gbq.to_gbq(dataframe, destination_table, project_id,
chunksize=chunksize,
verbose=verbose, reauth=reauth,
if_exists=if_exists, private_key=private_key)
return pandas_gbq.to_gbq(
dataframe, destination_table, project_id, chunksize=chunksize,
verbose=verbose, reauth=reauth, if_exists=if_exists,
private_key=private_key, auth_local_webserver=auth_local_webserver,
table_schema=table_schema)