
Commit 3222de1

CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq

closes pandas-dev#15347

1 parent fb7dc7d

File tree

11 files changed: +110 −2679 lines changed


ci/requirements-2.7.pip (+1, −4)

@@ -1,8 +1,5 @@
 blosc
-httplib2
-google-api-python-client==1.2
-python-gflags==2.0
-oauth2client==1.5.0
+pandas-gbq
 pathlib
 backports.lzma
 py

ci/requirements-3.4.pip (−3)

@@ -1,5 +1,2 @@
 python-dateutil==2.2
 blosc
-httplib2
-google-api-python-client
-oauth2client

ci/requirements-3.4_SLOW.pip (−3)

This file was deleted.

ci/requirements-3.5.pip (+1)

@@ -1 +1,2 @@
 xarray==0.9.1
+pandas-gbq

doc/source/io.rst (+3, −284)

@@ -4652,293 +4652,12 @@ And then issue the following queries:
 Google BigQuery
 ---------------
 
-.. versionadded:: 0.13.0
-
-The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
-analytics web service to simplify retrieving results from BigQuery tables
-using SQL-like queries. Result sets are parsed into a pandas
-DataFrame with a shape and data types derived from the source table.
-Additionally, DataFrames can be inserted into new BigQuery tables or appended
-to existing tables.
-
-.. warning::
-
-   To use this module, you will need a valid BigQuery account. Refer to the
-   `BigQuery Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   for details on the service itself.
-
-The key functions are:
-
-.. currentmodule:: pandas.io.gbq
-
-.. autosummary::
-   :toctree: generated/
-
-   read_gbq
-   to_gbq
-
-.. currentmodule:: pandas
-
-
-Supported Data Types
-''''''''''''''''''''
-
-Pandas supports all these `BigQuery data types <https://cloud.google.com/bigquery/data-types>`__:
-``STRING``, ``INTEGER`` (64bit), ``FLOAT`` (64 bit), ``BOOLEAN`` and
-``TIMESTAMP`` (microsecond precision). Data types ``BYTES`` and ``RECORD``
-are not supported.
-
-Integer and boolean ``NA`` handling
-'''''''''''''''''''''''''''''''''''
-
-.. versionadded:: 0.20
-
-Since all columns in BigQuery queries are nullable, and NumPy lacks of ``NA``
-support for integer and boolean types, this module will store ``INTEGER`` or
-``BOOLEAN`` columns with at least one ``NULL`` value as ``dtype=object``.
-Otherwise those columns will be stored as ``dtype=int64`` or ``dtype=bool``
-respectively.
-
-This is opposite to default pandas behaviour which will promote integer
-type to float in order to store NAs. See the :ref:`gotchas<gotchas.intna>`
-for detailed explaination.
-
-While this trade-off works well for most cases, it breaks down for storing
-values greater than 2**53. Such values in BigQuery can represent identifiers
-and unnoticed precision lost for identifier is what we want to avoid.
-
-.. _io.bigquery_deps:
-
-Dependencies
-''''''''''''
-
-This module requires following additional dependencies:
-
-- `httplib2 <https://github.com/httplib2/httplib2>`__: HTTP client
-- `google-api-python-client <http://github.com/google/google-api-python-client>`__: Google's API client
-- `oauth2client <https://github.com/google/oauth2client>`__: authentication and authorization for Google's API
-
-.. _io.bigquery_authentication:
-
-Authentication
-''''''''''''''
-
-.. versionadded:: 0.18.0
-
-Authentication to the Google ``BigQuery`` service is via ``OAuth 2.0``.
-Is possible to authenticate with either user account credentials or service account credentials.
-
-Authenticating with user account credentials is as simple as following the prompts in a browser window
-which will be automatically opened for you. You will be authenticated to the specified
-``BigQuery`` account using the product name ``pandas GBQ``. It is only possible on local host.
-The remote authentication using user account credentials is not currently supported in pandas.
-Additional information on the authentication mechanism can be found
-`here <https://developers.google.com/identity/protocols/OAuth2#clientside/>`__.
-
-Authentication with service account credentials is possible via the `'private_key'` parameter. This method
-is particularly useful when working on remote servers (eg. jupyter iPython notebook on remote host).
-Additional information on service accounts can be found
-`here <https://developers.google.com/identity/protocols/OAuth2#serviceaccount>`__.
-
-Authentication via ``application default credentials`` is also possible. This is only valid
-if the parameter ``private_key`` is not provided. This method also requires that
-the credentials can be fetched from the environment the code is running in.
-Otherwise, the OAuth2 client-side authentication is used.
-Additional information on
-`application default credentials <https://developers.google.com/identity/protocols/application-default-credentials>`__.
-
-.. versionadded:: 0.19.0
-
-.. note::
-
-   The `'private_key'` parameter can be set to either the file path of the service account key
-   in JSON format, or key contents of the service account key in JSON format.
-
-.. note::
-
-   A private key can be obtained from the Google developers console by clicking
-   `here <https://console.developers.google.com/permissions/serviceaccounts>`__. Use JSON key type.
-
-.. _io.bigquery_reader:
-
-Querying
-''''''''
-
-Suppose you want to load all data from an existing BigQuery table : `test_dataset.test_table`
-into a DataFrame using the :func:`~pandas.io.gbq.read_gbq` function.
-
-.. code-block:: python
-
-   # Insert your BigQuery Project ID Here
-   # Can be found in the Google web console
-   projectid = "xxxxxxxx"
-
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', projectid)
-
-
-You can define which column from BigQuery to use as an index in the
-destination DataFrame as well as a preferred column order as follows:
-
-.. code-block:: python
-
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                            index_col='index_column_name',
-                            col_order=['col1', 'col2', 'col3'], projectid)
-
-
-Starting with 0.20.0, you can specify the query config as parameter to use additional options of your job.
-For more information about query configuration parameters see
-`here <https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`__.
-
-.. code-block:: python
-
-   configuration = {
-       'query': {
-           "useQueryCache": False
-       }
-   }
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                            configuration=configuration, projectid)
-
-
-.. note::
-
-   You can find your project id in the `Google developers console <https://console.developers.google.com>`__.
-
-
-.. note::
-
-   You can toggle the verbose output via the ``verbose`` flag which defaults to ``True``.
-
-.. note::
-
-   The ``dialect`` argument can be used to indicate whether to use BigQuery's ``'legacy'`` SQL
-   or BigQuery's ``'standard'`` SQL (beta). The default value is ``'legacy'``. For more information
-   on BigQuery's standard SQL, see `BigQuery SQL Reference
-   <https://cloud.google.com/bigquery/sql-reference/>`__
-
-.. _io.bigquery_writer:
-
-Writing DataFrames
-''''''''''''''''''
-
-Assume we want to write a DataFrame ``df`` into a BigQuery table using :func:`~pandas.DataFrame.to_gbq`.
-
-.. ipython:: python
-
-   df = pd.DataFrame({'my_string': list('abc'),
-                      'my_int64': list(range(1, 4)),
-                      'my_float64': np.arange(4.0, 7.0),
-                      'my_bool1': [True, False, True],
-                      'my_bool2': [False, True, False],
-                      'my_dates': pd.date_range('now', periods=3)})
-
-   df
-   df.dtypes
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid)
-
-.. note::
-
-   The destination table and destination dataset will automatically be created if they do not already exist.
-
-The ``if_exists`` argument can be used to dictate whether to ``'fail'``, ``'replace'``
-or ``'append'`` if the destination table already exists. The default value is ``'fail'``.
-
-For example, assume that ``if_exists`` is set to ``'fail'``. The following snippet will raise
-a ``TableCreationError`` if the destination table already exists.
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, if_exists='fail')
-
-.. note::
-
-   If the ``if_exists`` argument is set to ``'append'``, the destination dataframe will
-   be written to the table using the defined table schema and column types. The
-   dataframe must match the destination table in structure and data types.
-   If the ``if_exists`` argument is set to ``'replace'``, and the existing table has a
-   different schema, a delay of 2 minutes will be forced to ensure that the new schema
-   has propagated in the Google environment. See
-   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
-
-Writing large DataFrames can result in errors due to size limitations being exceeded.
-This can be avoided by setting the ``chunksize`` argument when calling :func:`~pandas.DataFrame.to_gbq`.
-For example, the following writes ``df`` to a BigQuery table in batches of 10000 rows at a time:
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, chunksize=10000)
-
-You can also see the progress of your post via the ``verbose`` flag which defaults to ``True``.
-For example:
-
-.. code-block:: python
-
-   In [8]: df.to_gbq('my_dataset.my_table', projectid, chunksize=10000, verbose=True)
-
-           Streaming Insert is 10% Complete
-           Streaming Insert is 20% Complete
-           Streaming Insert is 30% Complete
-           Streaming Insert is 40% Complete
-           Streaming Insert is 50% Complete
-           Streaming Insert is 60% Complete
-           Streaming Insert is 70% Complete
-           Streaming Insert is 80% Complete
-           Streaming Insert is 90% Complete
-           Streaming Insert is 100% Complete
-
-.. note::
-
-   If an error occurs while streaming data to BigQuery, see
-   `Troubleshooting BigQuery Errors <https://cloud.google.com/bigquery/troubleshooting-errors>`__.
-
-.. note::
-
-   The BigQuery SQL query language has some oddities, see the
-   `BigQuery Query Reference Documentation <https://cloud.google.com/bigquery/query-reference>`__.
-
-.. note::
-
-   While BigQuery uses SQL-like syntax, it has some important differences from traditional
-   databases both in functionality, API limitations (size and quantity of queries or uploads),
-   and how Google charges for use of the service. You should refer to `Google BigQuery documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   often as the service seems to be changing and evolving. BiqQuery is best for analyzing large
-   sets of data quickly, but it is not a direct replacement for a transactional database.
-
-.. _io.bigquery_create_tables:
-
-Creating BigQuery Tables
-''''''''''''''''''''''''
-
 .. warning::
 
-   As of 0.17, the function :func:`~pandas.io.gbq.generate_bq_schema` has been deprecated and will be
-   removed in a future version.
-
-As of 0.15.2, the gbq module has a function :func:`~pandas.io.gbq.generate_bq_schema` which will
-produce the dictionary representation schema of the specified pandas DataFrame.
-
-.. code-block:: ipython
-
-   In [10]: gbq.generate_bq_schema(df, default_type='STRING')
-
-   Out[10]: {'fields': [{'name': 'my_bool1', 'type': 'BOOLEAN'},
-            {'name': 'my_bool2', 'type': 'BOOLEAN'},
-            {'name': 'my_dates', 'type': 'TIMESTAMP'},
-            {'name': 'my_float64', 'type': 'FLOAT'},
-            {'name': 'my_int64', 'type': 'INTEGER'},
-            {'name': 'my_string', 'type': 'STRING'}]}
-
-.. note::
-
-   If you delete and re-create a BigQuery table with the same name, but different table schema,
-   you must wait 2 minutes before streaming data into the table. As a workaround, consider creating
-   the new table with a different name. Refer to
-   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
+   Starting in 0.20.0, pandas has split off Google BigQuery support into the
+   separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
 
+   Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__
 
 .. _io.stata:
 
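The user-facing entry points are unchanged by this split: once the standalone package is installed, the same calls go through pandas and are forwarded to pandas-gbq. A minimal sketch of that round trip, where the project id and table names are hypothetical placeholders rather than values from this commit:

    # Sketch of the unchanged API after `pip install pandas-gbq`;
    # "my-project-id" and the table names below are placeholders.
    import pandas as pd

    projectid = "my-project-id"  # your BigQuery project id

    # read_gbq is still called through pandas; it now delegates to pandas-gbq
    df = pd.read_gbq('SELECT * FROM test_dataset.test_table', projectid)

    # to_gbq likewise still hangs off DataFrame and forwards to pandas-gbq
    df.to_gbq('my_dataset.my_table', projectid, if_exists='fail')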

doc/source/whatsnew/v0.20.0.txt (+9)

@@ -360,6 +360,15 @@ New Behavior:
     In [5]: df['a']['2011-12-31 23:59:59']
     Out[5]: 1
 
+.. _whatsnew_0200.api_breaking.gbq:
+
+Pandas Google BigQuery support has moved
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+pandas has split off Google BigQuery support into a separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
+The functionality of ``pd.read_gbq()`` and ``.to_gbq()`` remains the same with the currently released version of ``pandas-gbq=0.1.2``. (:issue:`15347`)
+Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__
+
 .. _whatsnew_0200.api_breaking.memory_usage:
 
 Memory Usage for Index is more Accurate
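The release note pins the initial standalone version at 0.1.2. As an illustrative check, not part of the commit and assuming pandas-gbq exposes a ``__version__`` attribute, one can confirm which release backs the pandas entry points:

    # Illustrative snippet (not from this commit): verify the standalone
    # package is importable and which release backs pd.read_gbq/to_gbq.
    import pandas_gbq
    print(pandas_gbq.__version__)  # expected to report 0.1.2 at this point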

pandas/core/frame.py (+8, −1)

@@ -77,7 +77,8 @@
                     OrderedDict, raise_with_traceback)
 from pandas import compat
 from pandas.compat.numpy import function as nv
-from pandas.util.decorators import deprecate_kwarg, Appender, Substitution
+from pandas.util.decorators import (deprecate_kwarg, Appender,
+                                    Substitution, docstring_wrapper)
 from pandas.util.validators import validate_bool_kwarg
 
 from pandas.tseries.period import PeriodIndex
@@ -941,6 +942,12 @@ def to_gbq(self, destination_table, project_id, chunksize=10000,
                    chunksize=chunksize, verbose=verbose, reauth=reauth,
                    if_exists=if_exists, private_key=private_key)
 
+    def _f():
+        from pandas.io.gbq import _try_import
+        return _try_import().to_gbq.__doc__
+    to_gbq = docstring_wrapper(
+        to_gbq, _f, default='the pandas_gbq package is not installed')
+
     @classmethod
     def from_records(cls, data, index=None, exclude=None, columns=None,
                      coerce_float=False, nrows=None):
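The interesting pattern in the frame.py hunk is lazy docstring resolution: ``DataFrame.to_gbq`` stays a thin forwarder, while its ``__doc__`` is looked up from the optional ``pandas_gbq`` dependency only when asked for, with a fallback message when the package is missing. A rough sketch of the idea follows; it is an illustration only, not the actual ``pandas.util.decorators.docstring_wrapper`` (the real pandas version also implements the descriptor protocol so the wrapper works as a bound method):

    # Rough sketch of a lazy-docstring wrapper; illustrative only, not
    # pandas' real docstring_wrapper implementation.
    class LazyDocWrapper(object):
        def __init__(self, func, creator, default=None):
            self.func = func        # the wrapped callable
            self.creator = creator  # zero-arg callable returning the real docstring
            self.default = default  # fallback when the optional import fails

        def __call__(self, *args, **kwargs):
            return self.func(*args, **kwargs)

        @property
        def __doc__(self):
            try:
                return self.creator()
            except ImportError:
                return self.default

    # Usage sketch: pull the docstring from the optional dependency.
    def _gbq_doc():
        import pandas_gbq  # raises ImportError when not installed
        return pandas_gbq.to_gbq.__doc__

    def to_gbq_stub(*args, **kwargs):
        """placeholder"""

    to_gbq_stub = LazyDocWrapper(
        to_gbq_stub, _gbq_doc, default='the pandas_gbq package is not installed')

This keeps import of the heavy optional dependency off the pandas import path while still surfacing its documentation when it is installed.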
