Commit b0445ca

ENH: Use tz-aware dtype for timestamp.
I couldn't figure out how *not* to get a tz-aware dtype in 0.24.x versions, and I wanted a tz-aware dtype anyway for TIMESTAMP, so this makes it official.
1 parent 8114ede commit b0445ca

File tree

4 files changed: +73 -32 lines changed

  docs/source/changelog.rst
  docs/source/reading.rst
  pandas_gbq/gbq.py
  tests/system/test_gbq.py

docs/source/changelog.rst (+11 -3)

@@ -6,8 +6,6 @@ Changelog
 0.10.0 / TBD
 ------------
 
-- This fixes a bug where pandas-gbq could not upload an empty database. (:issue:`237`)
-
 Dependency updates
 ~~~~~~~~~~~~~~~~~~
 
@@ -22,11 +20,21 @@ Internal changes
 
 Enhancements
 ~~~~~~~~~~~~
+
 - Allow ``table_schema`` in :func:`to_gbq` to contain only a subset of columns,
-  with the rest being populated using the DataFrame dtypes (:issue:`218`)
+  with the rest being populated using the DataFrame dtypes (:issue:`218`)
   (contributed by @johnpaton)
 - Read ``project_id`` in :func:`to_gbq` from provided ``credentials`` if
   available (contributed by @daureg)
+- ``read_gbq`` uses the timezone-aware ``DatetimeTZDtype(unit='ns',
+  tz='UTC')`` dtype for BigQuery ``TIMESTAMP`` columns. (:issue:`263`)
+
+Bug fixes
+~~~~~~~~~
+
+- Fix a bug where pandas-gbq could not upload an empty database.
+  (:issue:`237`)
+
 
 .. _changelog-0.9.0:
 
docs/source/reading.rst (+47 -17)

@@ -9,21 +9,32 @@ Suppose you want to load all data from an existing BigQuery table
 
 .. code-block:: python
 
-   # Insert your BigQuery Project ID Here
-   # Can be found in the Google web console
+   import pandas_gbq
+
+   # TODO: Set your BigQuery Project ID.
    projectid = "xxxxxxxx"
 
-   data_frame = read_gbq('SELECT * FROM test_dataset.test_table', projectid)
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM `test_dataset.test_table`',
+       project_id=projectid)
+
+.. note::
 
+   A project ID is sometimes optional if it can be inferred during
+   authentication, but it is required when authenticating with user
+   credentials. You can find your project ID in the `Google Cloud console
+   <https://console.cloud.google.com>`__.
 
 You can define which column from BigQuery to use as an index in the
 destination DataFrame as well as a preferred column order as follows:
 
 .. code-block:: python
 
-   data_frame = read_gbq('SELECT * FROM test_dataset.test_table',
-                         index_col='index_column_name',
-                         col_order=['col1', 'col2', 'col3'], projectid)
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM `test_dataset.test_table`',
+       project_id=projectid,
+       index_col='index_column_name',
+       col_order=['col1', 'col2', 'col3'])
 
 
 You can specify the query config as parameter to use additional options of
@@ -37,20 +48,39 @@ your job. For more information about query configuration parameters see `here
        "useQueryCache": False
      }
    }
-   data_frame = read_gbq('SELECT * FROM test_dataset.test_table',
-                         configuration=configuration, projectid)
+   data_frame = read_gbq(
+       'SELECT * FROM `test_dataset.test_table`',
+       project_id=projectid,
+       configuration=configuration)
 
 
-.. note::
+The ``dialect`` argument can be used to indicate whether to use
+BigQuery's ``'legacy'`` SQL or BigQuery's ``'standard'`` SQL (beta). The
+default value is ``'standard'`` For more information on BigQuery's standard
+SQL, see `BigQuery SQL Reference
+<https://cloud.google.com/bigquery/docs/reference/standard-sql/>`__
 
-   You can find your project id in the `Google developers console
-   <https://console.developers.google.com>`__.
+.. code-block:: python
 
+   data_frame = pandas_gbq.read_gbq(
+       'SELECT * FROM [test_dataset.test_table]',
+       project_id=projectid,
+       dialect='legacy')
 
-.. note::
 
-   The ``dialect`` argument can be used to indicate whether to use BigQuery's ``'legacy'`` SQL
-   or BigQuery's ``'standard'`` SQL (beta). The default value is ``'legacy'``, though this will change
-   in a subsequent release to ``'standard'``. For more information
-   on BigQuery's standard SQL, see `BigQuery SQL Reference
-   <https://cloud.google.com/bigquery/sql-reference/>`__
+.. _reading-dtypes:
+
+Inferring the DataFrame's dtypes
+--------------------------------
+
+The :func:`~pandas_gbq.read_gbq` method infers the pandas dtype for each column, based on the BigQuery table schema.
+
+================== ======================================
+BigQuery Data Type dtype
+================== ======================================
+FLOAT              float
+TIMESTAMP          DatetimeTZDtype(unit='ns', tz='UTC')
+DATETIME           datetime64[ns]
+TIME               datetime64[ns]
+DATE               datetime64[ns]
+================== ======================================
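
As a quick, hedged check of the TIMESTAMP row in the new dtype table (not taken from the commit), the dtype spelled out in the docs compares equal to pandas's ``DatetimeTZDtype``:

    import pandas

    # The dtype listed for TIMESTAMP in the table above.
    ts_dtype = pandas.DatetimeTZDtype(unit="ns", tz="UTC")

    # A timezone-aware series carries exactly this dtype.
    series = pandas.Series(pandas.to_datetime(["2019-01-28"]).tz_localize("UTC"))
    print(series.dtype == ts_dtype)  # True
    print(str(series.dtype))         # datetime64[ns, UTC]
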

pandas_gbq/gbq.py (+14 -11)

@@ -644,21 +644,24 @@ def delete_and_recreate_table(self, dataset_id, table_id, table_schema):
 
 
 def _bqschema_to_nullsafe_dtypes(schema_fields):
-    # Only specify dtype when the dtype allows nulls. Otherwise, use pandas's
-    # default dtype choice.
-    #
-    # See:
-    # http://pandas.pydata.org/pandas-docs/dev/missing_data.html
-    # #missing-data-casting-rules-and-indexing
+    """Specify explicit dtypes based on BigQuery schema.
+
+    This function only specifies a dtype when the dtype allows nulls.
+    Otherwise, use pandas's default dtype choice.
+
+    See: http://pandas.pydata.org/pandas-docs/dev/missing_data.html
+    #missing-data-casting-rules-and-indexing
+    """
+    import pandas
+
+    # If you update this mapping, also update the table at
+    # `docs/source/reading.rst`.
     dtype_map = {
         "FLOAT": np.dtype(float),
-        # Even though TIMESTAMPs are timezone-aware in BigQuery, pandas doesn't
-        # support datetime64[ns, UTC] as dtype in DataFrame constructors. See:
-        # https://github.com/pandas-dev/pandas/issues/12513
-        "TIMESTAMP": "datetime64[ns]",
+        "TIMESTAMP": pandas.DatetimeTZDtype(tz="UTC"),
+        "DATETIME": "datetime64[ns]",
         "TIME": "datetime64[ns]",
         "DATE": "datetime64[ns]",
-        "DATETIME": "datetime64[ns]",
     }
 
     dtypes = {}
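
The part of ``_bqschema_to_nullsafe_dtypes`` that consumes ``dtype_map`` lies outside this hunk, so the following is only a plausible sketch of how such a mapping gets applied to BigQuery schema fields; the ``schema_fields`` sample and the comprehension are assumptions, not the function's actual code:

    import numpy as np
    import pandas

    dtype_map = {
        "FLOAT": np.dtype(float),
        "TIMESTAMP": pandas.DatetimeTZDtype(tz="UTC"),
        "DATETIME": "datetime64[ns]",
        "TIME": "datetime64[ns]",
        "DATE": "datetime64[ns]",
    }

    # Hypothetical schema fields, shaped like a BigQuery table description.
    schema_fields = [
        {"name": "created_at", "type": "TIMESTAMP"},
        {"name": "amount", "type": "FLOAT"},
        {"name": "label", "type": "STRING"},
    ]

    # Only columns whose dtype can represent nulls get an explicit dtype;
    # everything else falls back to pandas's default inference.
    dtypes = {
        field["name"]: dtype_map[field["type"]]
        for field in schema_fields
        if field["type"] in dtype_map
    }
    print(dtypes)  # {'created_at': datetime64[ns, UTC], 'amount': dtype('float64')}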

tests/system/test_gbq.py (+1 -1)

@@ -368,7 +368,7 @@ def test_should_properly_handle_arbitrary_datetime(self, project_id):
     "expression, is_expected_dtype",
     [
         ("current_date()", pandas.api.types.is_datetime64_ns_dtype),
-        ("current_timestamp()", pandas.api.types.is_datetime64_ns_dtype),
+        ("current_timestamp()", pandas.api.types.is_datetime64tz_dtype),
         ("current_datetime()", pandas.api.types.is_datetime64_ns_dtype),
         ("TRUE", pandas.api.types.is_bool_dtype),
         ("FALSE", pandas.api.types.is_bool_dtype),
