
Commit 8986e3c

mroeschke authored and Pingviinituutti committed
ENH: Support writing timestamps with timezones with to_sql (pandas-dev#22654)
1 parent b26d165 commit 8986e3c

File tree

5 files changed (+113, -15 lines)

doc/source/io.rst  (+30)

@@ -4806,6 +4806,36 @@ default ``Text`` type for string columns:
 Because of this, reading the database table back in does **not** generate
 a categorical.

+.. _io.sql_datetime_data:
+
+Datetime data types
+'''''''''''''''''''
+
+Using SQLAlchemy, :func:`~pandas.DataFrame.to_sql` is capable of writing
+datetime data that is timezone naive or timezone aware. However, the resulting
+data stored in the database ultimately depends on the supported data type
+for datetime data of the database system being used.
+
+The following table lists supported data types for datetime data for some
+common databases. Other database dialects may have different data types for
+datetime data.
+
+=========== ============================================= ===================
+Database    SQL Datetime Types                            Timezone Support
+=========== ============================================= ===================
+SQLite      ``TEXT``                                      No
+MySQL       ``TIMESTAMP`` or ``DATETIME``                 No
+PostgreSQL  ``TIMESTAMP`` or ``TIMESTAMP WITH TIME ZONE`` Yes
+=========== ============================================= ===================
+
+When writing timezone aware data to databases that do not support timezones,
+the data will be written as timezone naive timestamps that are in local time
+with respect to the timezone.
+
+:func:`~pandas.read_sql_table` is also capable of reading datetime data that is
+timezone aware or naive. When reading ``TIMESTAMP WITH TIME ZONE`` types, pandas
+will convert the data to UTC.
+
 Reading Tables
 ''''''''''''''
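
The behavior documented above is easy to try outside the test suite. A minimal sketch, not part of this commit: it assumes an in-memory SQLite engine and a made-up table name ``tz_demo``. SQLite has no timezone support, so the values come back naive in local time; a PostgreSQL engine would return them as UTC instead.

import pandas as pd
from sqlalchemy import create_engine

# Illustrative engine: SQLite needs no server but has no timezone support,
# so timezone-aware values are stored as naive local timestamps.
engine = create_engine('sqlite:///:memory:')

df = pd.DataFrame({
    'ts': pd.date_range('2018-01-01 09:00', periods=3, tz='US/Pacific'),
})

# With this change, writing a timezone-aware column no longer raises.
df.to_sql('tz_demo', engine, index=False)

# Reading back: SQLite returns naive local timestamps; a backend with
# TIMESTAMP WITH TIME ZONE support (e.g. PostgreSQL) would return UTC.
print(pd.read_sql_table('tz_demo', engine)['ts'])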

doc/source/whatsnew/v0.24.0.txt  (+4)

@@ -222,6 +222,7 @@ Other Enhancements
 - :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
 - :func:`~DataFrame.to_csv`, :func:`~Series.to_csv`, :func:`~DataFrame.to_json`, and :func:`~Series.to_json` now support ``compression='infer'`` to infer compression based on filename extension (:issue:`15008`).
   The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`).
+- :meth:`DataFrame.to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone unaware local timestamps. See the :ref:`io.sql_datetime_data` section for implications (:issue:`9086`).
 - :func:`to_timedelta` now supports ISO-formatted timedelta strings (:issue:`21877`)
 - :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
 - :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)

@@ -1246,6 +1247,9 @@ MultiIndex
 I/O
 ^^^

+- Bug in :meth:`to_sql` when writing timezone aware data (``datetime64[ns, tz]`` dtype) would raise a ``TypeError`` (:issue:`9086`)
+- Bug in :meth:`to_sql` where a naive DatetimeIndex would be written as ``TIMESTAMP WITH TIMEZONE`` type in supported databases, e.g. PostgreSQL (:issue:`23510`)
+
 .. _whatsnew_0240.bug_fixes.nan_with_str_dtype:

 Proper handling of `np.NaN` in a string data-typed column with the Python engine

pandas/core/generic.py  (+9)

@@ -2397,6 +2397,15 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
         --------
         pandas.read_sql : read a DataFrame from a table

+        Notes
+        -----
+        Timezone aware datetime columns will be written as
+        ``Timestamp with timezone`` type with SQLAlchemy if supported by the
+        database. Otherwise, the datetimes will be stored as timezone unaware
+        timestamps local to the original timezone.
+
+        .. versionadded:: 0.24.0
+
         References
         ----------
         .. [1] http://docs.sqlalchemy.org
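
The new Notes say the created column type depends on what the target database supports. A small sketch of checking that, not part of this commit: it assumes an in-memory SQLite engine and a made-up table name ``notes_demo``, and uses SQLAlchemy's inspector to report the column type that was actually created. On PostgreSQL the same inspection would show a ``TIMESTAMP WITH TIME ZONE`` column.

import pandas as pd
from sqlalchemy import create_engine, inspect

engine = create_engine('sqlite:///:memory:')  # illustrative backend

df = pd.DataFrame({
    'when': pd.date_range('2018-06-01', periods=2, tz='Europe/Berlin'),
})
df.to_sql('notes_demo', engine, index=False)

# Ask SQLAlchemy which column type was created. SQLite has no timezone-aware
# datetime type, so the values were stored as naive local timestamps instead.
columns = inspect(engine).get_columns('notes_demo')
print({c['name']: c['type'] for c in columns})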

pandas/io/sql.py  (+26, -14)

@@ -592,12 +592,17 @@ def insert_data(self):
         data_list = [None] * ncols
         blocks = temp._data.blocks

-        for i in range(len(blocks)):
-            b = blocks[i]
+        for b in blocks:
             if b.is_datetime:
-                # convert to microsecond resolution so this yields
-                # datetime.datetime
-                d = b.values.astype('M8[us]').astype(object)
+                # return datetime.datetime objects
+                if b.is_datetimetz:
+                    # GH 9086: Ensure we return datetimes with timezone info
+                    # Need to return 2-D data; DatetimeIndex is 1D
+                    d = b.values.to_pydatetime()
+                    d = np.expand_dims(d, axis=0)
+                else:
+                    # convert to microsecond resolution for datetime.datetime
+                    d = b.values.astype('M8[us]').astype(object)
             else:
                 d = np.array(b.get_values(), dtype=object)

@@ -612,7 +617,7 @@ def insert_data(self):
         return column_names, data_list

     def _execute_insert(self, conn, keys, data_iter):
-        data = [{k: v for k, v in zip(keys, row)} for row in data_iter]
+        data = [dict(zip(keys, row)) for row in data_iter]
         conn.execute(self.insert_statement(), data)

     def insert(self, chunksize=None):

@@ -741,8 +746,9 @@ def _get_column_names_and_types(self, dtype_mapper):
     def _create_table_setup(self):
         from sqlalchemy import Table, Column, PrimaryKeyConstraint

-        column_names_and_types = \
-            self._get_column_names_and_types(self._sqlalchemy_type)
+        column_names_and_types = self._get_column_names_and_types(
+            self._sqlalchemy_type
+        )

         columns = [Column(name, typ, index=is_index)
                    for name, typ, is_index in column_names_and_types]

@@ -841,14 +847,19 @@ def _sqlalchemy_type(self, col):

         from sqlalchemy.types import (BigInteger, Integer, Float,
                                       Text, Boolean,
-                                      DateTime, Date, Time)
+                                      DateTime, Date, Time, TIMESTAMP)

         if col_type == 'datetime64' or col_type == 'datetime':
+            # GH 9086: TIMESTAMP is the suggested type if the column contains
+            # timezone information
             try:
-                tz = col.tzinfo  # noqa
-                return DateTime(timezone=True)
+                if col.dt.tz is not None:
+                    return TIMESTAMP(timezone=True)
             except AttributeError:
-                return DateTime
+                # The column is actually a DatetimeIndex
+                if col.tz is not None:
+                    return TIMESTAMP(timezone=True)
+            return DateTime
         if col_type == 'timedelta64':
             warnings.warn("the 'timedelta' type is not supported, and will be "
                           "written as integer values (ns frequency) to the "

@@ -1275,8 +1286,9 @@ def _create_table_setup(self):
         structure of a DataFrame. The first entry will be a CREATE TABLE
         statement while the rest will be CREATE INDEX statements.
         """
-        column_names_and_types = \
-            self._get_column_names_and_types(self._sql_type_name)
+        column_names_and_types = self._get_column_names_and_types(
+            self._sql_type_name
+        )

         pat = re.compile(r'\s+')
         column_names = [col_name for col_name, _, _ in column_names_and_types]
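
The heart of the _sqlalchemy_type change above is a dispatch on whether the column carries a timezone: a Series exposes it through the ``.dt`` accessor, while a DatetimeIndex has no ``.dt`` and exposes ``.tz`` directly, which is what the ``except AttributeError`` branch handles. A standalone sketch of that decision; ``pick_datetime_type`` is a made-up name for illustration, not a pandas function.

import pandas as pd
from sqlalchemy.types import DateTime, TIMESTAMP

def pick_datetime_type(col):
    # Mirrors the logic added to _sqlalchemy_type: timezone-aware columns
    # map to TIMESTAMP(timezone=True), everything else to plain DateTime.
    try:
        # A Series of datetimes exposes its timezone via the .dt accessor.
        if col.dt.tz is not None:
            return TIMESTAMP(timezone=True)
    except AttributeError:
        # A DatetimeIndex has no .dt accessor; its timezone is .tz.
        if col.tz is not None:
            return TIMESTAMP(timezone=True)
    return DateTime

aware_series = pd.Series(pd.date_range('2018-01-01', periods=2, tz='UTC'))
naive_index = pd.date_range('2018-01-01', periods=2)

print(repr(pick_datetime_type(aware_series)))  # a timezone-aware TIMESTAMP type
print(repr(pick_datetime_type(naive_index)))   # the plain DateTime class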

pandas/tests/io/test_sql.py  (+44, -1)

@@ -961,7 +961,8 @@ def test_sqlalchemy_type_mapping(self):
                                             utc=True)})
         db = sql.SQLDatabase(self.conn)
         table = sql.SQLTable("test_type", db, frame=df)
-        assert isinstance(table.table.c['time'].type, sqltypes.DateTime)
+        # GH 9086: TIMESTAMP is the suggested type for datetimes with timezones
+        assert isinstance(table.table.c['time'].type, sqltypes.TIMESTAMP)

     def test_database_uri_string(self):

@@ -1361,9 +1362,51 @@ def check(col):
         df = sql.read_sql_table("types_test_data", self.conn)
         check(df.DateColWithTz)

+    def test_datetime_with_timezone_roundtrip(self):
+        # GH 9086
+        # Write datetimetz data to a db and read it back
+        # For dbs that support timestamps with timezones, should get back UTC
+        # otherwise naive data should be returned
+        expected = DataFrame({'A': date_range(
+            '2013-01-01 09:00:00', periods=3, tz='US/Pacific'
+        )})
+        expected.to_sql('test_datetime_tz', self.conn, index=False)
+
+        if self.flavor == 'postgresql':
+            # SQLAlchemy "timezones" (i.e. offsets) are coerced to UTC
+            expected['A'] = expected['A'].dt.tz_convert('UTC')
+        else:
+            # Otherwise, timestamps are returned as local, naive
+            expected['A'] = expected['A'].dt.tz_localize(None)
+
+        result = sql.read_sql_table('test_datetime_tz', self.conn)
+        tm.assert_frame_equal(result, expected)
+
+        result = sql.read_sql_query(
+            'SELECT * FROM test_datetime_tz', self.conn
+        )
+        if self.flavor == 'sqlite':
+            # read_sql_query does not return datetime type like read_sql_table
+            assert isinstance(result.loc[0, 'A'], string_types)
+            result['A'] = to_datetime(result['A'])
+        tm.assert_frame_equal(result, expected)
+
+    def test_naive_datetimeindex_roundtrip(self):
+        # GH 23510
+        # Ensure that a naive DatetimeIndex isn't converted to UTC
+        dates = date_range('2018-01-01', periods=5, freq='6H')
+        expected = DataFrame({'nums': range(5)}, index=dates)
+        expected.to_sql('foo_table', self.conn, index_label='info_date')
+        result = sql.read_sql_table('foo_table', self.conn,
+                                    index_col='info_date')
+        # result index will gain a name from a set_index operation; expected won't
+        tm.assert_frame_equal(result, expected, check_names=False)
+
     def test_date_parsing(self):
         # No Parsing
         df = sql.read_sql_table("types_test_data", self.conn)
+        expected_type = object if self.flavor == 'sqlite' else np.datetime64
+        assert issubclass(df.DateCol.dtype.type, expected_type)

         df = sql.read_sql_table("types_test_data", self.conn,
                                 parse_dates=['DateCol'])
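
The sqlite branch of test_datetime_with_timezone_roundtrip points at a practical difference: read_sql_table uses the reflected column types and hands back datetimes, while read_sql_query on SQLite returns the stored strings as-is unless told to parse them. A short sketch of how a caller would cope, not part of this commit; the in-memory engine and the table name ``query_demo`` are illustrative.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///:memory:')  # illustrative backend

df = pd.DataFrame({
    'A': pd.date_range('2013-01-01 09:00', periods=3, tz='US/Pacific'),
})
df.to_sql('query_demo', engine, index=False)

# read_sql_table consults the reflected schema, so 'A' comes back as a
# datetime column (naive local time on SQLite).
via_table = pd.read_sql_table('query_demo', engine)

# read_sql_query only sees the raw result set; on SQLite the stored values
# are text, so ask pandas to parse them explicitly.
via_query = pd.read_sql_query('SELECT * FROM query_demo', engine,
                              parse_dates=['A'])

print(via_table.dtypes)
print(via_query.dtypes)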
