Commit 8234495

msgpack: support tzindex in datetime
Support non-zero tzindex in the datetime extended type. If both tzoffset and tzindex are specified, tzindex takes priority (same as in Tarantool [1]).

The Tarantool index-to-Olson-name map and its inverse are built with the gen-timezones.sh script, which is based on the tarantool/go-tarantool script [2].

Both Tarantool and pytz [3] are based on the Olson tz database, yet there are some differences in their lists of supported timezones. If you are using `pytz.FixedOffset` or a non-abbreviated timezone (like 'Europe/Moscow'), everything is alright.

If you are using an abbreviated timezone, this is where everything may become tricky. pytz does not support most Tarantool abbreviated timezones (except for CET, EET, EST, GMT, HST, MST, UTC, WET). If a Tarantool abbreviated timezone is not supported by pytz, we create a timestamp with the corresponding `pytz.FixedOffset` instead. `tarantool.Datetime` stores `tzindex` info, so nothing is lost on encoding/decoding. If you want to create a datetime with a non-pytz abbreviated timezone in Python, use the `tarantool_tzindex` argument: `tarantool.Datetime(tarantool_tzindex=tzindex)`. If you convert a `tarantool.Datetime` with a non-pytz abbreviated timezone to `pandas.Timestamp`, `tzindex` data is lost on conversion. Call the `tzindex()` getter of `tarantool.Datetime` to extract tzindex data for later use.

There are some pytz timezones not supported by Tarantool: CST6CDT, EST5EDT, MET, MST7MDT, PST8PDT, Europe/Kyiv and all Etc/GMT* timezones (except for Etc/GMT, Etc/GMT+0, Etc/GMT-0). They are treated as `pytz.FixedOffset` on encoding, and a warning is raised in this case.

pytz does not natively support abbreviated timezones because they may be ambiguous [4-6]. Tarantool itself does not support ambiguous abbreviated timezones either:

```
Tarantool 2.10.1-0-g482d91c66

tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.
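The priority rule above can be sketched with the standard library alone. A non-zero tzindex wins over tzoffset. The stored seconds are a UTC instant that is only converted for display. This is a minimal illustration, not the connector's code. `decode_datetime` and `HYPOTHETICAL_TZINDEX_TABLE` are made-up names, the index 42 is invented, and `datetime.timezone` stands in for the pytz objects the real decoder builds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the generated tzindex table plus pytz lookup.
# The index 42 and its fixed +03:00 offset are invented for illustration.
HYPOTHETICAL_TZINDEX_TABLE = {
    42: timezone(timedelta(hours=3), 'MSK'),
}


def decode_datetime(seconds, nsec, tzoffset, tzindex):
    """Rebuild an aware datetime from decoded MessagePack fields.

    `seconds` is a UTC epoch value; the timezone fields only decide how
    that instant is displayed. A non-zero tzindex wins over tzoffset.
    """
    # Python's datetime has microsecond precision, so nanoseconds are truncated.
    utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    utc += timedelta(microseconds=nsec // 1000)

    if tzindex != 0:
        if tzindex not in HYPOTHETICAL_TZINDEX_TABLE:
            raise ValueError(f'Unknown tzindex {tzindex}')
        tz = HYPOTHETICAL_TZINDEX_TABLE[tzindex]
    elif tzoffset != 0:
        # tzoffset is stored in minutes east of UTC.
        tz = timezone(timedelta(minutes=tzoffset))
    else:
        # Simplification for this sketch: no timezone info, keep UTC.
        return utc
    return utc.astimezone(tz)


# tzindex=42 wins even though tzoffset says +01:00.
print(decode_datetime(0, 0, 60, 42))  # 1970-01-01 03:00:00+03:00
```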
The Tarantool header timezones.h [7] provides a map of all abbreviated timezones with category info (all ambiguous timezones are marked with the TZ_AMBIGUOUS flag) and offset info. We parse this info to build a `pytz.FixedOffset` timezone for each Tarantool abbreviated timezone that pytz does not support natively.

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
3. https://pypi.org/project/pytz/
4. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
5. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
6. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
7. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
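The category/offset handling described for timezones.h comes down to a bitmask test plus a fixed-offset fallback. This is a minimal sketch. It assumes a dict shaped like the generated `timezoneAbbrevInfo`, with offset in minutes and category as a bitmask. The two entries and their values are illustrative rather than copied from timezones.h, and `datetime.timezone` stands in for `pytz.FixedOffset`.

```python
from datetime import timedelta, timezone

# Flag value as defined by the generator script in this commit.
TZ_AMBIGUOUS = 0x08

# Illustrative entries shaped like the generated timezoneAbbrevInfo dict:
# 'offset' is in minutes east of UTC, 'category' is a TZ_* bitmask.
ABBREV_INFO = {
    'MSK': {'offset': 180, 'category': 0x00},
    'BST': {'offset': 60, 'category': TZ_AMBIGUOUS},
}


def is_ambiguous(info):
    # Same test as is_ambiguous_tz() in the diff: is the TZ_AMBIGUOUS bit set?
    return (info['category'] & TZ_AMBIGUOUS) != 0


def abbrev_to_fixed_offset(name, error_class=ValueError):
    """Map an abbreviation to a fixed-offset tzinfo, rejecting ambiguous ones."""
    info = ABBREV_INFO[name]
    if is_ambiguous(info):
        raise error_class(f'Failed to decode datetime {name} with ambiguous timezone')
    return timezone(timedelta(minutes=info['offset']), name)


print(abbrev_to_fixed_offset('MSK').utcoffset(None))  # 3:00:00
```

Passing `error_class` mirrors how the diff lets `msgpack_decode` raise `MsgpackError` while direct construction raises `ValueError`.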
1 parent 7561363 commit 8234495

7 files changed (+2144, -4 lines)

CHANGELOG.md

+28
@@ -32,6 +32,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   or `to_datetime()` converter.

 - Offset in datetime type support (#204).
+- Timezone in datetime type support (#204).
+
+  Both Tarantool and pytz are based on the Olson tz database, yet
+  there are some differences in their lists of supported
+  timezones.
+
+  If you are using `pytz.FixedOffset` or a non-abbreviated timezone
+  (like 'Europe/Moscow'), everything is alright.
+
+  If you are using an abbreviated timezone, this is where everything
+  may become tricky. pytz does not support most Tarantool
+  abbreviated timezones (except for CET, EET, EST, GMT, HST, MST,
+  UTC, WET). If a Tarantool abbreviated timezone is not supported
+  by pytz, we create a timestamp with the corresponding
+  `pytz.FixedOffset` instead. `tarantool.Datetime` stores `tzindex`
+  info, so nothing is lost on encoding/decoding. If you
+  want to create a datetime with a non-pytz abbreviated timezone
+  in Python, use the `tarantool_tzindex` argument:
+  `tarantool.Datetime(tarantool_tzindex=tzindex)`. If you convert a
+  `tarantool.Datetime` with a non-pytz abbreviated timezone to
+  `pandas.Timestamp`, `tzindex` data is lost on conversion. Call the
+  `tzindex()` getter of `tarantool.Datetime` to keep it for later use.
+
+  There are some pytz timezones not supported by Tarantool:
+  CST6CDT, EST5EDT, MET, MST7MDT, PST8PDT, Europe/Kyiv and
+  all Etc/GMT* timezones (except for Etc/GMT, Etc/GMT+0, Etc/GMT-0).
+  They are treated as `pytz.FixedOffset` on encoding, and a warning
+  is raised in this case.

 ### Changed
 - Bump msgpack requirement to 1.0.4 (PR #223).
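This entry describes an encoding fallback: zones Tarantool does not know are sent as a plain offset with a warning. The sketch below shows that fallback. It assumes a name-to-index dict shaped like the generated `timezoneToIndex`, and the `'Europe/Moscow'` index in it is made up. `TimezoneFallbackWarning` is a hypothetical stand-in for the connector's `MsgpackWarning`. The minute arithmetic mirrors `compute_offset()` from the diff. `timedelta` stores negative values as negative days plus positive seconds, so the days term is required.

```python
import warnings
from datetime import timedelta

MIN_IN_DAY = 24 * 60
SEC_IN_MIN = 60

# Hypothetical excerpt of a generated name -> tzindex map; the number is invented.
HYPOTHETICAL_TZ_TO_INDEX = {'Europe/Moscow': 1000}


class TimezoneFallbackWarning(UserWarning):
    """Hypothetical stand-in for tarantool.error.MsgpackWarning."""


def offset_minutes(utc_offset):
    # timedelta(hours=-5) is stored as days=-1, seconds=68400,
    # so -1 * 1440 + 68400 // 60 == -300.
    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN


def to_tarantool_tz(zone_name, utc_offset):
    """Return (tzoffset, tzindex) for a zone name and its current UTC offset."""
    tzoffset = offset_minutes(utc_offset)
    if zone_name is None:
        # Fixed offsets and naive values carry no zone name: offset only.
        return tzoffset, 0
    if zone_name in HYPOTHETICAL_TZ_TO_INDEX:
        return tzoffset, HYPOTHETICAL_TZ_TO_INDEX[zone_name]
    warnings.warn(f'timezone {zone_name} is not supported by Tarantool, '
                  f'using tzoffset={tzoffset} instead', TimezoneFallbackWarning)
    return tzoffset, 0
```

Sending tzoffset alongside tzindex keeps the instant correct even for zones that fall back to index 0.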

tarantool/msgpack_ext/types/datetime.py

+110-4
@@ -3,6 +3,9 @@
 import pandas
 import pytz

+import tarantool.msgpack_ext.types.timezones as tt_timezones
+from tarantool.error import MsgpackError, MsgpackWarning, warn
+
 # https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
 # The datetime MessagePack representation looks like this:
@@ -61,6 +64,97 @@ def compute_offset(timestamp):
     # There is no precision loss since offset is in minutes
     return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN

+def pytz_to_tarantool(timestamp):
+    # Tarantool 2.10 (commit 9ee45289e01232b8df1413efea11db170ae3b3b4)
+    # does not support the following pytz (version 2022.2.1) timezones
+    # - CST6CDT
+    # - EST5EDT
+    # - Etc/GMT+1
+    # - Etc/GMT+10
+    # - Etc/GMT+11
+    # - Etc/GMT+12
+    # - Etc/GMT+2
+    # - Etc/GMT+3
+    # - Etc/GMT+4
+    # - Etc/GMT+5
+    # - Etc/GMT+6
+    # - Etc/GMT+7
+    # - Etc/GMT+8
+    # - Etc/GMT+9
+    # - Etc/GMT-1
+    # - Etc/GMT-10
+    # - Etc/GMT-11
+    # - Etc/GMT-12
+    # - Etc/GMT-13
+    # - Etc/GMT-14
+    # - Etc/GMT-2
+    # - Etc/GMT-3
+    # - Etc/GMT-4
+    # - Etc/GMT-5
+    # - Etc/GMT-6
+    # - Etc/GMT-7
+    # - Etc/GMT-8
+    # - Etc/GMT-9
+    # - Europe/Kyiv
+    # - MET
+    # - MST7MDT
+    # - PST8PDT
+    #
+    # They are transformed to tzoffset based on pytz info.
+    tzoffset = compute_offset(timestamp)
+
+    if (timestamp.tz is None) or (timestamp.tz.zone is None):
+        tzindex = 0
+    else:
+        if timestamp.tz.zone in tt_timezones.timezoneToIndex:
+            tzindex = tt_timezones.timezoneToIndex[timestamp.tz.zone]
+        else:
+            warn(f'pytz timezone {timestamp.tz} is not supported by Tarantool, '
+                 f'using tzoffset={tzoffset} instead', MsgpackWarning)
+
+            tzindex = 0
+
+    return tzoffset, tzindex
+
+def is_ambiguous_tz(tt_tzinfo):
+    return (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0
+
+def tarantool_to_pytz(tzindex, error_class=ValueError):
+    # https://raw.githubusercontent.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
+    #
+    # There are several possible timezone types in Tarantool.
+    # Abbreviated timezones are a bit tricky since they could be ambiguous.
+    # Tarantool itself does not support creating datetime with ambiguous timezones:
+    #
+    # Tarantool 2.10.1-0-g482d91c66
+    #
+    # tarantool> datetime.new({tz = 'BST'})
+    # ---
+    # - error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
+    # ...
+    #
+    # pytz version 2022.2.1 does not support most Tarantool abbreviated timezones
+    # (except for CET, EET, EST, GMT, HST, MST, UTC, WET). Since Tarantool sources
+    # provide offset info for abbreviated timezones, we use pytz.FixedOffset instead.
+    #
+    # https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
+
+    if tzindex not in tt_timezones.indexToTimezone:
+        raise error_class(f'Unknown tzindex {tzindex}')
+    tzname = tt_timezones.indexToTimezone[tzindex]
+
+    try:
+        tzinfo = pytz.timezone(tzname)
+    except pytz.exceptions.UnknownTimeZoneError:
+        tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tzname]
+
+        if is_ambiguous_tz(tt_tzinfo):
+            raise error_class(f'Failed to decode datetime {tzname} with ambiguous timezone')
+
+        tzinfo = pytz.FixedOffset(tt_tzinfo['offset'])
+
+    return tzinfo
+
 def msgpack_decode(data):
     cursor = 0
     seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
@@ -77,7 +171,8 @@ def msgpack_decode(data):
     total_nsec = seconds * NSEC_IN_SEC + nsec

     if (tzindex != 0):
-        raise NotImplementedError
+        tzinfo = tarantool_to_pytz(tzindex, error_class=MsgpackError)
+        timestamp = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
     elif (tzoffset != 0):
         tzinfo = pytz.FixedOffset(tzoffset)
         timestamp = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
@@ -88,23 +183,31 @@ def msgpack_decode(data):
     return timestamp, tzoffset, tzindex

 class Datetime():
-    def __init__(self, *args, **kwargs):
+    def __init__(self, *args, tarantool_tzindex=None, **kwargs):
         if len(args) > 0:
             data = args[0]
             if isinstance(data, bytes):
                 timestamp, tzoffset, tzindex = msgpack_decode(data)
             elif isinstance(data, pandas.Timestamp):
                 timestamp = deepcopy(data)
-                tzoffset = compute_offset(timestamp)
+                tzoffset, tzindex = pytz_to_tarantool(timestamp)
             elif isinstance(data, Datetime):
                 timestamp = deepcopy(data._timestamp)
                 tzoffset = deepcopy(data._tzoffset)
+                tzindex = deepcopy(data._tzindex)
         else:
             timestamp = pandas.Timestamp(*args, **kwargs)
+            tzoffset, tzindex = pytz_to_tarantool(timestamp)
+
+        if tarantool_tzindex is not None:
+            tzinfo = tarantool_to_pytz(tarantool_tzindex)
+            timestamp = pandas.Timestamp.replace(timestamp, tzinfo=tzinfo)
             tzoffset = compute_offset(timestamp)
+            tzindex = tarantool_tzindex

         self._timestamp = timestamp
         self._tzoffset = tzoffset
+        self._tzindex = tzindex

     def __eq__(self, other):
         if isinstance(other, Datetime):
@@ -120,13 +223,16 @@ def to_pd_timestamp(self):
     def tzoffset(self):
         return deepcopy(self._tzoffset)

+    def tzindex(self):
+        return deepcopy(self._tzindex)
+
     def msgpack_encode(self):
         ts_value = self._timestamp.value

         seconds = ts_value // NSEC_IN_SEC
         nsec = ts_value % NSEC_IN_SEC
         tzoffset = self._tzoffset
-        tzindex = 0
+        tzindex = self._tzindex

         buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)

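The diff attaches a timezone in two different ways. The `tarantool_tzindex` path in `Datetime.__init__` calls `replace(tzinfo=...)`. That keeps the wall-clock time and changes the instant. `msgpack_decode` instead marks the value as UTC first and then calls `tz_convert`. That keeps the instant and moves the wall clock. The stdlib sketch below illustrates the difference. `datetime.replace`/`astimezone` stand in for the pandas methods, and the +03:00 zone is illustrative.

```python
from datetime import datetime, timedelta, timezone

msk = timezone(timedelta(hours=3), 'MSK')  # illustrative fixed offset
naive = datetime(2022, 8, 1, 12, 0)

# Like the tarantool_tzindex path: attach the zone, keep 12:00 on the clock.
attached = naive.replace(tzinfo=msk)

# Like msgpack_decode: treat the value as a UTC instant, then convert.
converted = naive.replace(tzinfo=timezone.utc).astimezone(msk)

print(attached)   # 2022-08-01 12:00:00+03:00
print(converted)  # 2022-08-01 15:00:00+03:00
```

The two results show the same local zone but are three hours apart as instants. This is why the constructor recomputes `tzoffset` after `replace`.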
@@ -0,0 +1,9 @@
+from tarantool.msgpack_ext.types.timezones.timezones import (
+    TZ_AMBIGUOUS,
+    indexToTimezone,
+    timezoneToIndex,
+    timezoneAbbrevInfo,
+)
+
+__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
+           'timezoneAbbrevInfo']
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+set -xeuo pipefail
+
+SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
+SRC_FILE=timezones.h
+DST_FILE=timezones.py
+
+[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
+wget -O ${SRC_FILE} \
+    https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h
+
+# We don't need aliases in indexToTimezone because Tarantool always replaces them:
+#
+# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
+# ---
+# ...
+# tarantool> T
+# ---
+# - 2022-01-01T00:00:00 Pacific/Kanton
+# ...
+#
+# So we can do the same and don't worry, be happy.
+
+cat <<EOF > ${DST_FILE}
+# Automatically generated by gen-timezones.sh
+
+TZ_UTC = 0x01
+TZ_RFC = 0x02
+TZ_MILITARY = 0x04
+TZ_AMBIGUOUS = 0x08
+TZ_NYI = 0x10
+TZ_OLSON = 0x20
+TZ_ALIAS = 0x40
+TZ_DST = 0x80
+
+indexToTimezone = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneToIndex = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneAbbrevInfo = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
+echo "}" >> ${DST_FILE}
+
+rm ${SRC_FILE}
+
+python validate_timezones.py
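The grep/sed/awk pipeline above relies on fixed column positions. The Python sketch below makes that mapping explicit. It is an illustration only, with three assumptions:

- The field order is inferred from the awk column numbers: `id, offset, "name", category` for `ZONE_ABBREV`, and `id, "name"` for `ZONE_UNIQUE` and `ZONE_ALIAS`.
- The sample lines and index values are invented.
- Unlike the script, it resolves the category to an integer instead of emitting the flag names.

```python
import re

# Flag values from the generator script in this commit.
FLAGS = {
    'TZ_UTC': 0x01, 'TZ_RFC': 0x02, 'TZ_MILITARY': 0x04, 'TZ_AMBIGUOUS': 0x08,
    'TZ_NYI': 0x10, 'TZ_OLSON': 0x20, 'TZ_ALIAS': 0x40, 'TZ_DST': 0x80,
}

# Invented sample lines in the assumed timezones.h macro format.
SAMPLE_HEADER = '''
ZONE_ABBREV(  1,   60, "A", TZ_MILITARY)
ZONE_ABBREV(101,   60, "BST", TZ_RFC|TZ_AMBIGUOUS)
ZONE_UNIQUE(300, "Pacific/Kanton")
ZONE_ALIAS( 300, "Pacific/Enderbury")
'''

MACRO = re.compile(r'(ZONE_ABBREV|ZONE_UNIQUE|ZONE_ALIAS)\(([^)]*)\)')


def parse_category(text):
    # 'TZ_RFC|TZ_AMBIGUOUS' -> 0x02 | 0x08
    value = 0
    for token in text.split('|'):
        value |= FLAGS[token.strip()]
    return value


def parse_timezones(header):
    """Build (indexToTimezone, timezoneToIndex, timezoneAbbrevInfo) dicts."""
    index_to_tz, tz_to_index, abbrev_info = {}, {}, {}
    for kind, body in MACRO.findall(header):
        fields = [f.strip().strip('"') for f in body.split(',')]
        if kind == 'ZONE_ABBREV':
            idx, offset, name, category = fields
            index_to_tz[int(idx)] = name
            tz_to_index[name] = int(idx)
            abbrev_info[name] = {'offset': int(offset),
                                 'category': parse_category(category)}
        elif kind == 'ZONE_UNIQUE':
            idx, name = fields
            index_to_tz[int(idx)] = name
            tz_to_index[name] = int(idx)
        else:
            # Aliases only go into name -> index, as in the shell script.
            idx, name = fields
            tz_to_index[name] = int(idx)
    return index_to_tz, tz_to_index, abbrev_info
```

Leaving aliases out of the index-to-name map means a decoded index always yields the canonical name. This matches the Pacific/Enderbury to Pacific/Kanton behavior quoted in the script's comment.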
