Commit b0d1b7e

msgpack: support tzindex in datetime
Support non-zero tzindex in the datetime extended type. If both tzoffset and tzindex are specified, tzindex takes priority (same as in Tarantool [1]). pytz [2] is used to build timezone info. The Tarantool index to Olson name map and its inverse are built with the gen_timezones.sh script, which is based on the tarantool/go-tarantool script [3].

All Tarantool unique and alias timezones are present in the pytz.all_timezones list. Only the following abbreviated timezones from Tarantool are present in pytz.all_timezones (version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET

pytz does not natively support abbreviated timezones because they may be ambiguous [4-6]. Tarantool itself does not support ambiguous abbreviated timezones:

```
Tarantool 2.10.1-0-g482d91c66

tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.

The Tarantool header timezones.h [7] provides a map of all abbreviated timezones with category info (all ambiguous timezones are marked with the TZ_AMBIGUOUS flag) and offset info. We parse this info to build a pytz.FixedOffset() timezone for each Tarantool abbreviated timezone that pytz does not support natively. Since we explicitly store tarantool_tzindex, no info is lost on msgpack conversion.

Tarantool does not know the following pytz version 2022.2.1 timezones:
- CST6CDT
- EST5EDT
- Etc/GMT+1
- Etc/GMT+10
- Etc/GMT+11
- Etc/GMT+12
- Etc/GMT+2
- Etc/GMT+3
- Etc/GMT+4
- Etc/GMT+5
- Etc/GMT+6
- Etc/GMT+7
- Etc/GMT+8
- Etc/GMT+9
- Etc/GMT-1
- Etc/GMT-10
- Etc/GMT-11
- Etc/GMT-12
- Etc/GMT-13
- Etc/GMT-14
- Etc/GMT-2
- Etc/GMT-3
- Etc/GMT-4
- Etc/GMT-5
- Etc/GMT-6
- Etc/GMT-7
- Etc/GMT-8
- Etc/GMT-9
- Europe/Kyiv
- MET
- MST7MDT
- PST8PDT

These are either utility timezones or new synonyms. For each timezone not supported by Tarantool, we use the tzoffset data from the pytz object instead. A warning is raised in this case.

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://pypi.org/project/pytz/
3. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
4. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
5. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
6. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
7. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
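
The priority rule from the commit message (tzindex wins over tzoffset when both are set) can be sketched roughly as below. This is not the commit's code: `pick_timezone` and `SAMPLE_INDEX_TO_OFFSET` are hypothetical names, the index value and its offset are invented, the standard-library `datetime.timezone` stands in for pytz, and every zone is simplified to a fixed offset. Returning `None` when both fields are zero is also an assumption.

```python
from datetime import timedelta, timezone


def pick_timezone(tzoffset, tzindex, index_to_offset):
    """Choose a tzinfo from decoded fields; tzindex has priority over tzoffset."""
    if tzindex != 0:
        if tzindex not in index_to_offset:
            raise ValueError(f"Unknown tzindex {tzindex}")
        # Simplification: the zone behind the index is modeled as a fixed offset.
        return timezone(timedelta(minutes=index_to_offset[tzindex]))
    if tzoffset != 0:
        return timezone(timedelta(minutes=tzoffset))
    # Assumption: no timezone is attached when both fields are zero.
    return None


# Hypothetical table: index 5 stands for some zone at UTC+03:00.
SAMPLE_INDEX_TO_OFFSET = {5: 180}

# Both fields are set, so the offset of 60 minutes is ignored.
chosen = pick_timezone(tzoffset=60, tzindex=5,
                       index_to_offset=SAMPLE_INDEX_TO_OFFSET)
```

With both fields present, `chosen` carries the offset looked up through the index rather than the 60-minute tzoffset.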
1 parent a3ead61 commit b0d1b7e

7 files changed

+2210
-13
lines changed

CHANGELOG.md

+1
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - UUID type support (#202).
 - Datetime type support and tarantool.Datetime type (#204).
 - Offset in datetime type support (#204).
+- Timezone in datetime type support (#204).

 ### Changed
 - Bump msgpack requirement to 1.0.4 (PR #223).

tarantool/msgpack_ext/types/datetime.py

+146-13
@@ -1,8 +1,12 @@
 import pandas
 import pytz

+import tarantool.msgpack_ext.types.timezones as tt_timezones
+from tarantool.error import MsgpackError, MsgpackWarning, warn
+
 # https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
+#
 # The datetime MessagePack representation looks like this:
 # +---------+----------------+==========+-----------------+
 # | MP_EXT | MP_DATETIME | seconds | nsec; tzoffset; |
@@ -43,13 +47,6 @@
 SEC_IN_MIN = 60
 MIN_IN_DAY = 60 * 24

-def compute_offset(dt):
-    if dt.tz is None:
-        return 0
-
-    utc_offset = dt.tz.utcoffset(dt)
-    # There is no precision loss since pytz.FixedOffset is in minutes
-    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN

 def get_bytes_as_int(data, cursor, size):
     part = data[cursor:cursor + size]
@@ -58,6 +55,125 @@ def get_bytes_as_int(data, cursor, size):
 def get_int_as_bytes(data, size):
     return data.to_bytes(size, byteorder=BYTEORDER, signed=True)

+def compute_offset(dt):
+    utc_offset = dt.tz.utcoffset(dt)
+    # There is no precision loss since pytz.FixedOffset is in minutes
+    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
+
+def get_tz_as_offset(dt, tarantool_tz=None):
+    tzoffset = compute_offset(dt)
+    tzindex = 0
+    if tarantool_tz is not None:
+        tzindex = tt_timezones.timezoneToIndex[tarantool_tz]
+    return tzoffset, tzindex
+
+def get_tarantool_timezone(dt, tarantool_tz=None):
+    # Tarantool 2.10 (commit 9ee45289e01232b8df1413efea11db170ae3b3b4)
+    # do not support the following pytz (version 2022.2.1) timezones
+    # - CST6CDT
+    # - EST5EDT
+    # - Etc/GMT+1
+    # - Etc/GMT+10
+    # - Etc/GMT+11
+    # - Etc/GMT+12
+    # - Etc/GMT+2
+    # - Etc/GMT+3
+    # - Etc/GMT+4
+    # - Etc/GMT+5
+    # - Etc/GMT+6
+    # - Etc/GMT+7
+    # - Etc/GMT+8
+    # - Etc/GMT+9
+    # - Etc/GMT-1
+    # - Etc/GMT-10
+    # - Etc/GMT-11
+    # - Etc/GMT-12
+    # - Etc/GMT-13
+    # - Etc/GMT-14
+    # - Etc/GMT-2
+    # - Etc/GMT-3
+    # - Etc/GMT-4
+    # - Etc/GMT-5
+    # - Etc/GMT-6
+    # - Etc/GMT-7
+    # - Etc/GMT-8
+    # - Etc/GMT-9
+    # - Europe/Kyiv
+    # - MET
+    # - MST7MDT
+    # - PST8PDT
+    #
+    # They are transformed to tzoffset based on pytz info.
+    tzoffset = compute_offset(dt)
+
+    # Abbreviated Tarantool timezones with zero offset are treated as
+    # UTC-zone timestamps.
+    if tarantool_tz is not None:
+        tzindex = tt_timezones.timezoneToIndex[tarantool_tz]
+    else:
+        if dt.tz.zone in tt_timezones.timezoneToIndex:
+            tzindex = tt_timezones.timezoneToIndex[dt.tz.zone]
+        else:
+            warn(f'pytz timezone {dt.tz} is not supported by Tarantool, '
+                 f'using tzoffset={tzoffset} instead', MsgpackWarning)
+
+            tzindex = 0
+
+    return tzoffset, tzindex
+
+def get_tarantool_tz_data(dt, tarantool_tz=None):
+    if dt.tz is None:
+        return 0, 0
+
+    if dt.tz.zone is not None:
+        return get_tarantool_timezone(dt, tarantool_tz)
+    else:
+        return get_tz_as_offset(dt, tarantool_tz)
+
+def is_ambiguous_tz(tt_tzinfo):
+    return (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0
+
+def get_pytz_timezone(tzindex=None, tzname=None):
+    # https://raw.githubusercontent.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
+    #
+    # There are several possible timezone types in Tarantool.
+    # Abbreviated timezones are a bit tricky since they could be ambiguous.
+    # Tarantool itself do not support creating datetime with ambiguous timezones:
+    #
+    # Tarantool 2.10.1-0-g482d91c66
+    #
+    # tarantool> datetime.new({tz = 'BST'})
+    # ---
+    # - error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
+    # ...
+    #
+    # pytz version 2022.2.1 do not support most of Tarantool abbreviated timezones
+    # (except for CET, EET EST, GMT, HST, MST, UTC, WET). Since Tarantool sources
+    # provide offset info for abbreviated timezones, we use pytz.FixedOffset instead.
+    #
+    # https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
+    if tzname is not None:
+        if tzname not in tt_timezones.timezoneToIndex:
+            raise ValueError(f'Unknown Tarantool timezone "{tzname}"')
+    elif tzindex is not None:
+        if tzindex not in tt_timezones.indexToTimezone:
+            raise MsgpackError(f'Unknown tzindex {tzindex}')
+        tzname = tt_timezones.indexToTimezone[tzindex]
+    else:
+        raise ValueError('Pass tzindex or tzname')
+
+    try:
+        tzinfo = pytz.timezone(tzname)
+    except pytz.exceptions.UnknownTimeZoneError:
+        tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tzname]
+
+        if is_ambiguous_tz(tt_tzinfo):
+            raise MsgpackError(f'Failed to decode datetime {tzname} with ambiguous timezone')
+
+        tzinfo = pytz.FixedOffset(tt_tzinfo['offset'])
+
+    return tzinfo
+
 def msgpack_decode(data):
     cursor = 0
     seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
@@ -74,7 +190,8 @@ def msgpack_decode(data):
     total_nsec = seconds * NSEC_IN_SEC + nsec

     if (tzindex != 0):
-        raise NotImplementedError
+        tzinfo = get_pytz_timezone(tzindex=tzindex)
+        dt = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
     elif (tzoffset != 0):
         tzinfo = pytz.FixedOffset(tzoffset)
         dt = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
@@ -85,33 +202,39 @@ def msgpack_decode(data):
     return dt, tzoffset, tzindex

 class Datetime(pandas.Timestamp):
-    def __new__(cls, *args, **kwargs):
+    def __new__(cls, *args, tarantool_tz=None, **kwargs):
         dt = None
         if len(args) > 0:
             if isinstance(args[0], bytes):
                 dt, tzoffset, tzindex = msgpack_decode(args[0])
             elif isinstance(args[0], Datetime):
                 dt = pandas.Timestamp.__new__(cls, *args, **kwargs)
                 tzoffset = args[0].tarantool_tzoffset
+                tzindex = args[0].tarantool_tzindex

         if dt is None:
             dt = super().__new__(cls, *args, **kwargs)
-            tzoffset = compute_offset(dt)
+            tzoffset, tzindex = get_tarantool_tz_data(dt)
+
+        if tarantool_tz is not None:
+            tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+            dt = pandas.Timestamp.replace(dt, tzinfo=tzinfo)
+            tzoffset, tzindex = get_tarantool_tz_data(dt, tarantool_tz)

         dt.__class__ = cls
         dt.tarantool_tzoffset = tzoffset
+        dt.tarantool_tzindex = tzindex
         return dt

     def msgpack_encode(self):
         seconds = self.value // NSEC_IN_SEC
         nsec = self.value % NSEC_IN_SEC
-        tzoffset = 0
-        tzindex = 0

         if isinstance(self, Datetime):
             tzoffset = self.tarantool_tzoffset
+            tzindex = self.tarantool_tzindex
         else:
-            tzoffset = compute_offset(self)
+            tzoffset, tzindex = get_tarantool_tz_data(self)

         buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)

@@ -137,3 +260,13 @@ def tz_convert(self, *args, **kwargs):
     def tz_localize(self, *args, **kwargs):
         dt = super().tz_localize(*args, **kwargs)
         return Datetime(dt)
+
+    def tarantool_tz_convert(self, tarantool_tz):
+        tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+        dt = super().tz_convert(tzinfo)
+        return Datetime(dt, tarantool_tz=tarantool_tz)
+
+    def tarantool_tz_localize(self, tarantool_tz):
+        tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+        dt = super().tz_localize(tzinfo)
+        return Datetime(dt, tarantool_tz=tarantool_tz)
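
The minute arithmetic in `compute_offset` above uses both `days` and `seconds` because Python normalizes a negative `timedelta` to `days=-1` plus a positive number of seconds. The sketch below reproduces that arithmetic with the standard library only; `offset_in_minutes`, `west` and `east` are illustrative names, and `datetime.timezone` is used here as a stand-in for the pytz objects the diff works with.

```python
from datetime import datetime, timedelta, timezone

SEC_IN_MIN = 60
MIN_IN_DAY = 60 * 24


def offset_in_minutes(dt):
    """Return the UTC offset of an aware datetime in whole minutes."""
    utc_offset = dt.utcoffset()
    # A negative offset such as -05:00 is stored as days=-1, seconds=68400,
    # so the days term is needed to get the sign right.
    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN


west = datetime(2022, 1, 1, tzinfo=timezone(timedelta(hours=-5)))
east = datetime(2022, 1, 1, tzinfo=timezone(timedelta(hours=3)))

west_minutes = offset_in_minutes(west)  # -1 * 1440 + 68400 // 60 = -300
east_minutes = offset_in_minutes(east)  # 0 * 1440 + 10800 // 60 = 180
```

Dropping the `days` term would turn the -05:00 offset into +19:00, which is why the formula keeps it.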
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
+from tarantool.msgpack_ext.types.timezones.timezones import (
+    TZ_AMBIGUOUS,
+    indexToTimezone,
+    timezoneToIndex,
+    timezoneAbbrevInfo,
+)
+
+__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
+           'timezoneAbbrevInfo']
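
To show how the exported `TZ_AMBIGUOUS` flag and a table shaped like `timezoneAbbrevInfo` fit together, here is a minimal sketch of the fallback for abbreviated timezones. It is an illustration under assumptions rather than the module's code: `ABBREV_INFO` is a hand-written two-entry excerpt with illustrative values, `fixed_offset_for` is a hypothetical helper, and the standard-library `datetime.timezone` replaces `pytz.FixedOffset`.

```python
from datetime import timedelta, timezone

TZ_AMBIGUOUS = 0x08  # same flag value the generator script writes out

# Hypothetical excerpt; the real table is generated from Tarantool's timezones.h.
ABBREV_INFO = {
    "MSK": {"offset": 180, "category": 0x00},
    "BST": {"offset": 60, "category": TZ_AMBIGUOUS},
}


def fixed_offset_for(abbrev):
    """Build a fixed-offset tzinfo for an abbreviation, rejecting ambiguous ones."""
    info = ABBREV_INFO[abbrev]
    if (info["category"] & TZ_AMBIGUOUS) != 0:
        raise ValueError(f"ambiguous timezone {abbrev!r}")
    # Offsets in the table are expressed in minutes.
    return timezone(timedelta(minutes=info["offset"]))


msk = fixed_offset_for("MSK")
```

The category is a bit mask, so the check uses `&` rather than equality; an entry may carry other flags alongside the ambiguity bit.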
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+set -xeuo pipefail
+
+SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
+SRC_FILE=timezones.h
+DST_FILE=timezones.py
+
+[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
+wget -O ${SRC_FILE} \
+    https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h
+
+# We don't need aliases in indexToTimezone because Tarantool always replace it:
+#
+# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
+# ---
+# ...
+# tarantool> T
+# ---
+# - 2022-01-01T00:00:00 Pacific/Kanton
+# ...
+#
+# So we can do the same and don't worry, be happy.
+
+cat <<EOF > ${DST_FILE}
+# Automatically generated by gen-timezones.sh
+
+TZ_UTC = 0x01
+TZ_RFC = 0x02
+TZ_MILITARY = 0x04
+TZ_AMBIGUOUS = 0x08
+TZ_NYI = 0x10
+TZ_OLSON = 0x20
+TZ_ALIAS = 0x40
+TZ_DST = 0x80
+
+indexToTimezone = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneToIndex = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneAbbrevInfo = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
+echo "}" >> ${DST_FILE}
+
+rm timezones.h
+
+python validate_timezones.py
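
For readers who find the grep/sed/awk pipeline hard to follow, the same mapping logic can be sketched in Python. This is only an illustration: `build_maps` and `MACRO_RE` are hypothetical names, the sample lines and their index values are invented, and the macro field order (index, offset, name, flags for `ZONE_ABBREV`; index, name for `ZONE_UNIQUE` and `ZONE_ALIAS`) is inferred from the awk column references above. Aliases go only into the name-to-index map, matching the script's comment about `Pacific/Enderbury`.

```python
import re

# Invented sample in the macro shape the script greps for; indexes are made up.
SAMPLE_HEADER = '''
ZONE_ABBREV(  1,    0, "Z", TZ_UTC)
ZONE_UNIQUE(100, "Pacific/Kanton")
ZONE_ALIAS(100, "Pacific/Enderbury")
'''

MACRO_RE = re.compile(r'^(ZONE_ABBREV|ZONE_UNIQUE|ZONE_ALIAS)\(\s*(.*?)\)\s*$')


def build_maps(header_text):
    """Return (index_to_timezone, timezone_to_index) parsed from macro lines."""
    index_to_timezone = {}
    timezone_to_index = {}
    for line in header_text.splitlines():
        match = MACRO_RE.match(line.strip())
        if match is None:
            continue
        kind = match.group(1)
        fields = [field.strip().strip('"') for field in match.group(2).split(',')]
        index = int(fields[0])
        # ZONE_ABBREV carries the offset in the second field, so the name is third.
        name = fields[2] if kind == 'ZONE_ABBREV' else fields[1]
        timezone_to_index[name] = index
        # Aliases are resolvable by name but never reported back by index.
        if kind != 'ZONE_ALIAS':
            index_to_timezone[index] = name
    return index_to_timezone, timezone_to_index


index_to_timezone, timezone_to_index = build_maps(SAMPLE_HEADER)
canonical = index_to_timezone[timezone_to_index['Pacific/Enderbury']]
```

Looking an alias up by name and then mapping its index back yields the canonical name, which mirrors the replacement Tarantool performs in the console session quoted in the script.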
