Support datetime extended type #228

Merged 3 commits on Sep 26, 2022
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,63 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Decimal type support (#203).
- UUID type support (#202).
- Datetime type support and tarantool.Datetime type (#204).

Tarantool datetime objects are decoded to the `tarantool.Datetime`
type, and `tarantool.Datetime` objects can be encoded back to
Tarantool datetime objects.

You can create `tarantool.Datetime` objects either from msgpack
data or by using the same API as in Tarantool:

```python
dt1 = tarantool.Datetime(year=2022, month=8, day=31,
                         hour=18, minute=7, sec=54,
                         nsec=308543321)

dt2 = tarantool.Datetime(timestamp=1661969274)

dt3 = tarantool.Datetime(timestamp=1661969274, nsec=308543321)
```

`tarantool.Datetime` exposes the `year`, `month`, `day`, `hour`,
`minute`, `sec`, `nsec`, `timestamp` and `value` (integer epoch time
with nanosecond precision) properties, which you can use to convert
`tarantool.Datetime` to any other kind of datetime object:

```python
pdt = pandas.Timestamp(year=dt.year, month=dt.month, day=dt.day,
                       hour=dt.hour, minute=dt.minute, second=dt.sec,
                       microsecond=(dt.nsec // 1000),
                       nanosecond=(dt.nsec % 1000))
```

- Offset in datetime type support (#204).

Use the `tzoffset` parameter to set the timezone offset in minutes from UTC:

```python
dt = tarantool.Datetime(year=2022, month=8, day=31,
                        hour=18, minute=7, sec=54,
                        nsec=308543321, tzoffset=180)
```

Use the `tzoffset` property to get the timezone offset of a datetime
object.

- Timezone in datetime type support (#204).

Use the `tz` parameter to set the timezone name:

```python
dt = tarantool.Datetime(year=2022, month=8, day=31,
                        hour=18, minute=7, sec=54,
                        nsec=308543321, tz='Europe/Moscow')
```

If both `tz` and `tzoffset` are specified, `tz` is used.

Use the `tz` property to get the timezone name of a datetime object.
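
As a further illustration of the properties described in these entries, the sketch below converts them to a standard-library `datetime.datetime`. It is not part of the connector: `FakeDatetime` is a hypothetical stand-in that only mimics the property names of `tarantool.Datetime`, and `to_stdlib` is a hypothetical helper. The standard library stores microseconds only, so the last three digits of `nsec` are dropped, and a zero `tzoffset` is treated as UTC here by assumption.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in with the same property names as tarantool.Datetime.
FakeDatetime = namedtuple(
    'FakeDatetime',
    'year month day hour minute sec nsec tzoffset')


def to_stdlib(dt):
    """Build an aware datetime.datetime from Datetime-like properties."""
    # tzoffset is in minutes from UTC; 0 is treated as UTC (assumption).
    tzinfo = timezone(timedelta(minutes=dt.tzoffset))
    return datetime(dt.year, dt.month, dt.day,
                    dt.hour, dt.minute, dt.sec,
                    microsecond=dt.nsec // 1000,  # sub-microsecond part is lost
                    tzinfo=tzinfo)


dt = FakeDatetime(2022, 8, 31, 18, 7, 54, 308543321, 180)
print(to_stdlib(dt).isoformat())
```

If nanosecond precision matters, a `pandas.Timestamp` conversion such as the one in the entry above is the safer choice.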

### Changed
- Bump msgpack requirement to 1.0.4 (PR #223).
2 changes: 2 additions & 0 deletions requirements.txt
@@ -1 +1,3 @@
msgpack>=1.0.4
pandas
pytz
6 changes: 5 additions & 1 deletion tarantool/__init__.py
@@ -32,6 +32,10 @@
    ENCODING_DEFAULT,
)

from tarantool.msgpack_ext.types.datetime import (
    Datetime,
)

__version__ = "0.9.0"


@@ -91,7 +95,7 @@ def connectmesh(addrs=({'host': 'localhost', 'port': 3301},), user=None,

 __all__ = ['connect', 'Connection', 'connectmesh', 'MeshConnection', 'Schema',
            'Error', 'DatabaseError', 'NetworkError', 'NetworkWarning',
-           'SchemaError', 'dbapi']
+           'SchemaError', 'dbapi', 'Datetime']

# ConnectionPool is supported only for Python 3.7 or newer.
if sys.version_info.major >= 3 and sys.version_info.minor >= 7:
9 changes: 9 additions & 0 deletions tarantool/msgpack_ext/datetime.py
@@ -0,0 +1,9 @@
from tarantool.msgpack_ext.types.datetime import Datetime

EXT_ID = 4

def encode(obj):
    return obj.msgpack_encode()

def decode(data):
    return Datetime(data)
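
For orientation, here is a minimal sketch of how an extension module of this shape (`EXT_ID`, `encode`, `decode`) could be wired into a serializer's fallback hook. It assumes nothing about the real `packer.py` beyond the `encoders` table shown in this PR: `ExtType` is a local stand-in for `msgpack.ExtType`, and `Point`, `PointExt`, `decoders`, `default` and `ext_hook` are hypothetical names used for illustration only.

```python
from collections import namedtuple
from dataclasses import dataclass

# Local stand-in for msgpack.ExtType (a (code, data) pair).
ExtType = namedtuple('ExtType', 'code data')


@dataclass(frozen=True)
class Point:
    """Hypothetical user type standing in for Datetime."""
    x: int
    y: int


class PointExt:
    """Hypothetical extension module: an ID plus encode/decode functions."""
    EXT_ID = 100

    @staticmethod
    def encode(obj):
        return bytes([obj.x, obj.y])

    @staticmethod
    def decode(data):
        return Point(data[0], data[1])


encoders = [{'type': Point, 'ext': PointExt}]
decoders = {PointExt.EXT_ID: PointExt.decode}


def default(obj):
    """Fallback for objects the serializer cannot pack natively."""
    for encoder in encoders:
        if isinstance(obj, encoder['type']):
            ext = encoder['ext']
            return ExtType(ext.EXT_ID, ext.encode(obj))
    raise TypeError(f'Unknown type: {type(obj)!r}')


def ext_hook(code, data):
    """Inverse lookup: map an extension ID back to its decoder."""
    if code in decoders:
        return decoders[code](data)
    raise ValueError(f'Unknown extension id: {code}')


packed = default(Point(3, 4))
print(packed, ext_hook(packed.code, packed.data))
```

The table-driven lookup keeps each extension type self-contained: adding a type means adding one module and one table row.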
8 changes: 6 additions & 2 deletions tarantool/msgpack_ext/packer.py
@@ -2,12 +2,16 @@
from uuid import UUID
from msgpack import ExtType

from tarantool.msgpack_ext.types.datetime import Datetime

import tarantool.msgpack_ext.decimal as ext_decimal
import tarantool.msgpack_ext.uuid as ext_uuid
import tarantool.msgpack_ext.datetime as ext_datetime

 encoders = [
-    {'type': Decimal, 'ext': ext_decimal},
-    {'type': UUID,    'ext': ext_uuid   },
+    {'type': Decimal,  'ext': ext_decimal },
+    {'type': UUID,     'ext': ext_uuid    },
+    {'type': Datetime, 'ext': ext_datetime},
 ]

def default(obj):
264 changes: 264 additions & 0 deletions tarantool/msgpack_ext/types/datetime.py
@@ -0,0 +1,264 @@
from copy import deepcopy

import pandas
import pytz

import tarantool.msgpack_ext.types.timezones as tt_timezones
from tarantool.error import MsgpackError

# https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
#
# The datetime MessagePack representation looks like this:
# +---------+----------------+==========+-----------------+
# | MP_EXT | MP_DATETIME | seconds | nsec; tzoffset; |
# | = d7/d8 | = 4 | | tzindex; |
# +---------+----------------+==========+-----------------+
# MessagePack data contains:
#
# * Seconds (8 bytes) as an unencoded 64-bit signed integer stored in the
#   little-endian order.
# * The optional fields (8 bytes), if any of them have a non-zero value.
#   The fields include nsec (4 bytes), tzoffset (2 bytes), and
#   tzindex (2 bytes) packed in the little-endian order.
#
# seconds is seconds since Epoch, where the epoch is the point where the time
# starts, and is platform dependent. For Unix, the epoch is January 1,
# 1970, 00:00:00 (UTC). Tarantool uses a double type, see a structure
# definition in src/lib/core/datetime.h and reasons in
# https://github.com/tarantool/tarantool/wiki/Datetime-internals#intervals-in-c
#
# nsec is nanoseconds, fractional part of seconds. Tarantool uses int32_t, see
# a definition in src/lib/core/datetime.h.
#
# tzoffset is timezone offset in minutes from UTC. Tarantool uses an int16_t type,
# see a structure definition in src/lib/core/datetime.h.
#
# tzindex is Olson timezone id. Tarantool uses an int16_t type, see a structure
# definition in src/lib/core/datetime.h. If both tzoffset and tzindex are
# specified, tzindex takes precedence and the tzoffset value is ignored.

SECONDS_SIZE_BYTES = 8
NSEC_SIZE_BYTES = 4
TZOFFSET_SIZE_BYTES = 2
TZINDEX_SIZE_BYTES = 2

BYTEORDER = 'little'

NSEC_IN_SEC = 1000000000
NSEC_IN_MKSEC = 1000
SEC_IN_MIN = 60

def get_bytes_as_int(data, cursor, size):
    part = data[cursor:cursor + size]
    return int.from_bytes(part, BYTEORDER, signed=True), cursor + size

def get_int_as_bytes(data, size):
    return data.to_bytes(size, byteorder=BYTEORDER, signed=True)

def compute_offset(timestamp):
    utc_offset = timestamp.tzinfo.utcoffset(timestamp)

    # `None` offset is a valid utcoffset implementation,
    # but it seems that pytz timezones never return `None`:
    # https://github.com/pandas-dev/pandas/issues/15986
    assert utc_offset is not None

    # There is no precision loss since offset is in minutes
    return int(utc_offset.total_seconds()) // SEC_IN_MIN

def get_python_tzinfo(tz, error_class):
    if tz in pytz.all_timezones:
        return pytz.timezone(tz)

    # Checked with timezones/validate_timezones.py
    tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tz]
    if (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0:
        raise error_class(f'Failed to create datetime with ambiguous timezone "{tz}"')

    return pytz.FixedOffset(tt_tzinfo['offset'])

def msgpack_decode(data):
    cursor = 0
    seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)

    data_len = len(data)
    if data_len == (SECONDS_SIZE_BYTES + NSEC_SIZE_BYTES + \
                    TZOFFSET_SIZE_BYTES + TZINDEX_SIZE_BYTES):
        nsec, cursor = get_bytes_as_int(data, cursor, NSEC_SIZE_BYTES)
        tzoffset, cursor = get_bytes_as_int(data, cursor, TZOFFSET_SIZE_BYTES)
        tzindex, cursor = get_bytes_as_int(data, cursor, TZINDEX_SIZE_BYTES)
    elif data_len == SECONDS_SIZE_BYTES:
        nsec = 0
        tzoffset = 0
        tzindex = 0
    else:
        raise MsgpackError(f'Unexpected datetime payload length {data_len}')

    total_nsec = seconds * NSEC_IN_SEC + nsec
    datetime = pandas.to_datetime(total_nsec, unit='ns')

    if tzindex != 0:
        if tzindex not in tt_timezones.indexToTimezone:
            raise MsgpackError(f'Failed to decode datetime with unknown tzindex "{tzindex}"')
        tz = tt_timezones.indexToTimezone[tzindex]
        tzinfo = get_python_tzinfo(tz, MsgpackError)
        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), tz
    elif tzoffset != 0:
        tzinfo = pytz.FixedOffset(tzoffset)
        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), ''
    else:
        return datetime, ''

class Datetime():
    def __init__(self, data=None, *, timestamp=None, year=None, month=None,
                 day=None, hour=None, minute=None, sec=None, nsec=None,
                 tzoffset=0, tz=''):
        if data is not None:
            if not isinstance(data, bytes):
                raise ValueError('data argument (first positional argument) ' +
                                 'expected to be a "bytes" instance')

            datetime, tz = msgpack_decode(data)
            self._datetime = datetime
            self._tz = tz
            return

        # The logic is same as in Tarantool, refer to datetime API.
        # https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
        if timestamp is not None:
            if ((year is not None) or (month is not None) or \
                    (day is not None) or (hour is not None) or \
                    (minute is not None) or (sec is not None)):
                raise ValueError('Cannot provide both timestamp and year, month, ' +
                                 'day, hour, minute, sec')

            if nsec is not None:
                if not isinstance(timestamp, int):
                    raise ValueError('timestamp must be int if nsec provided')

                total_nsec = timestamp * NSEC_IN_SEC + nsec
                datetime = pandas.to_datetime(total_nsec, unit='ns')
            else:
                datetime = pandas.to_datetime(timestamp, unit='s')
        else:
            if nsec is not None:
                microsecond = nsec // NSEC_IN_MKSEC
                nanosecond = nsec % NSEC_IN_MKSEC
            else:
                microsecond = 0
                nanosecond = 0

            datetime = pandas.Timestamp(year=year, month=month, day=day,
                                        hour=hour, minute=minute, second=sec,
                                        microsecond=microsecond,
                                        nanosecond=nanosecond)

        if tz != '':
            if tz not in tt_timezones.timezoneToIndex:
                raise ValueError(f'Unknown Tarantool timezone "{tz}"')

            tzinfo = get_python_tzinfo(tz, ValueError)
            self._datetime = datetime.replace(tzinfo=tzinfo)
            self._tz = tz
        elif tzoffset != 0:
            tzinfo = pytz.FixedOffset(tzoffset)
            self._datetime = datetime.replace(tzinfo=tzinfo)
            self._tz = ''
        else:
            self._datetime = datetime
            self._tz = ''

    def __eq__(self, other):
        if isinstance(other, Datetime):
            return self._datetime == other._datetime
        elif isinstance(other, pandas.Timestamp):
            return self._datetime == other
        else:
            return False

    def __str__(self):
        return self._datetime.__str__()

    def __repr__(self):
        return f'datetime: {self._datetime.__repr__()}, tz: "{self.tz}"'

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

    @property
    def year(self):
        return self._datetime.year

    @property
    def month(self):
        return self._datetime.month

    @property
    def day(self):
        return self._datetime.day

    @property
    def hour(self):
        return self._datetime.hour

    @property
    def minute(self):
        return self._datetime.minute

    @property
    def sec(self):
        return self._datetime.second

    @property
    def nsec(self):
        # microseconds + nanoseconds
        return self._datetime.value % NSEC_IN_SEC

    @property
    def timestamp(self):
        return self._datetime.timestamp()

    @property
    def tzoffset(self):
        if self._datetime.tzinfo is not None:
            return compute_offset(self._datetime)
        return 0

    @property
    def tz(self):
        return self._tz

    @property
    def value(self):
        return self._datetime.value

    def msgpack_encode(self):
        seconds = self.value // NSEC_IN_SEC
        nsec = self.nsec
        tzoffset = self.tzoffset

        tz = self.tz
        if tz != '':
            tzindex = tt_timezones.timezoneToIndex[tz]
        else:
            tzindex = 0

        buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)

        if (nsec != 0) or (tzoffset != 0) or (tzindex != 0):
            buf = buf + get_int_as_bytes(nsec, NSEC_SIZE_BYTES)
            buf = buf + get_int_as_bytes(tzoffset, TZOFFSET_SIZE_BYTES)
            buf = buf + get_int_as_bytes(tzindex, TZINDEX_SIZE_BYTES)

        return buf
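
The wire layout described in the module comment of this file (8-byte signed seconds, then an optional 4-byte nsec, 2-byte tzoffset and 2-byte tzindex, all little-endian) can also be expressed with the standard `struct` module. The following is an independent sketch, not the code from this PR: `pack_datetime` and `unpack_datetime` are hypothetical helper names, and they operate on plain integers rather than `Datetime` objects.

```python
import struct

SECONDS_FMT = '<q'    # 8-byte signed seconds, little-endian
FULL_FMT = '<qihh'    # seconds + nsec (4) + tzoffset (2) + tzindex (2)


def pack_datetime(seconds, nsec=0, tzoffset=0, tzindex=0):
    """Pack the fields; the optional tail is omitted when all are zero."""
    if nsec == 0 and tzoffset == 0 and tzindex == 0:
        return struct.pack(SECONDS_FMT, seconds)
    return struct.pack(FULL_FMT, seconds, nsec, tzoffset, tzindex)


def unpack_datetime(data):
    """Return (seconds, nsec, tzoffset, tzindex) from an 8- or 16-byte payload."""
    if len(data) == struct.calcsize(SECONDS_FMT):
        (seconds,) = struct.unpack(SECONDS_FMT, data)
        return seconds, 0, 0, 0
    if len(data) == struct.calcsize(FULL_FMT):
        return struct.unpack(FULL_FMT, data)
    raise ValueError(f'Unexpected datetime payload length {len(data)}')


short = pack_datetime(1661969274)
full = pack_datetime(1661969274, nsec=308543321, tzoffset=180)
print(len(short), len(full), unpack_datetime(full))
```

The `<` prefix selects standard sizes without padding, so the full form is exactly 16 bytes, matching the length check in `msgpack_decode`.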
9 changes: 9 additions & 0 deletions tarantool/msgpack_ext/types/timezones/__init__.py
@@ -0,0 +1,9 @@
from tarantool.msgpack_ext.types.timezones.timezones import (
    TZ_AMBIGUOUS,
    indexToTimezone,
    timezoneToIndex,
    timezoneAbbrevInfo,
)

__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
           'timezoneAbbrevInfo']