Single approach to ext types encode/decode #252

Merged
merged 4 commits into from
Oct 27, 2022
49 changes: 49 additions & 0 deletions CHANGELOG.md
Expand Up @@ -136,6 +136,51 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Support iproto feature discovery (#206).

- Support pandas way to build datetime from timestamp (PR #252).

`timestamp_since_utc_epoch` is a parameter to set timestamp
conversion behavior for timezone-aware datetimes.

If ``False`` (default), behaves similarly to Tarantool `datetime.new()`:

```python
>>> dt = tarantool.Datetime(timestamp=1640995200, timestamp_since_utc_epoch=False)
>>> dt
datetime: Timestamp('2022-01-01 00:00:00'), tz: ""
>>> dt.timestamp
1640995200.0
>>> dt = tarantool.Datetime(timestamp=1640995200, tz='Europe/Moscow',
... timestamp_since_utc_epoch=False)
>>> dt
datetime: Timestamp('2022-01-01 00:00:00+0300', tz='Europe/Moscow'), tz: "Europe/Moscow"
>>> dt.timestamp
1640984400.0
```

Thus, if ``False``, datetime is computed from timestamp
since epoch and then timezone is applied without any
conversion. In that case, `dt.timestamp` won't be equal to
initialization `timestamp` for all timezones with non-zero offset.

If ``True``, behaves similarly to `pandas.Timestamp`:

```python
>>> dt = tarantool.Datetime(timestamp=1640995200, timestamp_since_utc_epoch=True)
>>> dt
datetime: Timestamp('2022-01-01 00:00:00'), tz: ""
>>> dt.timestamp
1640995200.0
>>> dt = tarantool.Datetime(timestamp=1640995200, tz='Europe/Moscow',
... timestamp_since_utc_epoch=True)
>>> dt
datetime: Timestamp('2022-01-01 03:00:00+0300', tz='Europe/Moscow'), tz: "Europe/Moscow"
>>> dt.timestamp
1640995200.0
```

Thus, if ``True``, datetime is computed in a way that `dt.timestamp` will
always be equal to initialization `timestamp`.
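
The two behaviors can be sketched with the standard library alone. This is an illustrative approximation, not the `tarantool.Datetime` implementation: `from_timestamp` is a hypothetical helper, and a fixed `+03:00` offset stands in for `Europe/Moscow`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +03:00 offset stands in for the 'Europe/Moscow' zone.
MSK = timezone(timedelta(hours=3))
TIMESTAMP = 1640995200  # 2022-01-01 00:00:00 UTC


def from_timestamp(timestamp, tz, since_utc_epoch):
    """Hypothetical helper mirroring the two documented behaviors."""
    if since_utc_epoch:
        # pandas-like: the timestamp is an instant counted from the UTC
        # epoch, and the result is that instant shown in the timezone.
        return datetime.fromtimestamp(timestamp, tz=tz)
    # datetime.new()-like: compute the wall-clock value from the timestamp,
    # then attach the timezone without converting the wall-clock value.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).replace(tzinfo=tz)


wall_clock = from_timestamp(TIMESTAMP, MSK, since_utc_epoch=False)
utc_based = from_timestamp(TIMESTAMP, MSK, since_utc_epoch=True)

print(wall_clock.isoformat(), wall_clock.timestamp())
print(utc_based.isoformat(), utc_based.timestamp())
```

In the first case the resulting timestamp differs from the input by the timezone offset (10800 seconds here); in the second case it round-trips unchanged.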

### Changed
- Bump msgpack requirement to 1.0.4 (PR #223).
The only reason of this bump is various vulnerability fixes,
Expand All @@ -144,6 +189,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Update API documentation strings (#67).
- Update documentation index, quick start and guide pages (#67).
- Use git version to set package version (#238).
- Extract tarantool.Datetime encode and decode to external
functions (PR #252).
- Extract tarantool.Interval encode and decode to external
functions (PR #252).

### Fixed
- Package build (#238).
146 changes: 140 additions & 6 deletions tarantool/msgpack_ext/datetime.py
@@ -1,18 +1,83 @@
"""
Tarantool `datetime`_ extension type support module.

Refer to :mod:`~tarantool.msgpack_ext.types.datetime`.
The datetime MessagePack representation looks like this:

.. _datetime: https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
.. code-block:: text

    +---------+----------------+==========+-----------------+
    | MP_EXT  | MP_DATETIME    | seconds  | nsec; tzoffset; |
    | = d7/d8 | = 4            |          | tzindex;        |
    +---------+----------------+==========+-----------------+

MessagePack data contains:

* Seconds (8 bytes) as an unencoded 64-bit signed integer stored in the
  little-endian order.
* The optional fields (8 bytes), if any of them have a non-zero value.
  The fields include nsec (4 bytes), tzoffset (2 bytes), and
  tzindex (2 bytes) packed in the little-endian order.

``seconds`` is seconds since Epoch, where the epoch is the point where
the time starts, and is platform dependent. For Unix, the epoch is
January 1, 1970, 00:00:00 (UTC). Tarantool uses a ``double`` type, see a
structure definition in src/lib/core/datetime.h and reasons in
`datetime RFC`_.

``nsec`` is nanoseconds, fractional part of seconds. Tarantool uses
``int32_t``, see a definition in src/lib/core/datetime.h.

``tzoffset`` is timezone offset in minutes from UTC. Tarantool uses
``int16_t`` type, see a structure definition in src/lib/core/datetime.h.

``tzindex`` is Olson timezone id. Tarantool uses ``int16_t`` type, see
a structure definition in src/lib/core/datetime.h. If both
``tzoffset`` and ``tzindex`` are specified, ``tzindex`` has the
preference and the ``tzoffset`` value is ignored.

.. _datetime RFC: https://github.com/tarantool/tarantool/wiki/Datetime-internals#intervals-in-c
"""

from tarantool.msgpack_ext.types.datetime import Datetime
from tarantool.msgpack_ext.types.datetime import (
    NSEC_IN_SEC,
    SEC_IN_MIN,
    Datetime,
)
import tarantool.msgpack_ext.types.timezones as tt_timezones

from tarantool.error import MsgpackError

EXT_ID = 4
"""
`datetime`_ type id.
"""

BYTEORDER = 'little'

SECONDS_SIZE_BYTES = 8
NSEC_SIZE_BYTES = 4
TZOFFSET_SIZE_BYTES = 2
TZINDEX_SIZE_BYTES = 2


def get_int_as_bytes(data, size):
    """
    Get binary representation of integer value.

    :param data: Integer value.
    :type data: :obj:`int`

    :param size: Integer size, in bytes.
    :type size: :obj:`int`

    :return: Encoded integer.
    :rtype: :obj:`bytes`

    :meta private:
    """

    return data.to_bytes(size, byteorder=BYTEORDER, signed=True)

def encode(obj):
"""
Encode a datetime object.
Expand All @@ -26,7 +91,48 @@ def encode(obj):
:raise: :exc:`tarantool.Datetime.msgpack_encode` exceptions
"""

return obj.msgpack_encode()
    seconds = obj.value // NSEC_IN_SEC
    nsec = obj.nsec
    tzoffset = obj.tzoffset

    tz = obj.tz
    if tz != '':
        tzindex = tt_timezones.timezoneToIndex[tz]
    else:
        tzindex = 0

    buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)

    # The optional 8-byte tail is written only if any of its fields is non-zero.
    if (nsec != 0) or (tzoffset != 0) or (tzindex != 0):
        buf = buf + get_int_as_bytes(nsec, NSEC_SIZE_BYTES)
        buf = buf + get_int_as_bytes(tzoffset, TZOFFSET_SIZE_BYTES)
        buf = buf + get_int_as_bytes(tzindex, TZINDEX_SIZE_BYTES)

    return buf


def get_bytes_as_int(data, cursor, size):
    """
    Get integer value from binary data.

    :param data: MessagePack binary data.
    :type data: :obj:`bytes`

    :param cursor: Index after last parsed byte.
    :type cursor: :obj:`int`

    :param size: Integer size, in bytes.
    :type size: :obj:`int`

    :return: First value: parsed integer, second value: new cursor
        position.
    :rtype: first value: :obj:`int`, second value: :obj:`int`

    :meta private:
    """

    part = data[cursor:cursor + size]
    return int.from_bytes(part, BYTEORDER, signed=True), cursor + size

def decode(data):
"""
Expand All @@ -38,7 +144,35 @@ def decode(data):
:return: Decoded datetime.
:rtype: :class:`tarantool.Datetime`

:raise: :exc:`tarantool.Datetime` exceptions
:raise: :exc:`~tarantool.error.MsgpackError`,
:exc:`tarantool.Datetime` exceptions
"""

return Datetime(data)
    cursor = 0
    seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)

    # The payload is either seconds only or seconds plus the full optional tail.
    data_len = len(data)
    if data_len == (SECONDS_SIZE_BYTES + NSEC_SIZE_BYTES +
                    TZOFFSET_SIZE_BYTES + TZINDEX_SIZE_BYTES):
        nsec, cursor = get_bytes_as_int(data, cursor, NSEC_SIZE_BYTES)
        tzoffset, cursor = get_bytes_as_int(data, cursor, TZOFFSET_SIZE_BYTES)
        tzindex, cursor = get_bytes_as_int(data, cursor, TZINDEX_SIZE_BYTES)
    elif data_len == SECONDS_SIZE_BYTES:
        nsec = 0
        tzoffset = 0
        tzindex = 0
    else:
        raise MsgpackError(f'Unexpected datetime payload length {data_len}')

    # tzindex takes precedence over tzoffset when both are set.
    if tzindex != 0:
        if tzindex not in tt_timezones.indexToTimezone:
            raise MsgpackError(f'Failed to decode datetime with unknown tzindex "{tzindex}"')
        tz = tt_timezones.indexToTimezone[tzindex]
        return Datetime(timestamp=seconds, nsec=nsec, tz=tz,
                        timestamp_since_utc_epoch=True)
    elif tzoffset != 0:
        return Datetime(timestamp=seconds, nsec=nsec, tzoffset=tzoffset,
                        timestamp_since_utc_epoch=True)
    else:
        return Datetime(timestamp=seconds, nsec=nsec,
                        timestamp_since_utc_epoch=True)
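
The payload layout described in the module docstring (8 bytes of seconds, optionally followed by an 8-byte tail of nsec, tzoffset and tzindex, all little-endian) can also be sketched with the standard `struct` module. This is a standalone illustration, not part of the PR: `pack_datetime` and `unpack_datetime` are hypothetical names, and they work on plain integers rather than on `tarantool.Datetime` objects.

```python
import struct

# '<' selects little-endian with no padding: q = int64, i = int32, h = int16.
SECONDS_FORMAT = '<q'
TAIL_FORMAT = '<ihh'
FULL_FORMAT = '<qihh'


def pack_datetime(seconds, nsec=0, tzoffset=0, tzindex=0):
    """Hypothetical helper: build the extension payload from plain integers."""
    buf = struct.pack(SECONDS_FORMAT, seconds)
    # The tail is present only if at least one optional field is non-zero.
    if nsec or tzoffset or tzindex:
        buf += struct.pack(TAIL_FORMAT, nsec, tzoffset, tzindex)
    return buf


def unpack_datetime(data):
    """Hypothetical helper: return (seconds, nsec, tzoffset, tzindex)."""
    if len(data) == struct.calcsize(SECONDS_FORMAT):
        return struct.unpack(SECONDS_FORMAT, data) + (0, 0, 0)
    if len(data) == struct.calcsize(FULL_FORMAT):
        return struct.unpack(FULL_FORMAT, data)
    raise ValueError(f'Unexpected datetime payload length {len(data)}')


short = pack_datetime(1640995200)
full = pack_datetime(1640995200, nsec=308, tzoffset=180)
print(len(short), len(full))
print(unpack_datetime(full))
```

Any payload length other than 8 or 16 bytes is rejected, which matches the length check in `decode` above.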
103 changes: 96 additions & 7 deletions tarantool/msgpack_ext/interval.py
@@ -1,12 +1,50 @@
"""
Tarantool `datetime.interval`_ extension type support module.

Refer to :mod:`~tarantool.msgpack_ext.types.interval`.
The interval MessagePack representation looks like this:

.. code-block:: text

    +--------+-------------------------+-------------+----------------+
    | MP_EXT | Size of packed interval | MP_INTERVAL | PackedInterval |
    +--------+-------------------------+-------------+----------------+

Packed interval consists of:

* Packed number of non-zero fields.
* Packed non-null fields.

Each packed field has the following structure:

.. code-block:: text

    +----------+=====================+
    | field ID |     field value     |
    +----------+=====================+

The number of defined (non-null) fields can be zero. In this case,
the packed interval will be encoded as integer 0.

List of the field IDs:

* 0 – year
* 1 – month
* 2 – week
* 3 – day
* 4 – hour
* 5 – minute
* 6 – second
* 7 – nanosecond
* 8 – adjust

.. _datetime.interval: https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-interval-type
"""

from tarantool.msgpack_ext.types.interval import Interval
import msgpack

from tarantool.error import MsgpackError

from tarantool.msgpack_ext.types.interval import Interval, Adjust, id_map

EXT_ID = 6
"""
Expand All @@ -22,11 +60,25 @@ def encode(obj):

:return: Encoded interval.
:rtype: :obj:`bytes`

:raise: :exc:`tarantool.Interval.msgpack_encode` exceptions
"""

return obj.msgpack_encode()
    buf = bytes()

    count = 0
    for field_id, field_name in id_map.items():
        value = getattr(obj, field_name)

        if field_name == 'adjust':
            value = value.value

        # Only non-zero fields are written to the payload.
        if value != 0:
            buf = buf + msgpack.packb(field_id) + msgpack.packb(value)
            count = count + 1

    buf = msgpack.packb(count) + buf

    return buf

def decode(data):
"""
Expand All @@ -38,7 +90,44 @@ def decode(data):
:return: Decoded interval.
:rtype: :class:`tarantool.Interval`

:raise: :exc:`tarantool.Interval` exceptions
:raise: :exc:`MsgpackError`
"""

return Interval(data)
    # If MessagePack data does not contain a field value, it is zero.
    # If built not from MessagePack data, set argument values later.
    kwargs = {
        'year': 0,
        'month': 0,
        'week': 0,
        'day': 0,
        'hour': 0,
        'minute': 0,
        'sec': 0,
        'nsec': 0,
        'adjust': Adjust(0),
    }

    if len(data) != 0:
        # To create an unpacker is the only way to parse
        # a sequence of values in Python msgpack module.
        unpacker = msgpack.Unpacker()
        unpacker.feed(data)
        field_count = unpacker.unpack()
        for _ in range(field_count):
            field_id = unpacker.unpack()
            value = unpacker.unpack()

            if field_id not in id_map:
                raise MsgpackError(f'Unknown interval field id {field_id}')

            field_name = id_map[field_id]

            if field_name == 'adjust':
                try:
                    value = Adjust(value)
                except ValueError as e:
                    # Chain the original error so the cause stays visible.
                    raise MsgpackError(e) from e

            kwargs[field_name] = value

    return Interval(**kwargs)
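
The packed interval layout (a field count followed by `field ID`, `field value` pairs for non-zero fields) can be illustrated without the `msgpack` dependency, under one simplifying assumption: every number fits the MessagePack positive fixint range (0 to 127), where an integer is encoded as a single byte equal to its value. `FIELD_IDS`, `pack_interval` and `unpack_interval` are hypothetical names; the field names follow the list in the module docstring rather than the module's own `id_map`.

```python
# Field IDs from the module docstring; the names are illustrative only.
FIELD_IDS = {
    0: 'year', 1: 'month', 2: 'week', 3: 'day', 4: 'hour',
    5: 'minute', 6: 'second', 7: 'nanosecond', 8: 'adjust',
}


def pack_fixint(value):
    """Encode an integer as a MessagePack positive fixint (0..127 only)."""
    if not 0 <= value <= 0x7f:
        raise ValueError(f'{value} does not fit a positive fixint')
    return bytes([value])


def pack_interval(fields):
    """Hypothetical helper: pack a {name: value} mapping, skipping zeros."""
    body = b''
    count = 0
    for field_id, name in FIELD_IDS.items():
        value = fields.get(name, 0)
        if value != 0:
            body += pack_fixint(field_id) + pack_fixint(value)
            count += 1
    return pack_fixint(count) + body


def unpack_interval(data):
    """Hypothetical helper: missing fields default to zero."""
    fields = dict.fromkeys(FIELD_IDS.values(), 0)
    if not data:
        return fields
    if any(byte > 0x7f for byte in data):
        raise ValueError('only positive fixint values are supported here')
    cursor = 1
    for _ in range(data[0]):
        field_id, value = data[cursor], data[cursor + 1]
        cursor += 2
        if field_id not in FIELD_IDS:
            raise ValueError(f'Unknown interval field id {field_id}')
        fields[FIELD_IDS[field_id]] = value
    return fields


packed = pack_interval({'year': 1, 'day': 3})
print(packed)
print(unpack_interval(packed))
```

A real encoder needs the full MessagePack integer encoding, for example for negative values or large nanosecond counts, which is why the module itself relies on `msgpack.packb` and `msgpack.Unpacker`.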