serialization of numpy dtypes is inconsistent and non-informative #171


Closed
ssolari opened this issue Feb 13, 2016 · 3 comments


ssolari commented Feb 13, 2016

The quick issue is as follows: msgpack serializes some numpy dtypes but raises an uninformative TypeError for others, and the behavior differs between Python 2.7 and 3.4 (e.g. for np.int64).

import msgpack
import numpy as np
msgpack.dumps(np.float64(64)) # works on 2.7 and 3.4
msgpack.dumps(np.float32(64)) # "TypeError: can't serialize 64" in 2.7 and 3.4
msgpack.dumps(np.int64(64)) # works on 2.7; "TypeError: can't serialize 64" in 3.4

Possible solutions:

  • Enforce consistency: raise an informative TypeError that names the numpy dtype for anything that is a numpy dtype.
  • Support at least float64 and int64 across Python versions. This intermediate solution would handle a good majority of use cases.
  • Support all numpy dtypes. Maybe something to consider, but I imagine this may not be easy.

The motivation for numpy support: we are using msgpack to serialize objects for storage in HDF5 with PyTables. The combination is simple and powerful: it addresses a major need to store complex data to disk and back very efficiently (in both size and speed). It also solves an issue with json: integer keys in dictionaries are deserialized correctly with msgpack, whereas json coerces them to strings. PyTables relies on numpy ndarray containers to return data, so the issue arises when data that has already been read from PyTables is packed to be written again. This led to a tricky bug because of the inconsistent behavior.
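The json integer-key problem mentioned above can be seen with the standard library alone: on a round trip, json silently turns integer dictionary keys into strings.

```python
import json

# json requires object keys to be strings, so integer keys
# are coerced to strings when dumped and stay strings when loaded.
original = {1: "a", 2: "b"}
round_tripped = json.loads(json.dumps(original))
print(round_tripped)  # {'1': 'a', '2': 'b'} -- keys are now strings
```

msgpack, by contrast, allows non-string map keys, so the same round trip preserves the integer keys.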


methane commented Feb 14, 2016

This difference comes from numpy and Python itself.
numpy's float64 is a subclass of Python's float type, so it packs on both versions; numpy's float32 is not a subclass of float, so it fails on both.
numpy's int64 is a subclass of Python 2's int type, but that type went away in Python 3 (Python 2's long was renamed to int there), and numpy's int64 is not a subclass of Python 3's int.

So the differing behavior for np.int64 is not an issue in msgpack.

There is a strict_types option in the Packer. When the option is enabled, the packer raises TypeError for subclasses of the standard types instead of serializing them. I'll make it on by default in the future, and that will resolve the inconsistent behavior.
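To illustrate what strict_types changes, here is a minimal pure-Python sketch (not msgpack's actual implementation): an isinstance() check accepts subclasses, which is how np.int64 slipped through on Python 2, while an exact-type check, strict_types-style, rejects them consistently.

```python
import struct

class MyInt(int):
    """Stand-in for a numpy-like integer subclass such as np.int64 on Python 2."""

def pack_lenient(obj):
    # isinstance() accepts subclasses -- the behavior without strict_types
    if isinstance(obj, int):
        return struct.pack(">q", obj)
    raise TypeError("can't serialize %r" % (obj,))

def pack_strict(obj):
    # strict_types-style check: only the exact standard type is accepted
    if type(obj) is int:
        return struct.pack(">q", obj)
    raise TypeError("can't serialize %r" % (obj,))

pack_lenient(MyInt(42))   # works: MyInt is an int subclass
pack_strict(42)           # works: exact int
# pack_strict(MyInt(42))  # raises TypeError, like strict_types=True
```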


methane commented Feb 14, 2016

I don't want to add native support for numpy dtypes.
Since msgpack is a JSON-like format, unpack(pack(np.int8(42))) would come back as a plain Python integer anyway.

msgpack already has the default option and ExtType. You can add numpy support yourself.
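A minimal sketch of such a default hook: the numpy_default name is an assumption for illustration, not part of msgpack. It relies on numpy scalars' .item() method, which returns the equivalent plain Python scalar.

```python
def numpy_default(obj):
    # numpy scalars (and 0-d arrays) expose .item(), which converts
    # them to the equivalent plain Python scalar.
    if hasattr(obj, "item") and callable(obj.item):
        return obj.item()
    raise TypeError("can't serialize %r (type %s)" % (obj, type(obj).__name__))

# Usage (assuming msgpack and numpy are installed):
#   msgpack.dumps(np.float32(64), default=numpy_default)
```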

methane closed this as completed Feb 14, 2016

ssolari commented Feb 17, 2016

@methane thanks. I agree this is not a msgpack issue, and that Python really made things difficult with all the changes from 2 to 3. Given that msgpack-python is for use with Python, and numpy is increasingly ubiquitous, does it make sense to consider some kind of added feature? Maybe the np.asscalar() workaround could be performed under the hood for numpy dtypes, or a flag added to do that? I believe this would give the expected behavior.

As a workaround, using np.asscalar() works; for example:

msgpack.loads(msgpack.dumps(np.asscalar(np.float32(1))))

Also relates to #150 and #61.
