ENH: support for msgpack serialization/deserialization #3831
latest commit cdc870d basically puts float64/int64 directly in the msgpack pack loop, so getting better
getting closer... I was dumbly serializing indices, so now float64 and int64 ndarrays are serialized pretty fast, and that's the basis for most data. Reading is still somewhat slow; next step is to see if we can create a custom type
I think this needs to wait til 0.12 and see some more work (esp on performance)
that's fine; next step is prob a custom type, otherwise reading is pretty slow
@wesm
Do we know what's contributing to the slow read performance on msgpack'd Series?
it's processing it one structure at a time, e.g. read next structure, then decode the
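To make the "one structure at a time" point concrete, here is a minimal stdlib-only sketch. It does not use the PR's packer code; the function names and the flat float64 layout are assumptions chosen purely for illustration. It contrasts decoding a buffer item by item with handing the whole buffer to a single bulk call.

```python
import struct
from array import array

values = [float(i) for i in range(1000)]
# The same bytes feed both decoders: 1000 native-order float64 values back to back.
payload = array("d", values).tobytes()


def decode_one_at_a_time(buf):
    """Read the next 8-byte item, decode it, repeat (one Python-level call per item)."""
    out = []
    for offset in range(0, len(buf), 8):
        (value,) = struct.unpack_from("=d", buf, offset)
        out.append(value)
    return out


def decode_in_bulk(buf):
    """Copy the whole buffer into a typed array with a single call."""
    out = array("d")
    out.frombytes(buf)
    return out.tolist()
```

Both functions return the same list; the difference is that the first pays Python call overhead for every element, which is the kind of cost a dedicated array type could avoid.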
@wesm have you had a chance to look at this? any real perf gain will have to come from defining a numpy type in msgpack I think....add to 0.13 to get tires kicked? (could mark as experimental)
well, the concern would be stability of the format, do you have a sense of that? I haven't had a chance to look closely yet
I believe the format itself is pretty stable. What needs to be done to get really good performance is to extend the msgpack spec via the extension types (https://github.com/msgpack/msgpack/blob/master/spec.md#types-extension-type), prob just for numpy arrays, so that these types become more efficient to read. Not sure about the time scale for 0.13. But even if the format is subsequently changed (e.g. more/better extension types are added), I don't think back compat is that hard to maintain.
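As a rough illustration of the extension-type idea, the sketch below frames a float64 buffer as a msgpack "ext 32" value by hand, following the layout in the linked spec (marker byte 0xc9, 4-byte big-endian length, 1-byte type code, then the raw data). The type code `FLOAT64_ARRAY` and both helper names are hypothetical, and this is not how the PR implements anything; it only shows why a reader could pull the whole array out with one copy.

```python
import struct
from array import array

EXT32 = 0xC9        # msgpack "ext 32" marker byte, per the spec
FLOAT64_ARRAY = 1   # hypothetical application-defined type code (0..127)


def pack_float64_ext(values):
    """Frame the raw float64 bytes as a single ext 32 value."""
    data = array("d", values).tobytes()  # native byte order; a real format would pin this
    # marker, 4-byte big-endian payload length, 1-byte signed type code
    header = struct.pack(">BIb", EXT32, len(data), FLOAT64_ARRAY)
    return header + data


def unpack_float64_ext(buf):
    """Check the 6-byte header, then copy the payload out in one call."""
    marker, length, type_code = struct.unpack_from(">BIb", buf, 0)
    if marker != EXT32 or type_code != FLOAT64_ARRAY:
        raise ValueError("not a float64-array extension value")
    out = array("d")
    out.frombytes(buf[6:6 + length])
    return out.tolist()


packed = pack_float64_ext([1.0, 2.5, -3.0])
restored = unpack_float64_ext(packed)
```

Because the payload is an opaque byte string to msgpack, adding more type codes later would not change how existing values are framed, which fits the back-compat argument above.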
see msgpack_numpy https://github.com/lebedov/msgpack_numpy
@drasch already incorporated that in here. What I am talking about is cythonizing the numpy array itself.
I'd like to dig through the serialization performance issue. Can we merge/ship this and simply note in the docstrings / docs that the binary format should not be expected to be stable for a while?
Not sure when I'll get to the digging though, but I want to.
sure....mark as experimental (not that that stops anyone from complaining when it breaks/changes......)
I'll take a look soon. Can you run something like flake8 over the Python |
Is the kwarg really supposed to be |
that looks like a typo from the original msgpack code (@wesm incorporated it here), but mostly just copy/paste
pep8d the python files...
Pep8 is good, but pyflakes or pylint check for undefined variables, |
Would it make sense to make the baseclass for UnpackException IOError or something like that? These are definitely trivial notes. Still looking at it for more substantive things.
yep....IOError sounds good
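A minimal sketch of what the suggested base class buys callers, assuming nothing about the PR's actual class body: `read_payload` is a hypothetical helper, and only the `UnpackException(IOError)` relationship comes from the discussion above.

```python
class UnpackException(IOError):
    """Raised when unpacking fails; subclassing IOError lets generic I/O handlers catch it."""


def read_payload(buf):
    # Hypothetical helper, for illustration only.
    if not buf:
        raise UnpackException("no data to unpack")
    return buf


try:
    read_payload(b"")
except IOError as exc:  # existing `except IOError` blocks keep working
    caught = exc
```

Code that already guards file reads with `except IOError` would then handle a corrupt msgpack payload without needing to import the packer's exception type.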
.. ipython:: python
.. warning::

   Since this is EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
need an "an" after "this is"
cdef inline pack_pair(self, object k, object v, int nest_limit):
    ret = self._pack(k, nest_limit-1)
    if ret != 0: raise Exception("cannot pack : %s" % k)
can we change this to UnpackException or something? I know I'm very anti-Exception.
…das use cases, per pandas-dev#3814 and others
DOC: install.rst mention
DOC: added license from msgpack_numpy
PERF: changed Timestamp and DatetimeIndex serialization for speedups
add vb_suite benchmarks
ENH: added to_msgpack method in generic.py, and default import into pandas
TST: all packers to always be imported, fail on usage with no msgpack installed
DOC: added mentions in release notes, v0.11.1, basics
ENH: provide automatic list if multiple args passed to to_msgpack
DOC: changed docs to 0.12
ENH: iterator support for stream unpacking
Conflicts: RELEASE.rst
ENH: added support for Panel,SparseSeries,SparseDataFrame,SparsePanel,IntIndex,BlockIndex
ENH: handle np.datetime64,np.timedelta64,date,timedelta types
TST: added compression (zlib/blosc) via big hack
DOC: moved back to 0.11.1 docs
BLD: integrated with built-in msgpack
DOC: io.rst fixes
PERF: update vb_suite for packers
TST: fix for test_list_float_complex test?
PERF: prototype for packing faster
PERF: was still using tolist on indicies
DOC: v0.13.0.txt and release notes
DOC: release notes
PERF: revamples packers vbench to use packers,csv,pickle,hdf_store,hdf_table
TST: better test comparisons for numpy types
BLD: py3k compat
TST: removed pytest in favor of nosetest for tests/test_msgpack
bombs away
ENH: support for msgpack serialization/deserialization
extension of #3828
ToDo
- pytest in test_msgpack