Commit d9225fb

ENH: support for msgpack serialization/deserialization
DOC: install.rst mention
DOC: added license from msgpack_numpy
PERF: changed Timestamp and DatetimeIndex serialization for speedups add vb_suite benchmarks
ENH: added to_msgpack method in generic.py, and default import into pandas
TST: all packers to always be imported, fail on usage with no msgpack installed
DOC: added mentions in release notes, v0.11.1, basics
ENH: provide automatic list if multiple args passed to to_msgpack
DOC: changed docs to 0.12
ENH: iterator support for stream unpacking
Conflicts: RELEASE.rst
ENH: added support for Panel,SparseSeries,SparseDataFrame,SparsePanel,IntIndex,BlockIndex
ENH: handle np.datetime64,np.timedelta64,date,timedelta types
TST: added compression (zlib/blosc) via big hack
DOC: moved back to 0.11.1 docs
BLD: integrated with built-in msgpack
DOC: io.rst fixes
PERF: update vb_suite for packers
TST: fix for test_list_float_complex test?
PERF: prototype for packing faster
PERF: was still using tolist on indicies
DOC: v0.13.0.txt and release notes
DOC: release notes
PERF: revamples packers vbench to use packers,csv,pickle,hdf_store,hdf_table
TST: better test comparison s for numpy types
BLD: py3k compat
1 parent 1501356 commit d9225fb

11 files changed (+1196, -11 lines)
LICENSES/MSGPACK_NUMPY_LICENSE

+33
@@ -0,0 +1,33 @@
+.. -*- rst -*-
+
+License
+=======
+
+Copyright (c) 2013, Lev Givon.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above
+  copyright notice, this list of conditions and the following
+  disclaimer in the documentation and/or other materials provided
+  with the distribution.
+* Neither the name of Lev Givon nor the names of any
+  contributors may be used to endorse or promote products derived
+  from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

doc/source/io.rst

+68
@@ -36,6 +36,7 @@ object.
 * ``read_hdf``
 * ``read_sql``
 * ``read_json``
+* ``read_msgpack``
 * ``read_html``
 * ``read_stata``
 * ``read_clipboard``
@@ -48,6 +49,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
 * ``to_hdf``
 * ``to_sql``
 * ``to_json``
+* ``to_msgpack``
 * ``to_html``
 * ``to_stata``
 * ``to_clipboard``
@@ -1732,6 +1734,72 @@ module is installed you can use it as a xlsx writer engine as follows:
 
 .. _io.hdf5:
 
+Serialization
+-------------
+
+msgpack
+~~~~~~~
+
+.. _io.msgpack:
+
+.. versionadded:: 0.11.1
+
+Starting in 0.11.1, pandas is supporting the ``msgpack`` format for
+object serialization. This is a lightweight portable binary format, similar
+to binary JSON, that is highly space efficient, and provides good performance
+both on the writing (serialization), and reading (deserialization).
+
+.. warning::
+
+   This is a very new feature of pandas. We intend to provide certain
+   optimizations in the io of the ``msgpack`` data. We do not intend this
+   format to change (and will be backward compatible if we do).
+
+.. ipython:: python
+
+   df = DataFrame(np.random.rand(5,2),columns=list('AB'))
+   df.to_msgpack('foo.msg')
+   pd.read_msgpack('foo.msg')
+   s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
+
+You can pass a list of objects and you will receive them back on deserialization.
+
+.. ipython:: python
+
+   pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
+   pd.read_msgpack('foo.msg')
+
+You can pass ``iterator=True`` to iterate over the unpacked results
+
+.. ipython:: python
+
+   for o in pd.read_msgpack('foo.msg',iterator=True):
+       print o
+
+You can pass ``append=True`` to the writer to append to an existing pack
+
+.. ipython:: python
+
+   df.to_msgpack('foo.msg',append=True)
+   pd.read_msgpack('foo.msg')
+
+Unlike other io methods, ``to_msgpack`` is available on both a per-object basis,
+``df.to_msgpack()`` and using the top-level ``pd.to_msgpack(...)`` where you
+can pack arbitrary collections of python lists, dicts, scalars, while intermixing
+pandas objects.
+
+.. ipython:: python
+
+   pd.to_msgpack('foo2.msg', { 'dict' : [ { 'df' : df }, { 'string' : 'foo' }, { 'scalar' : 1. }, { 's' : s } ] })
+   pd.read_msgpack('foo2.msg')
+
+.. ipython:: python
+   :suppress:
+   :okexcept:
+
+   os.remove('foo.msg')
+   os.remove('foo2.msg')
+
 HDF5 (PyTables)
 ---------------
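
The ``append=True`` and ``iterator=True`` options documented in the io.rst hunk above treat a file as a sequence of independently packed objects. A minimal sketch of that idea follows, using the standard-library ``pickle`` module as a stand-in for msgpack so that it runs without extra dependencies. The helper names ``write_objects`` and ``iter_objects`` are hypothetical and are not part of pandas; the sketch illustrates the stream semantics only, not the implementation in ``pandas/io/packers.py``.

```python
import os
import pickle
import tempfile


def write_objects(path, *objs, append=False):
    """Write each object as its own record; append=True keeps existing records."""
    mode = "ab" if append else "wb"
    with open(path, mode) as fh:
        for obj in objs:
            pickle.dump(obj, fh)


def iter_objects(path):
    """Yield the stored records one at a time until the end of the file."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "foo.bin")
    # Several objects in one call, analogous to pd.to_msgpack(path, df, 'foo', ...)
    write_objects(path, {"A": [1, 2]}, "foo", [1, 2, 3])
    # Appending adds a record instead of overwriting the file
    write_objects(path, 1.0, append=True)
    for obj in iter_objects(path):
        print(obj)
```

Writing one self-delimiting record per object is what makes both appending and lazy iteration possible: a reader never needs to know in advance how many objects the file holds.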

doc/source/release.rst

+13-11
@@ -64,17 +64,19 @@ New features
 Experimental Features
 ~~~~~~~~~~~~~~~~~~~~~
 
-- The new :func:`~pandas.eval` function implements expression evaluation using
-  ``numexpr`` behind the scenes. This results in large speedups for complicated
-  expressions involving large DataFrames/Series.
-- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
-  evaluates an expression in the context of the ``DataFrame``.
-- A :meth:`~pandas.DataFrame.query` method has been added that allows
-  you to select elements of a ``DataFrame`` using a natural query syntax nearly
-  identical to Python syntax.
-- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
-  objects in Python space because ``numexpr`` cannot handle ``NaT`` values
-  (:issue:`4897`).
+- The new :func:`~pandas.eval` function implements expression evaluation using
+  ``numexpr`` behind the scenes. This results in large speedups for complicated
+  expressions involving large DataFrames/Series.
+- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
+  evaluates an expression in the context of the ``DataFrame``.
+- A :meth:`~pandas.DataFrame.query` method has been added that allows
+  you to select elements of a ``DataFrame`` using a natural query syntax nearly
+  identical to Python syntax.
+- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
+  objects in Python space because ``numexpr`` cannot handle ``NaT`` values
+  (:issue:`4897`).
+- Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()/df.to_msgpack()`` for serialization
+  of arbitrary pandas (and python objects) in a lightweight portable binary format (:issue:`686`)
 
 Improvements to existing features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/v0.13.0.txt

+29
@@ -686,6 +686,35 @@ to unify methods and behaviors. Series formerly subclassed directly from
    s.a = 5
    s
 
+IO Enhancements
+~~~~~~~~~~~~~~~
+
+- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
+  of arbitrary pandas (and python objects) in a lightweight portable binary format. :ref:`See the docs<io.msgpack>`
+
+.. ipython:: python
+
+   df = DataFrame(np.random.rand(5,2),columns=list('AB'))
+   df.to_msgpack('foo.msg')
+   pd.read_msgpack('foo.msg')
+
+   s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
+   pd.to_msgpack('foo.msg', df, s)
+   pd.read_msgpack('foo.msg')
+
+You can pass ``iterator=True`` to iterator over the unpacked results
+
+.. ipython:: python
+
+   for o in pd.read_msgpack('foo.msg',iterator=True):
+       print o
+
+.. ipython:: python
+   :suppress:
+   :okexcept:
+
+   os.remove('foo.msg')
+
 Bug Fixes
 ~~~~~~~~~

pandas/core/generic.py

+4
@@ -805,6 +805,10 @@ def to_hdf(self, path_or_buf, key, **kwargs):
         from pandas.io import pytables
         return pytables.to_hdf(path_or_buf, key, self, **kwargs)
 
+    def to_msgpack(self, path_or_buf, **kwargs):
+        from pandas.io import packers
+        return packers.to_msgpack(path_or_buf, self, **kwargs)
+
     def to_pickle(self, path):
         """
         Pickle (serialize) object to input file path
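
The ``to_msgpack`` method added to generic.py is a thin wrapper: it imports the I/O module only when called, then forwards ``self`` and any keyword arguments to the module-level writer. The same delegation pattern can be sketched with standard-library pieces only. ``Record`` and ``to_json_lines`` are hypothetical names, and ``json`` stands in for the packers module; this is an illustration under those assumptions, not pandas code.

```python
import io


def to_json_lines(path_or_buf, *objs, **kwargs):
    """Module-level writer: one JSON document per line, to a path or a buffer."""
    import json  # imported lazily, loosely mirroring the deferred import in the diff

    text = "".join(json.dumps(obj, **kwargs) + "\n" for obj in objs)
    if hasattr(path_or_buf, "write"):
        path_or_buf.write(text)
    else:
        with open(path_or_buf, "w") as fh:
            fh.write(text)


class Record:
    """Hypothetical container standing in for a pandas object."""

    def __init__(self, data):
        self.data = data

    def to_json_lines(self, path_or_buf, **kwargs):
        # Thin wrapper: forward this object's payload and all keyword arguments.
        return to_json_lines(path_or_buf, self.data, **kwargs)


if __name__ == "__main__":
    buf = io.StringIO()
    Record({"b": 1, "a": 2}).to_json_lines(buf, sort_keys=True)
    print(buf.getvalue(), end="")
```

Keeping the logic in one module-level function means the per-object method and the top-level function cannot drift apart, which is presumably why the commit exposes both ``df.to_msgpack()`` and ``pd.to_msgpack(...)`` over the same writer.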

pandas/io/api.py

+1
Original file line numberDiff line numberDiff line change
@@ -11,3 +11,4 @@
 from pandas.io.sql import read_sql
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
+from pandas.io.packers import read_msgpack, to_msgpack
