Commit 16d03b7

Merge pull request #3831 from jreback/msgpack3
ENH: support for msgpack serialization/deserialization
2 parents a653af4 + 80651ca commit 16d03b7

31 files changed, with 4487 additions and 20 deletions

LICENSES/MSGPACK_LICENSE

+13
@@ -0,0 +1,13 @@
1+
Copyright (C) 2008-2011 INADA Naoki <[email protected]>
2+
3+
Licensed under the Apache License, Version 2.0 (the "License");
4+
you may not use this file except in compliance with the License.
5+
You may obtain a copy of the License at
6+
7+
http://www.apache.org/licenses/LICENSE-2.0
8+
9+
Unless required by applicable law or agreed to in writing, software
10+
distributed under the License is distributed on an "AS IS" BASIS,
11+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
See the License for the specific language governing permissions and
13+
limitations under the License.

LICENSES/MSGPACK_NUMPY_LICENSE

+33
@@ -0,0 +1,33 @@
1+
.. -*- rst -*-
2+
3+
License
4+
=======
5+
6+
Copyright (c) 2013, Lev Givon.
7+
All rights reserved.
8+
9+
Redistribution and use in source and binary forms, with or without
10+
modification, are permitted provided that the following conditions are
11+
met:
12+
13+
* Redistributions of source code must retain the above copyright
14+
notice, this list of conditions and the following disclaimer.
15+
* Redistributions in binary form must reproduce the above
16+
copyright notice, this list of conditions and the following
17+
disclaimer in the documentation and/or other materials provided
18+
with the distribution.
19+
* Neither the name of Lev Givon nor the names of any
20+
contributors may be used to endorse or promote products derived
21+
from this software without specific prior written permission.
22+
23+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
24+
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
25+
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
26+
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
27+
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
28+
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
29+
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
30+
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
31+
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
32+
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
33+
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

doc/source/io.rst

+68
@@ -36,6 +36,7 @@ object.
3636
* ``read_hdf``
3737
* ``read_sql``
3838
* ``read_json``
39+
* ``read_msgpack`` (experimental)
3940
* ``read_html``
4041
* ``read_stata``
4142
* ``read_clipboard``
@@ -48,6 +49,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
4849
* ``to_hdf``
4950
* ``to_sql``
5051
* ``to_json``
52+
* ``to_msgpack`` (experimental)
5153
* ``to_html``
5254
* ``to_stata``
5355
* ``to_clipboard``
@@ -1732,6 +1734,72 @@ module is installed you can use it as a xlsx writer engine as follows:
17321734
17331735
.. _io.hdf5:
17341736

1737+
Serialization
1738+
-------------
1739+
1740+
msgpack (experimental)
1741+
~~~~~~~~~~~~~~~~~~~~~~
1742+
1743+
.. _io.msgpack:
1744+
1745+
.. versionadded:: 0.13.0
1746+
1747+
Starting in 0.13.0, pandas supports the ``msgpack`` format for
1748+
object serialization. This is a lightweight, portable binary format, similar
1749+
to binary JSON, that is highly space efficient and performs well
1750+
both when writing (serialization) and when reading (deserialization).
1751+
1752+
.. warning::
1753+
1754+
This is a very new feature of pandas. We intend to provide certain
1755+
optimizations in the io of the ``msgpack`` data. Since this is marked
1756+
as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
1757+
1758+
.. ipython:: python
1759+
1760+
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
1761+
df.to_msgpack('foo.msg')
1762+
pd.read_msgpack('foo.msg')
1763+
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
1764+
1765+
You can pass a list of objects and you will receive them back on deserialization.
1766+
1767+
.. ipython:: python
1768+
1769+
pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
1770+
pd.read_msgpack('foo.msg')
1771+
1772+
You can pass ``iterator=True`` to iterate over the unpacked results:
1773+
1774+
.. ipython:: python
1775+
1776+
for o in pd.read_msgpack('foo.msg',iterator=True):
1777+
    print(o)
1778+
1779+
You can pass ``append=True`` to the writer to append to an existing pack:
1780+
1781+
.. ipython:: python
1782+
1783+
df.to_msgpack('foo.msg',append=True)
1784+
pd.read_msgpack('foo.msg')
1785+
1786+
Unlike other io methods, ``to_msgpack`` is available both on a per-object basis,
1787+
``df.to_msgpack()``, and as the top-level ``pd.to_msgpack(...)``, where you
1788+
can pack arbitrary collections of Python lists, dicts, and scalars, intermixed with
1789+
pandas objects.
1790+
1791+
.. ipython:: python
1792+
1793+
pd.to_msgpack('foo2.msg', { 'dict' : [ { 'df' : df }, { 'string' : 'foo' }, { 'scalar' : 1. }, { 's' : s } ] })
1794+
pd.read_msgpack('foo2.msg')
1795+
1796+
.. ipython:: python
1797+
:suppress:
1798+
:okexcept:
1799+
1800+
os.remove('foo.msg')
1801+
os.remove('foo2.msg')
1802+
17351803
HDF5 (PyTables)
17361804
---------------
17371805
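
Note on the io.rst text above: it describes msgpack as a space-efficient binary format similar to binary JSON. The following is a minimal sketch of why that holds, hand-encoding a small subset of the MessagePack type tags (nil, booleans, positive fixint, float 64, fixstr, fixarray, fixmap) and comparing the result with compact JSON. The names `pack` and `record` are hypothetical, and this is not the packer that the commit adds to pandas; it only illustrates the wire format.

```python
import json
import struct


def pack(obj):
    """Encode a small subset of Python values as MessagePack bytes."""
    if obj is None:
        return b"\xc0"                                  # nil
    if obj is True:
        return b"\xc3"                                  # true
    if obj is False:
        return b"\xc2"                                  # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return struct.pack("B", obj)                    # positive fixint
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)         # float 64, big-endian
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only handles strings up to 31 bytes")
        return struct.pack("B", 0xA0 | len(raw)) + raw  # fixstr
    if isinstance(obj, (list, tuple)):
        if len(obj) > 15:
            raise ValueError("sketch only handles up to 15 elements")
        head = struct.pack("B", 0x90 | len(obj))        # fixarray
        return head + b"".join(pack(item) for item in obj)
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("sketch only handles up to 15 entries")
        out = struct.pack("B", 0x80 | len(obj))         # fixmap
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError("unsupported type in this sketch: %r" % type(obj))


# Hypothetical sample record; any small nested structure would do.
record = {"name": "foo", "values": [1, 2, 3], "scale": 1.0}
packed = pack(record)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))
```

Small integers and short strings cost one tag byte plus their payload, with no quotes, commas, or colons, which is where most of the saving over JSON comes from in this sample.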

doc/source/release.rst

+13-11
@@ -64,17 +64,19 @@ New features
6464
Experimental Features
6565
~~~~~~~~~~~~~~~~~~~~~
6666

67-
- The new :func:`~pandas.eval` function implements expression evaluation using
68-
``numexpr`` behind the scenes. This results in large speedups for complicated
69-
expressions involving large DataFrames/Series.
70-
- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
71-
evaluates an expression in the context of the ``DataFrame``.
72-
- A :meth:`~pandas.DataFrame.query` method has been added that allows
73-
you to select elements of a ``DataFrame`` using a natural query syntax nearly
74-
identical to Python syntax.
75-
- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
76-
objects in Python space because ``numexpr`` cannot handle ``NaT`` values
77-
(:issue:`4897`).
67+
- The new :func:`~pandas.eval` function implements expression evaluation using
68+
``numexpr`` behind the scenes. This results in large speedups for complicated
69+
expressions involving large DataFrames/Series.
70+
- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
71+
evaluates an expression in the context of the ``DataFrame``.
72+
- A :meth:`~pandas.DataFrame.query` method has been added that allows
73+
you to select elements of a ``DataFrame`` using a natural query syntax nearly
74+
identical to Python syntax.
75+
- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
76+
objects in Python space because ``numexpr`` cannot handle ``NaT`` values
77+
(:issue:`4897`).
78+
- Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()/df.to_msgpack()`` for serialization
79+
of arbitrary pandas (and Python) objects in a lightweight portable binary format (:issue:`686`)
7880

7981
Improvements to existing features
8082
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/v0.13.0.txt

+32-9
@@ -464,6 +464,15 @@ Enhancements
464464
t = Timestamp('20130101 09:01:02')
465465
t + pd.datetools.Nano(123)
466466

467+
- The ``isin`` method plays nicely with boolean indexing. To get the rows where each condition is met:
468+
469+
.. ipython:: python
470+
471+
mask = df.isin({'A': [1, 2], 'B': ['e', 'f']})
472+
df[mask.all(1)]
473+
474+
See the :ref:`documentation<indexing.basics.indexing_isin>` for more.
475+
467476
.. _whatsnew_0130.experimental:
468477

469478
Experimental
@@ -553,21 +562,35 @@ Experimental
553562
For more details see the :ref:`indexing documentation on query
554563
<indexing.query>`.
555564

556-
- DataFrame now has an ``isin`` method that can be used to easily check whether the DataFrame's values are contained in an iterable. Use a dictionary if you'd like to check specific iterables for specific columns or rows.
565+
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now supported methods of serialization
566+
of arbitrary pandas (and Python) objects in a lightweight portable binary format. :ref:`See the docs<io.msgpack>`
557567

558-
.. ipython:: python
568+
.. warning::
569+
570+
Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
559571

560-
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['d', 'e', 'f']})
561-
df.isin({'A': [1, 2], 'B': ['e', 'f']})
572+
.. ipython:: python
562573

563-
The ``isin`` method plays nicely with boolean indexing. To get the rows where each condition is met:
574+
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
575+
df.to_msgpack('foo.msg')
576+
pd.read_msgpack('foo.msg')
564577

565-
.. ipython:: python
578+
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
579+
pd.to_msgpack('foo.msg', df, s)
580+
pd.read_msgpack('foo.msg')
566581

567-
mask = df.isin({'A': [1, 2], 'B': ['e', 'f']})
568-
df[mask.all(1)]
582+
You can pass ``iterator=True`` to iterate over the unpacked results:
583+
584+
.. ipython:: python
585+
586+
for o in pd.read_msgpack('foo.msg',iterator=True):
587+
    print(o)
588+
589+
.. ipython:: python
590+
:suppress:
591+
:okexcept:
569592

570-
See the :ref:`documentation<indexing.basics.indexing_isin>` for more.
593+
os.remove('foo.msg')
571594

572595
.. _whatsnew_0130.refactoring:
573596
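
Note on the examples above: several objects are written to one file and read back either all at once or one at a time with ``iterator=True``. A plausible reason this works is that MessagePack values are self-delimiting, so packed objects can be written back to back, appended later, and decoded one at a time. The sketch below demonstrates the idea for a small subset of the format (integers 0..127, short strings, short lists). The helpers `pack`, `unpack_one`, and `iter_unpack` are hypothetical, and this is an assumption about the mechanism, not the pandas implementation.

```python
import os
import tempfile


def pack(obj):
    """Encode ints 0..127, short strings, and short lists (MessagePack subset)."""
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                              # positive fixint
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only handles strings up to 31 bytes")
        return bytes([0xA0 | len(raw)]) + raw            # fixstr
    if isinstance(obj, list):
        if len(obj) > 15:
            raise ValueError("sketch only handles up to 15 elements")
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)
    raise TypeError("unsupported type in this sketch: %r" % type(obj))


def unpack_one(stream, first):
    """Decode one object whose leading tag byte has already been read."""
    if first <= 0x7F:
        return first
    if 0xA0 <= first <= 0xBF:
        return stream.read(first & 0x1F).decode("utf-8")
    if 0x90 <= first <= 0x9F:
        return [unpack_one(stream, stream.read(1)[0])
                for _ in range(first & 0x0F)]
    raise ValueError("unsupported tag byte in this sketch: 0x%02x" % first)


def iter_unpack(stream):
    """Yield objects one at a time until the stream is exhausted."""
    while True:
        head = stream.read(1)
        if not head:
            return
        yield unpack_one(stream, head[0])


path = os.path.join(tempfile.mkdtemp(), "foo.msg")
with open(path, "wb") as fh:          # initial write of two objects
    fh.write(pack([1, 2, 3]))
    fh.write(pack("foo"))
with open(path, "ab") as fh:          # appending adds another object at the end
    fh.write(pack(42))
with open(path, "rb") as fh:          # lazy, one-object-at-a-time read
    unpacked = list(iter_unpack(fh))
print(unpacked)
os.remove(path)
```

Because no outer container or length header wraps the file, appending needs no rewrite of existing bytes, and a reader can stop after any object.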

pandas/core/generic.py

+19
@@ -805,6 +805,25 @@ def to_hdf(self, path_or_buf, key, **kwargs):
805805
from pandas.io import pytables
806806
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
807807

808+
def to_msgpack(self, path_or_buf, **kwargs):
809+
"""
810+
msgpack (serialize) object to input file path
811+
812+
THIS IS AN EXPERIMENTAL LIBRARY and the storage format
813+
may not be stable until a future release.
814+
815+
Parameters
816+
----------
817+
path_or_buf : string File path
818+
args : an object or objects to serialize
819+
append : boolean whether to append to an existing msgpack
820+
(default is False)
821+
compress : type of compressor (zlib or blosc), default to None (no compression)
822+
"""
823+
824+
from pandas.io import packers
825+
return packers.to_msgpack(path_or_buf, self, **kwargs)
826+
808827
def to_pickle(self, path):
809828
"""
810829
Pickle (serialize) object to input file path
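
Note on the hunk above: the new method forwards to a module-level writer, passing `self` as the object to serialize and passing keyword arguments such as `compress` through unchanged. The sketch below shows that delegation shape in isolation. All names (`pack_objects`, `read_packed`, `Record`) are hypothetical, JSON stands in for msgpack so that the example needs only the standard library, and compressing the whole payload with zlib is an assumption made for illustration; the pandas packer may apply compression differently.

```python
import json
import zlib


def pack_objects(*objs, compress=None):
    """Hypothetical top-level writer: serialize one or more objects to bytes."""
    payload = json.dumps(list(objs)).encode("utf-8")   # JSON as a stand-in
    if compress is None:
        return payload
    if compress == "zlib":
        return zlib.compress(payload)
    raise ValueError("unsupported compressor: %r" % (compress,))


def read_packed(data, compress=None):
    """Hypothetical reader matching pack_objects."""
    if compress == "zlib":
        data = zlib.decompress(data)
    return json.loads(data.decode("utf-8"))


class Record:
    """Hypothetical container standing in for a pandas object."""

    def __init__(self, values):
        self.values = values

    def to_packed(self, **kwargs):
        # Same shape as the method in the hunk: hand this object's data to the
        # top-level writer and pass keyword arguments through untouched.
        return pack_objects(self.values, **kwargs)


rec = Record([1.5, 2.5, 3.5] * 100)
plain = rec.to_packed()
small = rec.to_packed(compress="zlib")
print(len(plain), len(small))
```

Keeping the logic in one top-level function means the per-object method and the multi-object call share a single code path, which matches how the docs describe `df.to_msgpack()` and `pd.to_msgpack(...)`.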

pandas/io/api.py

+1
@@ -11,3 +11,4 @@
1111
from pandas.io.sql import read_sql
1212
from pandas.io.stata import read_stata
1313
from pandas.io.pickle import read_pickle, to_pickle
14+
from pandas.io.packers import read_msgpack, to_msgpack
