ENH: support for msgpack serialization/deserialization #3525

Closed
wants to merge 5 commits into from
33 changes: 33 additions & 0 deletions LICENSES/MSGPACK_NUMPY_LICENSE
@@ -0,0 +1,33 @@
.. -*- rst -*-

License
=======

Copyright (c) 2013, Lev Givon.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.
* Neither the name of Lev Givon nor the names of any
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3 changes: 3 additions & 0 deletions RELEASE.rst
@@ -32,6 +32,8 @@ pandas 0.11.1

- pd.read_html() can now parse HTML string, files or urls and return dataframes
courtesy of @cpcloud. (GH3477_)
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now supported for serializing
arbitrary pandas (and Python) objects in a lightweight, portable binary format (GH686_)

**Improvements to existing features**

@@ -75,6 +77,7 @@ pandas 0.11.1

.. _GH3164: https://github.com/pydata/pandas/issues/3164
.. _GH2786: https://github.com/pydata/pandas/issues/2786
.. _GH686: https://github.com/pydata/pandas/issues/686
.. _GH2194: https://github.com/pydata/pandas/issues/2194
.. _GH3230: https://github.com/pydata/pandas/issues/3230
.. _GH3251: https://github.com/pydata/pandas/issues/3251
41 changes: 0 additions & 41 deletions doc/source/basics.rst
@@ -1192,47 +1192,6 @@ While float dtypes are unchanged.
casted
casted.dtypes

.. _basics.serialize:

Pickling and serialization
--------------------------

All pandas objects are equipped with ``save`` methods which use Python's
``cPickle`` module to save data structures to disk using the pickle format.

.. ipython:: python

df
df.save('foo.pickle')

The ``load`` function in the ``pandas`` namespace can be used to load any
pickled pandas object (or any other pickled object) from file:


.. ipython:: python

load('foo.pickle')

There is also a ``save`` function which takes any object as its first argument:

.. ipython:: python

save(df, 'foo.pickle')
load('foo.pickle')

.. ipython:: python
:suppress:

import os
os.remove('foo.pickle')

.. warning::

Loading pickled data received from untrusted sources can be unsafe.

See: http://docs.python.org/2.7/library/pickle.html


Working with package options
----------------------------

1 change: 1 addition & 0 deletions doc/source/install.rst
@@ -93,6 +93,7 @@ Optional Dependencies
version. Version 0.17.1 or higher.
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
* `msgpack <http://www.msgpack.org>`__: necessary for msgpack-based serialization
* `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
* `statsmodels <http://statsmodels.sourceforge.net/>`__
* Needed for parts of :mod:`pandas.stats`
88 changes: 88 additions & 0 deletions doc/source/io.rst
@@ -981,6 +981,94 @@ one can use the ExcelWriter class, as in the following example:

.. _io.hdf5:

.. _basics.serialize:

Serialization
-------------

msgpack
~~~~~~~

Starting in 0.12.0, pandas supports the ``msgpack`` format for
object serialization. This is a lightweight, portable binary format, similar
to binary JSON, that is highly space efficient and performs well
for both writing (serialization) and reading (deserialization).
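
To make the space-efficiency claim concrete, here is a minimal sketch that hand-encodes a tiny subset of the msgpack wire format (small non-negative integers, short strings, short lists and maps) and compares the result with compact JSON. The ``pack_small`` helper is hypothetical and exists only for illustration; it is not the packer used by pandas or by the msgpack library, and it is written for Python 3.

```python
import json


def pack_small(obj):
    """Encode a tiny msgpack subset: ints 0..127, short str, short list/dict."""
    if isinstance(obj, bool):
        raise TypeError("booleans are outside the subset handled by this sketch")
    if isinstance(obj, int) and 0 <= obj <= 127:
        return bytes([obj])                       # positive fixint: 0xxxxxxx
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) < 32:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr: 101xxxxx
    if isinstance(obj, list) and len(obj) < 16:
        body = b"".join(pack_small(item) for item in obj)
        return bytes([0x90 | len(obj)]) + body     # fixarray: 1001xxxx
    if isinstance(obj, dict) and len(obj) < 16:
        out = bytes([0x80 | len(obj)])             # fixmap: 1000xxxx
        for key, value in obj.items():
            out += pack_small(key) + pack_small(value)
        return out
    raise TypeError("outside the subset handled by this sketch")


record = {"a": 1, "b": [1, 2, 3]}
packed = pack_small(record)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))
```

The saving comes from type tags that carry the length in the same byte, so small containers and short strings need no quotes, brackets, or separators.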

.. ipython:: python

df = DataFrame(np.random.rand(5,2),columns=list('AB'))
df.to_msgpack('foo.msg')
pd.read_msgpack('foo.msg')
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))

You can pass several objects at once, and you will receive all of them back on deserialization.

.. ipython:: python

pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
pd.read_msgpack('foo.msg')

You can pass ``iterator=True`` to iterate over the unpacked results

.. ipython:: python

for o in pd.read_msgpack('foo.msg', iterator=True):
    print(o)


You can pass ``append=True`` to the writer to append to an existing pack

.. ipython:: python

df.to_msgpack('foo.msg',append=True)
pd.read_msgpack('foo.msg')
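
The multi-object, ``iterator`` and ``append`` behaviours described above all follow from treating the file as a sequence of independently serialized records. The sketch below illustrates that idea with the standard library's ``pickle`` as a stand-in serializer. The helper names ``write_objects`` and ``iter_objects`` are hypothetical, and this is not how ``pandas.io.packers`` is implemented.

```python
import os
import pickle
import tempfile


def write_objects(path, *objs, append=False):
    """Write each object as its own record; append=True keeps earlier records."""
    mode = "ab" if append else "wb"
    with open(path, mode) as fh:
        for obj in objs:
            pickle.dump(obj, fh)


def iter_objects(path):
    """Yield records one at a time until the stream is exhausted."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


path = os.path.join(tempfile.mkdtemp(), "foo.bin")
write_objects(path, {"A": [1, 2]}, "foo", [1, 2, 3])   # several objects at once
write_objects(path, {"B": [3, 4]}, append=True)        # append to the existing file
records = list(iter_objects(path))                     # or loop lazily instead
os.remove(path)
```

Because every record is self-delimiting, appending is just writing more records to the end of the file, and an iterator can stop early without decoding the rest.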

.. ipython:: python
:suppress:
:okexcept:

import os
os.remove('foo.msg')


pickling
~~~~~~~~

All pandas objects are equipped with ``save`` methods which use Python's
``cPickle`` module to save data structures to disk using the pickle format.

.. ipython:: python

df
df.save('foo.pickle')

The ``load`` function in the ``pandas`` namespace can be used to load any
pickled pandas object (or any other pickled object) from file:


.. ipython:: python

load('foo.pickle')

There is also a ``save`` function which takes any object as its first argument:

.. ipython:: python

save(df, 'foo.pickle')
load('foo.pickle')

.. ipython:: python
:suppress:

import os
os.remove('foo.pickle')

.. warning::

Loading pickled data received from untrusted sources can be unsafe.

See: http://docs.python.org/2.7/library/pickle.html
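
One common mitigation for the risk described in this warning is to restrict which globals an unpickler may resolve. The following is a minimal sketch of that technique for Python 3, using only the standard library. The allow-list ``SAFE_BUILTINS`` and the helper ``restricted_loads`` are illustrative assumptions, not part of pandas, and an allow-list does not by itself make untrusted pickles safe.

```python
import builtins
import io
import os
import pickle

# Assumed allow-list for this sketch; a real application would choose its own.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve a few harmless builtins; refuse every other global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads, but with the restricted global lookup above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


ok = restricted_loads(pickle.dumps([1, 2, range(3)]))

try:
    # A reference to an os-level function is rejected before it can be used.
    restricted_loads(pickle.dumps(os.getcwd))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Plain containers and numbers never go through ``find_class``, so ordinary data still loads, while any payload that names an arbitrary callable is refused.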


HDF5 (PyTables)
---------------

9 changes: 4 additions & 5 deletions doc/source/v0.11.1.txt
@@ -1,18 +1,17 @@
.. _whatsnew_0120:
.. _whatsnew_0111:

v0.12.0 (??)
v0.11.1 (??)
------------------------

This is a major release from 0.11.0 and includes many new features and
enhancements along with a large number of bug fixes.
This is a minor release from 0.11.0 and includes a small number of enhancements and bug fixes.

API changes
~~~~~~~~~~~


Enhancements
~~~~~~~~~~~~
- pd.read_html() can now parse HTML string, files or urls and return dataframes
- ``pd.read_html()`` can now parse HTML string, files or urls and return dataframes
courtesy of @cpcloud. (GH3477_)

See the `full release notes
42 changes: 42 additions & 0 deletions doc/source/v0.12.0.txt
@@ -0,0 +1,42 @@
.. _whatsnew_0120:

v0.12.0 (??????)
----------------

This is a major release from 0.11.1 and includes many new features and
enhancements along with a large number of bug fixes. There are also a
number of important API changes that long-time pandas users should
pay close attention to.

Enhancements
~~~~~~~~~~~~

- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now supported for serializing
arbitrary pandas (and Python) objects in a lightweight, portable binary format

.. ipython:: python

df = DataFrame(np.random.rand(5,2),columns=list('AB'))
df.to_msgpack('foo.msg')
pd.read_msgpack('foo.msg')

s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
pd.to_msgpack('foo.msg', df, s)
pd.read_msgpack('foo.msg')

You can pass ``iterator=True`` to iterate over the unpacked results

.. ipython:: python

for o in pd.read_msgpack('foo.msg', iterator=True):
    print(o)

.. ipython:: python
:suppress:
:okexcept:

import os
os.remove('foo.msg')

See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
on GitHub for a complete list.
2 changes: 2 additions & 0 deletions doc/source/whatsnew.rst
@@ -18,6 +18,8 @@ These are new features and improvements of note in each release.

.. include:: v0.12.0.txt

.. include:: v0.11.1.txt

.. include:: v0.11.0.txt

.. include:: v0.10.1.txt
1 change: 1 addition & 0 deletions pandas/__init__.py
@@ -32,6 +32,7 @@
from pandas.io.parsers import (read_csv, read_table, read_clipboard,
read_fwf, to_clipboard, ExcelFile,
ExcelWriter)
from pandas.io.packers import read_msgpack, to_msgpack
from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
from pandas.io.html import read_html
from pandas.util.testing import debug
4 changes: 4 additions & 0 deletions pandas/core/generic.py
@@ -487,6 +487,10 @@ def to_hdf(self, path_or_buf, key, **kwargs):
from pandas.io import pytables
return pytables.to_hdf(path_or_buf, key, self, **kwargs)

def to_msgpack(self, path_or_buf, **kwargs):
from pandas.io import packers
return packers.to_msgpack(path_or_buf, self, **kwargs)
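
The new method follows the same delegation shape as ``to_hdf``: the object passes itself, plus any keyword arguments, to a module-level writer. Below is a minimal self-contained sketch of that pattern. It uses JSON lines as a stand-in format, and the names ``write_json_lines``, ``Serializable`` and ``Point`` are hypothetical; the diff additionally defers the import into the method body, which a single-file sketch cannot show.

```python
import io
import json


def write_json_lines(path_or_buf, *objs, **kwargs):
    """Hypothetical module-level writer taking a path or a text buffer."""
    lines = "".join(json.dumps(vars(obj), **kwargs) + "\n" for obj in objs)
    if hasattr(path_or_buf, "write"):
        path_or_buf.write(lines)
    else:
        with open(path_or_buf, "w") as fh:
            fh.write(lines)


class Serializable(object):
    def to_json_lines(self, path_or_buf, **kwargs):
        # Same shape as the diff: forward self and the keyword arguments.
        return write_json_lines(path_or_buf, self, **kwargs)


class Point(Serializable):
    def __init__(self, x, y):
        self.x = x
        self.y = y


buf = io.StringIO()
Point(1, 2).to_json_lines(buf, sort_keys=True)
output = buf.getvalue()
```

Keeping the logic in one module-level function means the method form and the function form cannot drift apart, and the function can still accept several objects in one call.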

# install the indexers
for _name, _indexer in indexing.get_indexers_list():
PandasObject._create_indexer(_name,_indexer)
1 change: 0 additions & 1 deletion pandas/core/internals.py
@@ -15,7 +15,6 @@
from pandas.tslib import Timestamp
from pandas.util import py3compat


class Block(object):
"""
Canonical n-dimensional unit of homogeneous dtype contained in a pandas