Commit 4870ad9

DOC: added mentions in release notes, v0.11.1, basics
ENH: provide automatic list if multiple args passed to to_msgpack
DOC: changed docs to 0.12
ENH: iterator support for stream unpacking
1 parent 98671ee commit 4870ad9

8 files changed: +207 -103 lines changed

RELEASE.rst

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -32,6 +32,8 @@ pandas 0.11.1
3232

3333
- pd.read_html() can now parse HTML string, files or urls and return dataframes
3434
courtesy of @cpcloud. (GH3477_)
35+
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now supported methods of serializing
36+
arbitrary pandas (and Python) objects in a lightweight, portable binary format (GH686_)
3537

3638
**Improvements to existing features**
3739

@@ -75,6 +77,7 @@ pandas 0.11.1
7577

7678
.. _GH3164: https://github.com/pydata/pandas/issues/3164
7779
.. _GH2786: https://github.com/pydata/pandas/issues/2786
80+
.. _GH686: https://github.com/pydata/pandas/issues/686
7881
.. _GH2194: https://github.com/pydata/pandas/issues/2194
7982
.. _GH3230: https://github.com/pydata/pandas/issues/3230
8083
.. _GH3251: https://github.com/pydata/pandas/issues/3251

doc/source/basics.rst

Lines changed: 0 additions & 41 deletions
@@ -1192,47 +1192,6 @@ While float dtypes are unchanged.
11921192
casted
11931193
casted.dtypes
11941194
1195-
.. _basics.serialize:
1196-
1197-
Pickling and serialization
1198-
--------------------------
1199-
1200-
All pandas objects are equipped with ``save`` methods which use Python's
1201-
``cPickle`` module to save data structures to disk using the pickle format.
1202-
1203-
.. ipython:: python
1204-
1205-
df
1206-
df.save('foo.pickle')
1207-
1208-
The ``load`` function in the ``pandas`` namespace can be used to load any
1209-
pickled pandas object (or any other pickled object) from file:
1210-
1211-
1212-
.. ipython:: python
1213-
1214-
load('foo.pickle')
1215-
1216-
There is also a ``save`` function which takes any object as its first argument:
1217-
1218-
.. ipython:: python
1219-
1220-
save(df, 'foo.pickle')
1221-
load('foo.pickle')
1222-
1223-
.. ipython:: python
1224-
:suppress:
1225-
1226-
import os
1227-
os.remove('foo.pickle')
1228-
1229-
.. warning::
1230-
1231-
Loading pickled data received from untrusted sources can be unsafe.
1232-
1233-
See: http://docs.python.org/2.7/library/pickle.html
1234-
1235-
12361195
Working with package options
12371196
----------------------------
12381197

doc/source/io.rst

Lines changed: 73 additions & 0 deletions
@@ -981,6 +981,79 @@ one can use the ExcelWriter class, as in the following example:
981981
982982
.. _io.hdf5:
983983

984+
.. _basics.serialize:
985+
986+
Serialization
987+
-------------
988+
989+
msgpack
990+
~~~~~~~
991+
992+
Starting in 0.12.0, pandas supports the ``msgpack`` format for
993+
object serialization. This is a lightweight, portable binary format, similar
994+
to binary JSON, that is highly space efficient and provides good performance
995+
for both writing (serialization) and reading (deserialization).
996+
997+
.. ipython:: python
998+
999+
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
1000+
df.to_msgpack('foo.msg')
1001+
pd.read_msgpack('foo.msg')
1002+
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
1003+
1004+
You can pass several objects as positional arguments; they are returned together, as a list, on deserialization.
1005+
1006+
.. ipython:: python
1007+
1008+
pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
1009+
pd.read_msgpack('foo.msg')
1010+
1011+
.. ipython:: python
1012+
:suppress:
1013+
:okexcept:
1014+
1015+
os.remove('foo.msg')
1016+
1017+
1018+
pickling
1019+
~~~~~~~~
1020+
1021+
All pandas objects are equipped with ``save`` methods which use Python's
1022+
``cPickle`` module to save data structures to disk using the pickle format.
1023+
1024+
.. ipython:: python
1025+
1026+
df
1027+
df.save('foo.pickle')
1028+
1029+
The ``load`` function in the ``pandas`` namespace can be used to load any
1030+
pickled pandas object (or any other pickled object) from file:
1031+
1032+
1033+
.. ipython:: python
1034+
1035+
load('foo.pickle')
1036+
1037+
There is also a ``save`` function which takes any object as its first argument:
1038+
1039+
.. ipython:: python
1040+
1041+
save(df, 'foo.pickle')
1042+
load('foo.pickle')
1043+
1044+
.. ipython:: python
1045+
:suppress:
1046+
1047+
import os
1048+
os.remove('foo.pickle')
1049+
1050+
.. warning::
1051+
1052+
Loading pickled data received from untrusted sources can be unsafe.
1053+
1054+
See: http://docs.python.org/2.7/library/pickle.html
1055+
1056+
9841057
HDF5 (PyTables)
9851058
---------------
9861059
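The round-trip behavior documented above (one stored object comes back bare, several come back as a list, and ``append`` adds records to an existing file) can be sketched without pandas or msgpack. This is only an illustration: it uses the standard-library ``pickle`` module as a stand-in for msgpack, and ``write_objects`` and ``read_objects`` are hypothetical names, not part of this commit.

```python
import os
import pickle
import tempfile


def write_objects(path, *args, append=False):
    """Serialize each positional argument to `path`, one record after another."""
    # Mirror the commit's choice between appending and truncating.
    mode = "ab" if append else "wb"
    with open(path, mode) as fh:
        for obj in args:
            pickle.dump(obj, fh)


def read_objects(path):
    """Read every record in `path`; unwrap the result if exactly one was stored."""
    objs = []
    with open(path, "rb") as fh:
        while True:
            try:
                objs.append(pickle.load(fh))
            except EOFError:
                # No more records in the file.
                break
    return objs[0] if len(objs) == 1 else objs


path = os.path.join(tempfile.mkdtemp(), "foo.bin")

write_objects(path, {"A": [1, 2]})
single = read_objects(path)    # one record: returned bare

write_objects(path, "foo", [1, 2, 3], append=True)
several = read_objects(path)   # three records: returned as a list
```

One consequence of unwrapping a single record is that a caller cannot tell a file holding one list from a file holding several objects by the return type alone; the iterator form avoids that ambiguity.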

doc/source/v0.11.1.txt

Lines changed: 4 additions & 5 deletions
@@ -1,18 +1,17 @@
1-
.. _whatsnew_0120:
1+
.. _whatsnew_0111:
22

3-
v0.12.0 (??)
3+
v0.11.1 (??)
44
------------------------
55

6-
This is a major release from 0.11.0 and includes many new features and
7-
enhancements along with a large number of bug fixes.
6+
This is a minor release from 0.11.0 and includes a small number of enhancements and bug fixes.
87

98
API changes
109
~~~~~~~~~~~
1110

1211

1312
Enhancements
1413
~~~~~~~~~~~~
15-
- pd.read_html() can now parse HTML string, files or urls and return dataframes
14+
- ``pd.read_html()`` can now parse HTML string, files or urls and return dataframes
1615
courtesy of @cpcloud. (GH3477_)
1716

1817
See the `full release notes

doc/source/v0.12.0.txt

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
.. _whatsnew_0120:
2+
3+
v0.12.0 (??????)
4+
----------------
5+
6+
This is a major release from 0.11.1 and includes many new features and
7+
enhancements along with a large number of bug fixes. There are also a
8+
number of important API changes that long-time pandas users should
9+
pay close attention to.
10+
11+
Enhancements
12+
~~~~~~~~~~~~
13+
14+
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now supported methods of serializing
15+
arbitrary pandas (and Python) objects in a lightweight, portable binary format
16+
17+
.. ipython:: python
18+
19+
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
20+
df.to_msgpack('foo.msg')
21+
pd.read_msgpack('foo.msg')
22+
23+
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
24+
pd.to_msgpack('foo.msg', df, s)
25+
pd.read_msgpack('foo.msg')
26+
27+
.. ipython:: python
28+
:suppress:
29+
:okexcept:
30+
31+
os.remove('foo.msg')
32+
33+
See the `full release notes
34+
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
35+
on GitHub for a complete list.

doc/source/whatsnew.rst

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ These are new features and improvements of note in each release.
1818

1919
.. include:: v0.12.0.txt
2020

21+
.. include:: v0.11.1.txt
22+
2123
.. include:: v0.11.0.txt
2224

2325
.. include:: v0.10.1.txt

pandas/io/packers.py

Lines changed: 61 additions & 52 deletions
@@ -68,26 +68,38 @@
6868
except:
6969
_USE_MSGPACK = False
7070

71-
def to_msgpack(path, obj, **kwargs):
71+
def to_msgpack(path, *args, **kwargs):
7272
"""
7373
msgpack (serialize) object to input file path
7474
7575
Parameters
7676
----------
7777
path : string
7878
File path
79-
obj : any object
79+
args : an object or objects to serialize
80+
81+
append : boolean whether to append to an existing msgpack
82+
(default is False)
8083
"""
8184
if not _USE_MSGPACK:
8285
raise Exception("please install msgpack to create msgpack stores!")
83-
f = open(path, 'wb')
86+
87+
append = kwargs.get('append')
88+
if append:
89+
f = open(path, 'a+b')
90+
else:
91+
f = open(path, 'wb')
8492
try:
85-
f.write(msgpack.packb(obj))
93+
if len(args) == 1:
94+
f.write(pack(args[0]))
95+
else:
96+
for a in args:
97+
f.write(pack(a))
8698
finally:
8799
f.close()
88100

89101

90-
def read_msgpack(path):
102+
def read_msgpack(path, iterator=False, **kwargs):
91103
"""
92104
Load msgpack pandas object from the specified
93105
file path
@@ -96,15 +108,24 @@ def read_msgpack(path):
96108
----------
97109
path : string
98110
File path
111+
iterator : boolean, if True, return an iterator to the unpacker
112+
(default is False)
99113
100114
Returns
101115
-------
102116
obj : type of object stored in file
117+
103118
"""
104119
if not _USE_MSGPACK:
105120
raise Exception("please install msgpack to read msgpack stores!")
121+
if iterator:
122+
return Iterator(path)
123+
106124
with open(path,'rb') as fh:
107-
return msgpack.unpackb(fh.read())
125+
l = list(unpack(fh))
126+
if len(l) == 1:
127+
return l[0]
128+
return l
108129

109130
dtype_dict = { 'datetime64[ns]' : np.dtype('M8[ns]'),
110131
'timedelta64[ns]' : np.dtype('m8[ns]') }
@@ -296,48 +317,29 @@ def create_block(b):
296317
import pdb; pdb.set_trace()
297318
return obj
298319

299-
def pack(o, stream, default=encode,
300-
encoding='utf-8', unicode_errors='strict'):
301-
"""
302-
Pack an object and write it to a stream.
303-
"""
304-
305-
_packer.pack(o, stream, default=default,
306-
encoding=encoding,
307-
unicode_errors=unicode_errors)
308-
def packb(o, default=encode,
309-
encoding='utf-8', unicode_errors='strict', use_single_float=False):
320+
def pack(o, default=encode,
321+
encoding='utf-8', unicode_errors='strict', use_single_float=False):
310322
"""
311323
Pack an object and return the packed bytes.
312324
"""
313325

314-
return _packer.packb(o, default=default, encoding=encoding,
315-
unicode_errors=unicode_errors,
316-
use_single_float=use_single_float)
317-
318-
def unpack(stream, object_hook=decode, list_hook=None, use_list=None,
319-
encoding='utf-8', unicode_errors='strict', object_pairs_hook=None):
320-
"""
321-
Unpack a packed object from a stream.
322-
"""
326+
return Packer(default=default, encoding=encoding,
327+
unicode_errors=unicode_errors,
328+
use_single_float=use_single_float).pack(o)
323329

324-
return _unpacker.unpack(stream, object_hook=object_hook,
325-
list_hook=list_hook, use_list=use_list,
326-
encoding=encoding,
327-
unicode_errors=unicode_errors,
328-
object_pairs_hook=object_pairs_hook)
329-
def unpackb(packed, object_hook=decode,
330-
list_hook=None, use_list=None, encoding='utf-8',
331-
unicode_errors='strict', object_pairs_hook=None):
330+
def unpack(packed, object_hook=decode,
331+
list_hook=None, use_list=False, encoding='utf-8',
332+
unicode_errors='strict', object_pairs_hook=None):
332333
"""
333-
Unpack a packed object.
334+
Unpack a packed object, return an iterator
335+
Note: packed lists will be returned as tuples
334336
"""
335337

336-
return _unpacker.unpackb(packed, object_hook=object_hook,
337-
list_hook=list_hook,
338-
use_list=use_list, encoding=encoding,
339-
unicode_errors=unicode_errors,
340-
object_pairs_hook=object_pairs_hook)
338+
return Unpacker(packed, object_hook=object_hook,
339+
list_hook=list_hook,
340+
use_list=use_list, encoding=encoding,
341+
unicode_errors=unicode_errors,
342+
object_pairs_hook=object_pairs_hook)
341343

342344
if _USE_MSGPACK:
343345

@@ -352,7 +354,7 @@ def __init__(self, default=encode,
352354
use_single_float=use_single_float)
353355

354356
class Unpacker(_unpacker.Unpacker):
355-
def __init__(self, file_like=None, read_size=0, use_list=None,
357+
def __init__(self, file_like=None, read_size=0, use_list=False,
356358
object_hook=decode,
357359
object_pairs_hook=None, list_hook=None, encoding='utf-8',
358360
unicode_errors='strict', max_buffer_size=0):
@@ -365,14 +367,21 @@ def __init__(self, file_like=None, read_size=0, use_list=None,
365367
encoding=encoding,
366368
unicode_errors=unicode_errors,
367369
max_buffer_size=max_buffer_size)
368-
369-
setattr(msgpack, 'Packer', Packer)
370-
setattr(msgpack, 'Unpacker', Unpacker)
371-
setattr(msgpack, 'load', unpack)
372-
setattr(msgpack, 'loads', unpackb)
373-
setattr(msgpack, 'dump', pack)
374-
setattr(msgpack, 'dumps', packb)
375-
setattr(msgpack, 'pack', pack)
376-
setattr(msgpack, 'packb', packb)
377-
setattr(msgpack, 'unpack', unpack)
378-
setattr(msgpack, 'unpackb', unpackb)
370+
371+
class Iterator(object):
372+
""" manage the unpacking iteration,
373+
close the file on completion """
374+
375+
def __init__(self, path, **kwargs):
376+
self.path = path
377+
self.kwargs = kwargs
378+
379+
def __iter__(self):
380+
381+
try:
382+
fh = open(self.path,'rb')
383+
unpacker = unpack(fh)
384+
for o in unpacker:
385+
yield o
386+
finally:
387+
fh.close()
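The ``Iterator`` class above streams records lazily and closes the file when iteration finishes. Note that in the diff ``open`` is called inside the ``try`` block, so a failed open would reach ``fh.close()`` with ``fh`` unbound. A minimal sketch of the same pattern that avoids this by using a ``with`` block is shown here. It is an assumption-laden illustration: ``pickle`` stands in for msgpack so the code runs with the standard library alone, and ``RecordIterator`` is a hypothetical name.

```python
import os
import pickle
import tempfile


class RecordIterator(object):
    """Hypothetical stand-in for the diff's Iterator: yields records lazily."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The file is opened only when iteration starts. `with` closes it
        # once the generator is exhausted or explicitly closed, and there
        # is nothing to clean up if open() itself raises.
        with open(self.path, "rb") as fh:
            while True:
                try:
                    obj = pickle.load(fh)
                except EOFError:
                    # End of the record stream.
                    return
                yield obj


path = os.path.join(tempfile.mkdtemp(), "stream.bin")
with open(path, "wb") as fh:
    for obj in ("foo", [1, 2, 3], {"k": 1.5}):
        pickle.dump(obj, fh)

records = RecordIterator(path)
first_pass = list(records)
second_pass = list(records)   # each __iter__ call reopens the file
```

Because the file handle lives inside ``__iter__`` rather than ``__init__``, the same object can be iterated more than once, which matches the structure of the class in the diff.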
