Commit bac7817

BLD: py3 compat
TST: removed pytest in favor of nosetest for tests/test_msgpack
1 parent d9225fb commit bac7817

14 files changed, +387 -387 lines changed

doc/source/io.rst

+9-9
@@ -36,7 +36,7 @@ object.
3636
* ``read_hdf``
3737
* ``read_sql``
3838
* ``read_json``
39-
* ``read_msgpack``
39+
* ``read_msgpack`` (experimental)
4040
* ``read_html``
4141
* ``read_stata``
4242
* ``read_clipboard``
@@ -49,7 +49,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
4949
* ``to_hdf``
5050
* ``to_sql``
5151
* ``to_json``
52-
* ``to_msgpack``
52+
* ``to_msgpack`` (experimental)
5353
* ``to_html``
5454
* ``to_stata``
5555
* ``to_clipboard``
@@ -1737,23 +1737,23 @@ module is installed you can use it as a xlsx writer engine as follows:
17371737
Serialization
17381738
-------------
17391739

1740-
msgpack
1741-
~~~~~~~
1740+
msgpack (experimental)
1741+
~~~~~~~~~~~~~~~~~~~~~~
17421742

17431743
.. _io.msgpack:
17441744

1745-
.. versionadded:: 0.11.1
1745+
.. versionadded:: 0.13.0
17461746

1747-
Starting in 0.11.1, pandas is supporting the ``msgpack`` format for
1747+
Starting in 0.13.0, pandas is supporting the ``msgpack`` format for
17481748
object serialization. This is a lightweight portable binary format, similar
1749-
to binary JSON, that is highly space efficient, and provides good performance
1749+
to binary JSON, that is highly space efficient, and provides good performance
17501750
both on the writing (serialization), and reading (deserialization).
17511751

17521752
.. warning::
17531753

1754-
This is a very new feature of pandas. We intend to provide certain
1754+
This is a very new feature of pandas. We intend to provide certain
17551755
optimizations in the io of the ``msgpack`` data. We do not intend this
1756-
format to change (and will be backward compatible if we do).
1756+
format to change (however, it is experimental).
17571757

17581758
.. ipython:: python
17591759

doc/source/v0.13.0.txt

+32-38
@@ -464,6 +464,15 @@ Enhancements
464464
t = Timestamp('20130101 09:01:02')
465465
t + pd.datetools.Nano(123)
466466

467+
- The ``isin`` method plays nicely with boolean indexing. To get the rows where each condition is met:
468+
469+
.. ipython:: python
470+
471+
mask = df.isin({'A': [1, 2], 'B': ['e', 'f']})
472+
df[mask.all(1)]
473+
474+
See the :ref:`documentation<indexing.basics.indexing_isin>` for more.
475+
467476
.. _whatsnew_0130.experimental:
468477

469478
Experimental
@@ -553,21 +562,35 @@ Experimental
553562
For more details see the :ref:`indexing documentation on query
554563
<indexing.query>`.
555564

556-
- DataFrame now has an ``isin`` method that can be used to easily check whether the DataFrame's values are contained in an iterable. Use a dictionary if you'd like to check specific iterables for specific columns or rows.
565+
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
566+
of arbitrary pandas (and python objects) in a lightweight portable binary format. :ref:`See the docs<io.msgpack>`
557567

558-
.. ipython:: python
568+
.. warning::
559569

560-
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['d', 'e', 'f']})
561-
df.isin({'A': [1, 2], 'B': ['e', 'f']})
570+
Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
562571

563-
The ``isin`` method plays nicely with boolean indexing. To get the rows where each condition is met:
572+
.. ipython:: python
564573

565-
.. ipython:: python
574+
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
575+
df.to_msgpack('foo.msg')
576+
pd.read_msgpack('foo.msg')
577+
578+
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
579+
pd.to_msgpack('foo.msg', df, s)
580+
pd.read_msgpack('foo.msg')
581+
582+
You can pass ``iterator=True`` to iterator over the unpacked results
566583

567-
mask = df.isin({'A': [1, 2], 'B': ['e', 'f']})
568-
df[mask.all(1)]
584+
.. ipython:: python
585+
586+
for o in pd.read_msgpack('foo.msg',iterator=True):
587+
print o
588+
589+
.. ipython:: python
590+
:suppress:
591+
:okexcept:
569592

570-
See the :ref:`documentation<indexing.basics.indexing_isin>` for more.
593+
os.remove('foo.msg')
571594

572595
.. _whatsnew_0130.refactoring:
573596

@@ -686,35 +709,6 @@ to unify methods and behaviors. Series formerly subclassed directly from
686709
s.a = 5
687710
s
688711

689-
IO Enhancements
690-
~~~~~~~~~~~~~~~
691-
692-
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
693-
of arbitrary pandas (and python objects) in a lightweight portable binary format. :ref:`See the docs<io.msgpack>`
694-
695-
.. ipython:: python
696-
697-
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
698-
df.to_msgpack('foo.msg')
699-
pd.read_msgpack('foo.msg')
700-
701-
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
702-
pd.to_msgpack('foo.msg', df, s)
703-
pd.read_msgpack('foo.msg')
704-
705-
You can pass ``iterator=True`` to iterator over the unpacked results
706-
707-
.. ipython:: python
708-
709-
for o in pd.read_msgpack('foo.msg',iterator=True):
710-
print o
711-
712-
.. ipython:: python
713-
:suppress:
714-
:okexcept:
715-
716-
os.remove('foo.msg')
717-
718712
Bug Fixes
719713
~~~~~~~~~
720714

pandas/core/generic.py

+15
@@ -806,6 +806,21 @@ def to_hdf(self, path_or_buf, key, **kwargs):
806806
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
807807

808808
def to_msgpack(self, path_or_buf, **kwargs):
809+
"""
810+
msgpack (serialize) object to input file path
811+
812+
THIS IS AN EXPERIMENTAL LIBRARY and the storage format
813+
may not be stable until a future release.
814+
815+
Parameters
816+
----------
817+
path : string File path
818+
args : an object or objects to serialize
819+
append : boolean whether to append to an existing msgpack
820+
(default is False)
821+
compress : type of compressor (zlib or blosc), default to None (no compression)
822+
"""
823+
809824
from pandas.io import packers
810825
return packers.to_msgpack(path_or_buf, self, **kwargs)
811826

pandas/io/packers.py

+36-30
@@ -49,9 +49,11 @@
4949
from dateutil.parser import parse
5050

5151
import numpy as np
52+
from pandas import compat
53+
from pandas.compat import u
5254
from pandas import (
5355
Timestamp, Period, Series, TimeSeries, DataFrame, Panel, Panel4D,
54-
Index, MultiIndex, Int64Index, PeriodIndex, DatetimeIndex, NaT
56+
Index, MultiIndex, Int64Index, PeriodIndex, DatetimeIndex, Float64Index, NaT
5557
)
5658
from pandas.sparse.api import SparseSeries, SparseDataFrame, SparsePanel
5759
from pandas.sparse.array import BlockIndex, IntIndex
@@ -80,12 +82,13 @@ def to_msgpack(path, *args, **kwargs):
8082
"""
8183
msgpack (serialize) object to input file path
8284
85+
THIS IS AN EXPERIMENTAL LIBRARY and the storage format
86+
may not be stable until a future release.
87+
8388
Parameters
8489
----------
85-
path : string
86-
File path
90+
path : string File path
8791
args : an object or objects to serialize
88-
8992
append : boolean whether to append to an existing msgpack
9093
(default is False)
9194
compress : type of compressor (zlib or blosc), default to None (no compression)
@@ -112,6 +115,9 @@ def read_msgpack(path, iterator=False, **kwargs):
112115
Load msgpack pandas object from the specified
113116
file path
114117
118+
THIS IS AN EXPERIMENTAL LIBRARY and the storage format
119+
may not be stable until a future release.
120+
115121
Parameters
116122
----------
117123
path : string
@@ -134,11 +140,11 @@ def read_msgpack(path, iterator=False, **kwargs):
134140
return l
135141

136142
dtype_dict = { 21 : np.dtype('M8[ns]'),
137-
u'datetime64[ns]' : np.dtype('M8[ns]'),
138-
u'datetime64[us]' : np.dtype('M8[us]'),
143+
u('datetime64[ns]') : np.dtype('M8[ns]'),
144+
u('datetime64[us]') : np.dtype('M8[us]'),
139145
22 : np.dtype('m8[ns]'),
140-
u'timedelta64[ns]' : np.dtype('m8[ns]'),
141-
u'timedelta64[us]' : np.dtype('m8[us]') }
146+
u('timedelta64[ns]') : np.dtype('m8[ns]'),
147+
u('timedelta64[us]') : np.dtype('m8[us]') }
142148

143149
def dtype_for(t):
144150
if t in dtype_dict:
@@ -157,7 +163,7 @@ def c2f(r, i, ctype_name):
157163
"""
158164
Convert strings to complex number instance with specified numpy type.
159165
"""
160-
166+
161167
ftype = c2f_dict[ctype_name]
162168
return np.typeDict[ctype_name](ftype(r)+1j*ftype(i))
163169

@@ -224,7 +230,7 @@ def encode(obj):
224230
"""
225231
Data encoder
226232
"""
227-
233+
228234
tobj = type(obj)
229235
if isinstance(obj, Index):
230236
if isinstance(obj, PeriodIndex):
@@ -281,15 +287,15 @@ def encode(obj):
281287
'columns' : obj.columns }
282288
for f in ['default_fill_value','default_kind']:
283289
d[f] = getattr(obj,f,None)
284-
d['data'] = dict([ (name,ss) for name,ss in obj.iterkv() ])
290+
d['data'] = dict([ (name,ss) for name,ss in compat.iteritems(obj) ])
285291
return d
286292
elif isinstance(obj, SparsePanel):
287293
d = {'typ' : 'sparse_panel',
288294
'klass' : obj.__class__.__name__,
289295
'items' : obj.items }
290296
for f in ['default_fill_value','default_kind']:
291297
d[f] = getattr(obj,f,None)
292-
d['data'] = dict([ (name,df) for name,df in obj.iterkv() ])
298+
d['data'] = dict([ (name,df) for name,df in compat.iteritems(obj) ])
293299
return d
294300
else:
295301

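The hunk above replaces `obj.iterkv()` with `compat.iteritems(obj)`, so that the same call works on both Python 2 and Python 3. A minimal sketch of how such a helper can behave, assuming it only needs to prefer `iteritems` when the object has it and fall back to `items` otherwise (the function below is illustrative and is not copied from `pandas.compat`):

```python
def iteritems(obj, **kwargs):
    """Iterate over (key, value) pairs on Python 2 and Python 3.

    Hypothetical stand-in for a compat helper: prefer the lazy
    ``iteritems`` method when the object provides one (Python 2 dicts),
    otherwise fall back to ``items`` (Python 3 dicts).
    """
    func = getattr(obj, "iteritems", None)
    if func is None:
        func = obj.items
    return func(**kwargs)


data = {"a": 1, "b": 2}
pairs = sorted(iteritems(data))
print(pairs)
```

Routing every call site through one helper keeps the version check in a single place instead of scattering it through the encoder.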
@@ -301,8 +307,8 @@ def encode(obj):
301307
return {'typ' : 'block_manager',
302308
'klass' : obj.__class__.__name__,
303309
'axes' : data.axes,
304-
'blocks' : [ { 'items' : b.items,
305-
'values' : convert(b.values),
310+
'blocks' : [ { 'items' : b.items,
311+
'values' : convert(b.values),
306312
'shape' : b.values.shape,
307313
'dtype' : b.dtype.num,
308314
'klass' : b.__class__.__name__,
@@ -381,7 +387,7 @@ def decode(obj):
381387
"""
382388
Decoder for deserializing numpy data types.
383389
"""
384-
390+
385391
typ = obj.get('typ')
386392
if typ is None:
387393
return obj
@@ -408,7 +414,7 @@ def decode(obj):
408414

409415
def create_block(b):
410416
dtype = dtype_for(b['dtype'])
411-
return make_block(unconvert(b['values'],dtype,b['compress']).reshape(b['shape']),b['items'],axes[0],klass=getattr(internals,b['klass']))
417+
return make_block(unconvert(b['values'],dtype,b['compress']).reshape(b['shape']),b['items'],axes[0],klass=getattr(internals,b['klass']))
412418

413419
blocks = [ create_block(b) for b in obj['blocks'] ]
414420
return globals()[obj['klass']](BlockManager(blocks, axes))
@@ -454,17 +460,17 @@ def create_block(b):
454460
else:
455461
return obj
456462

457-
def pack(o, default=encode,
463+
def pack(o, default=encode,
458464
encoding='utf-8', unicode_errors='strict', use_single_float=False):
459465
"""
460466
Pack an object and return the packed bytes.
461467
"""
462468

463469
return Packer(default=default, encoding=encoding,
464-
unicode_errors=unicode_errors,
470+
unicode_errors=unicode_errors,
465471
use_single_float=use_single_float).pack(o)
466472

467-
def unpack(packed, object_hook=decode,
473+
def unpack(packed, object_hook=decode,
468474
list_hook=None, use_list=False, encoding='utf-8',
469475
unicode_errors='strict', object_pairs_hook=None):
470476
"""
@@ -473,17 +479,17 @@ def unpack(packed, object_hook=decode,
473479
"""
474480

475481
return Unpacker(packed, object_hook=object_hook,
476-
list_hook=list_hook,
482+
list_hook=list_hook,
477483
use_list=use_list, encoding=encoding,
478-
unicode_errors=unicode_errors,
484+
unicode_errors=unicode_errors,
479485
object_pairs_hook=object_pairs_hook)
480486

481487
class Packer(_Packer):
482-
def __init__(self, default=encode,
488+
def __init__(self, default=encode,
483489
encoding='utf-8',
484490
unicode_errors='strict',
485491
use_single_float=False):
486-
super(Packer, self).__init__(default=default,
492+
super(Packer, self).__init__(default=default,
487493
encoding=encoding,
488494
unicode_errors=unicode_errors,
489495
use_single_float=use_single_float)
@@ -493,14 +499,14 @@ def __init__(self, file_like=None, read_size=0, use_list=False,
493499
object_hook=decode,
494500
object_pairs_hook=None, list_hook=None, encoding='utf-8',
495501
unicode_errors='strict', max_buffer_size=0):
496-
super(Unpacker, self).__init__(file_like=file_like,
497-
read_size=read_size,
498-
use_list=use_list,
499-
object_hook=object_hook,
500-
object_pairs_hook=object_pairs_hook,
502+
super(Unpacker, self).__init__(file_like=file_like,
503+
read_size=read_size,
504+
use_list=use_list,
505+
object_hook=object_hook,
506+
object_pairs_hook=object_pairs_hook,
501507
list_hook=list_hook,
502-
encoding=encoding,
503-
unicode_errors=unicode_errors,
508+
encoding=encoding,
509+
unicode_errors=unicode_errors,
504510
max_buffer_size=max_buffer_size)
505511

506512
class Iterator(object):

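The `pack`/`unpack` functions in this file pass `encode` as the `default` hook and `decode` as the `object_hook`, and `decode` dispatches on a `'typ'` key. The sketch below illustrates that tagging pattern only. It deliberately swaps msgpack for the standard-library `json` module so that it runs without extra dependencies, and it handles just one hypothetical type (`complex`); it is not the pandas implementation.

```python
import json


def encode(obj):
    """Turn an otherwise unsupported object into a dict tagged with 'typ'."""
    if isinstance(obj, complex):
        return {"typ": "complex", "real": obj.real, "imag": obj.imag}
    raise TypeError("cannot serialize %r" % (obj,))


def decode(obj):
    """Rebuild tagged dicts; pass untagged dicts through unchanged."""
    typ = obj.get("typ")
    if typ is None:
        return obj
    if typ == "complex":
        return complex(obj["real"], obj["imag"])
    return obj


def pack(o, default=encode):
    # json stands in for msgpack here; both accept a ``default`` hook.
    return json.dumps(o, default=default)


def unpack(packed, object_hook=decode):
    # ``object_hook`` is called for every decoded mapping, innermost first.
    return json.loads(packed, object_hook=object_hook)


original = {"x": 1 + 2j, "label": "foo"}
restored = unpack(pack(original))
print(restored)
```

Because the decoder returns untagged mappings unchanged, ordinary dictionaries round-trip without special handling.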
pandas/io/tests/test_packers.py

+3-1
@@ -7,6 +7,8 @@
77
import datetime
88
import numpy as np
99

10+
from pandas import compat
11+
from pandas.compat import u
1012
from pandas import (Series, DataFrame, Panel, MultiIndex, bdate_range,
1113
date_range, period_range, Index, SparseSeries, SparseDataFrame,
1214
SparsePanel)
@@ -146,7 +148,7 @@ def test_numpy_array_complex(self):
146148
x.dtype == x_rec.dtype)
147149

148150
def test_list_mixed(self):
149-
x = [1.0, np.float32(3.5), np.complex128(4.25), u'foo']
151+
x = [1.0, np.float32(3.5), np.complex128(4.25), u('foo')]
150152
x_rec = self.encode_decode(x)
151153
self.assert_(all(map(lambda x,y: x == y, x, x_rec)) and \
152154
all(map(lambda x,y: type(x) == type(y), x, x_rec)))

pandas/msgpack.pyx

+3-3
@@ -110,7 +110,7 @@ cdef class Packer(object):
110110
* *defaut* - Convert user type to builtin type that Packer supports.
111111
See also simplejson's document.
112112
* *encoding* - Convert unicode to bytes with this encoding. (default: 'utf-8')
113-
* *unicode_erros* - Error handler for encoding unicode. (default: 'strict')
113+
* *unicode_errors* - Error handler for encoding unicode. (default: 'strict')
114114
* *use_single_float* - Use single precision float type for float. (default: False)
115115
* *autoreset* - Reset buffer after each pack and return it's content as `bytes`. (default: True).
116116
If set this to false, use `bytes()` to get content and `.reset()` to clear buffer.
@@ -242,7 +242,7 @@ cdef class Packer(object):
242242
if ret != 0: break
243243

244244
# ndarray support ONLY (and float64/int64) for now
245-
elif isinstance(o, np.ndarray) and not hasattr(o,'values') and (o.dtype == 'float64' or o.dtype == 'int64'):
245+
elif isinstance(o, np.ndarray) and not hasattr(o,'values') and (o.dtype == 'float64' or o.dtype == 'int64'):
246246

247247
ret = msgpack_pack_map(&self.pk, 5)
248248
if ret != 0: return -1
@@ -276,7 +276,7 @@ cdef class Packer(object):
276276
for i in range(n):
277277

278278
i8val = array_int[i]
279-
ret = msgpack_pack_long(&self.pk, i8val)
279+
ret = msgpack_pack_long_long(&self.pk, i8val)
280280
if ret != 0: break
281281

282282
elif self._default:

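The diff does not say why `msgpack_pack_long` became `msgpack_pack_long_long`. One plausible motivation, stated here as an assumption, is that a C `long` is only 32 bits wide on some platforms (for example 64-bit Windows), which would truncate the int64 values held in `i8val`. The sketch below uses `ctypes` to show the widths on the current platform and what a 32-bit truncation does to a large value:

```python
import ctypes

# Width in bytes of the two C integer types on this platform.
long_size = ctypes.sizeof(ctypes.c_long)
long_long_size = ctypes.sizeof(ctypes.c_longlong)
print("long:", long_size, "long long:", long_long_size)

# A value that needs more than 32 bits.
value = 2**40 + 5

# ctypes integer types do no overflow checking, so the 32-bit
# type silently keeps only the low 32 bits of the value.
truncated = ctypes.c_int32(value).value
preserved = ctypes.c_int64(value).value
print(truncated, preserved)
```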
pandas/tests/test_msgpack/__init__.py

Whitespace-only changes.
