Msgpack roundtrips python 3 strings to bytes #13945

Closed
alanhdu opened this issue Aug 9, 2016 · 4 comments
Labels
Duplicate Report Duplicate issue or pull request

Comments

@alanhdu
Contributor

alanhdu commented Aug 9, 2016

Possibly related to #13591. Causes dask/dask#1452

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.msgpack.unpackb(pd.msgpack.packb("a"))

Expected Output

"a"

Instead, we get:

b"a"

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-31-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
@jreback
Contributor

jreback commented Aug 9, 2016

Duplicate of #13591.

You are welcome to add this example there.

@jreback jreback closed this as completed Aug 9, 2016
@jreback jreback added the Duplicate Report and 2/3 Compat labels Aug 9, 2016
@jreback
Contributor

jreback commented Aug 9, 2016

In [3]: s = Series(['a','b'])

In [5]: pd.read_msgpack(s.to_msgpack())
Out[5]: 
0    a
1    b
dtype: object

In [6]: pd.read_msgpack(s.to_msgpack()).values
Out[6]: array(['a', 'b'], dtype=object)

Further, the msgpack.* routines are private, and they do exactly what they say, namely translate object -> bytes via a roundtrip. So you are simply using them incorrectly.

@jreback jreback added this to the No action milestone Aug 9, 2016
@jreback
Contributor

jreback commented Aug 9, 2016

In [8]: pd.msgpack.dumps('s')
Out[8]: b'\xa1s'

In [9]: pd.msgpack.loads(pd.msgpack.dumps('s'))
Out[9]: b's'

These are the public functions which dask uses.

In [1]: pd.msgpack.loads?
Docstring:
unpackb(packed, object_hook=None, list_hook=None, bool use_list=1, encoding=None, unicode_errors='strict', object_pairs_hook=None, ext_hook=ExtType, Py_ssize_t max_str_len=2147483647, Py_ssize_t max_bin_len=2147483647, Py_ssize_t max_array_len=2147483647, Py_ssize_t max_map_len=2147483647, Py_ssize_t max_ext_len=2147483647)

Unpack packed_bytes to object. Returns an unpacked object.

Raises `ValueError` when `packed` contains extra bytes.

See :class:`Unpacker` for options.
Type:      builtin_function_or_method

In [2]: pd.msgpack.dumps?
Signature: pd.msgpack.dumps(o, **kwargs)
Docstring:
Pack object `o` and return packed bytes

See :class:`Packer` for options.
File:      ~/miniconda/envs/py3.5/lib/python3.5/site-packages/pandas/msgpack/__init__.py
Type:      function

@mrocklin
Contributor

mrocklin commented Aug 9, 2016

In the dask.distributed protocol we manage this by using the use_bin_type and encoding keyword arguments:

In [1]: import pandas as pd

In [2]: pd.msgpack.unpackb(pd.msgpack.packb("a"))
Out[2]: b'a'

In [3]: pd.msgpack.unpackb(pd.msgpack.packb(u"a", use_bin_type=True), encoding='utf8')
Out[3]: 'a'
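
To make the behaviour above concrete, here is a minimal pure-Python sketch of the two msgpack wire types involved: "fixstr" (marker 0xa0 plus a 5-bit length) and "bin 8" (marker 0xc4 plus a one-byte length). The names tiny_packb and tiny_unpackb are hypothetical and written only for illustration; they are not part of pandas or msgpack, handle only short str and bytes values, and assume the older behaviour shown in this thread, where raw data is decoded to text only when encoding is given.

```python
# Illustrative sketch only. tiny_packb / tiny_unpackb are hypothetical helpers
# that mimic two msgpack wire types for short str and bytes values.

FIXSTR_BASE = 0xA0  # fixstr marker: 0b101xxxxx, low 5 bits hold the length
FIXSTR_MAX = 31     # longest payload a fixstr can carry
BIN8 = 0xC4         # bin 8 marker, followed by a 1-byte length


def tiny_packb(obj, use_bin_type=False):
    """Pack a short str or bytes object into msgpack-style bytes."""
    if isinstance(obj, str):
        payload, is_text = obj.encode("utf-8"), True
    elif isinstance(obj, bytes):
        payload, is_text = obj, False
    else:
        raise TypeError("this sketch only handles str and bytes")

    if is_text or not use_bin_type:
        # Without use_bin_type, text and bytes share one "raw" type,
        # so the str/bytes distinction is lost on the wire.
        if len(payload) > FIXSTR_MAX:
            raise ValueError("sketch supports raw payloads up to 31 bytes")
        return bytes([FIXSTR_BASE | len(payload)]) + payload

    if len(payload) > 255:
        raise ValueError("sketch supports bin payloads up to 255 bytes")
    return bytes([BIN8, len(payload)]) + payload


def tiny_unpackb(packed, encoding=None):
    """Unpack bytes produced by tiny_packb."""
    marker = packed[0]
    if marker & 0xE0 == FIXSTR_BASE:
        length = marker & 0x1F
        raw = packed[1:1 + length]
        # Raw data stays bytes unless the caller asks for decoding.
        return raw if encoding is None else raw.decode(encoding)
    if marker == BIN8:
        length = packed[1]
        return packed[2:2 + length]  # bin data is always returned as bytes
    raise ValueError("unsupported marker in this sketch")


print(tiny_packb("s"))                                    # b'\xa1s'
print(tiny_unpackb(tiny_packb("s")))                      # b's'
print(tiny_unpackb(tiny_packb("a", use_bin_type=True), encoding="utf8"))   # 'a'
print(tiny_unpackb(tiny_packb(b"a", use_bin_type=True), encoding="utf8"))  # b'a'
```

In this sketch, use_bin_type=True keeps real bytes in a separate wire type, so passing encoding on the unpacking side can safely turn text back into str without also decoding binary payloads.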

3 participants