Serialize numpy scalars in pd.msgpack #12500

Closed
mrocklin opened this issue Mar 1, 2016 · 8 comments

@mrocklin
Contributor

mrocklin commented Mar 1, 2016

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
pd.msgpack.dumps(np.int32(0))

Currently produces:

TypeError: can't serialize 0

Expected Output

b'\x00'

output of pd.show_versions()

In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0rc1+752.gfe201a2
nose: 1.3.7
pip: 8.0.3
setuptools: 20.1.1
Cython: 0.22.1
numpy: 1.10.4
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
blosc: 1.2.8
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8

Related to msgpack/msgpack-python#61

@jreback
Contributor

jreback commented Mar 1, 2016

Hmm, I thought this should work and was tested. Guess not. Should be a simple fix. Where did you encounter it in the wild? I don't think you can actually do this with a pandas object (at least it won't generate a numpy scalar directly).

@jreback jreback added this to the 0.18.1 milestone Mar 1, 2016
@mrocklin
Contributor Author

mrocklin commented Mar 1, 2016

I ran into it originally with df.memory_usage().sum() in an older version of Pandas. This doesn't recur in master. Still, I would like to be able to depend upon the msgpack implementation in Pandas for general computations, where this might arise.
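For illustration, here is a minimal sketch of how a reduction can hand back a NumPy scalar rather than a Python int. It uses a plain NumPy array as a stand-in for the `df.memory_usage().sum()` case, and the variable name `total` is only illustrative.

```python
import numpy as np

# A reduction over an integer array returns a NumPy scalar, not a Python int.
total = np.array([1, 2, 3], dtype=np.int64).sum()

# On Python 3, NumPy integer scalars do not subclass the built-in int,
# so a serializer that only checks for int will not recognize them.
print(isinstance(total, int))         # False
print(isinstance(total, np.generic))  # True

# .item() converts the NumPy scalar to the equivalent Python built-in.
print(type(total.item()) is int)      # True
```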

@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 25, 2016
@jorisvandenbossche
Member

@jreback @mrocklin status of this? If it's an easy fix, maybe we can still include it in 0.19.0?

@jorisvandenbossche
Member

@jreback Seems that adding a

elif isinstance(o, np.generic):
    # np.generic is the base class of all numpy scalar types
    o = o.item()
    continue

in _packer.Packer._pack() fixes the issue. But is that the level where you would want to fix this? (Alternatively, the Packer itself could remain unaware of numpy scalars, and the msgpack.pack method could convert them before passing the value to the Packer.)
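The second option, converting before the value reaches the Packer, could look roughly like the sketch below. `to_builtin` is a hypothetical helper, not part of pandas or msgpack, and it only handles dicts, lists, and tuples as containers.

```python
import numpy as np


def to_builtin(obj):
    """Recursively replace NumPy scalars with plain Python values.

    Hypothetical helper for illustration; not a pandas or msgpack API.
    """
    if isinstance(obj, np.generic):
        # np.generic is the base class of all NumPy scalar types;
        # .item() returns the equivalent Python built-in.
        return obj.item()
    if isinstance(obj, dict):
        return {to_builtin(k): to_builtin(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_builtin(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(to_builtin(v) for v in obj)
    # Anything else is passed through unchanged.
    return obj


print(to_builtin(np.int32(0)))  # 0
```

With this approach the packer never sees a NumPy scalar, at the cost of an extra pass over the data before packing.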

@jreback
Contributor

jreback commented Aug 29, 2016

we have a routine to detect zero_dim_ndarray (and extract) in pandas.lib
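As a rough idea of what detecting and extracting a zero-dimensional array involves, here is a sketch in plain NumPy. `unwrap_zero_dim` is a hypothetical name and this is not the pandas.lib routine itself, whose exact behavior is not shown in this thread.

```python
import numpy as np


def unwrap_zero_dim(val):
    """Return the Python value held by a 0-d ndarray; pass anything else through.

    Hypothetical helper for illustration only.
    """
    if isinstance(val, np.ndarray) and val.ndim == 0:
        # A 0-d array holds exactly one element; .item() extracts it
        # as a Python built-in.
        return val.item()
    return val


print(unwrap_zero_dim(np.array(5)))  # 5
```

Note that a 0-d array and a NumPy scalar such as `np.int32(0)` are different objects, so a check like this would still need to be paired with a check for NumPy scalars.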

@jreback jreback modified the milestones: 0.19.0, Next Major Release Sep 28, 2016
@samuelsinayoko
Contributor

Happy to help out with this one. Are we meant to dig into pandas/io/msgpack/_packer.pyx to get this fixed?

@jorisvandenbossche
Member

@samuelsinayoko the msgpack support is deprecated in pandas, so I don't think it is worth improving it.

@simonjayhawkins
Member

msgpack is deprecated #30112
