to_msgpack fails for Dataframes with numpy bools #18390

Closed
PhyNerd opened this issue Nov 20, 2017 · 3 comments · Fixed by #18395
Labels
Bug, Dtype Conversions (unexpected or buggy dtype conversions)

@PhyNerd
Contributor

PhyNerd commented Nov 20, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
pd.DataFrame(pd.Series((np.bool_(1), 2, 3, 4, 5))).to_msgpack('test.msg')

Problem description

Exporting a DataFrame that contains numpy bools with DataFrame.to_msgpack() raises the following exception:

  File "pandas/io/msgpack/_packer.pyx", line 230, in pandas.io.msgpack._packer.Packer.pack
  File "pandas/io/msgpack/_packer.pyx", line 232, in pandas.io.msgpack._packer.Packer.pack
  File "pandas/io/msgpack/_packer.pyx", line 191, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 220, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 191, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 220, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 227, in pandas.io.msgpack._packer.Packer._pack
TypeError: can't serialize True

I'm guessing this is because the isinstance check on line 136 of pandas/io/msgpack/_packer.pyx returns False for numpy.bool_.
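The guess can be checked without touching pandas: numpy.bool_ is not a subclass of Python's bool, so a plain isinstance(obj, bool) test does not match it. The snippet below is only a sketch of that behaviour. The helper name is_boolean is hypothetical and is not the code in _packer.pyx.

```python
import numpy as np

value = np.bool_(1)

# numpy's boolean scalar is its own type, not a subclass of Python's bool,
# so a check written against bool alone misses it.
plain_check = isinstance(value, bool)


def is_boolean(obj):
    """Hypothetical helper: accept both Python and numpy booleans."""
    return isinstance(obj, (bool, np.bool_))


widened_check = is_boolean(value)
print(plain_check, widened_check)
```

Widening the type tuple is one plausible shape for a fix; the actual change is whatever landed in #18395.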

Expected Output

I'd expect DataFrames with numpy bools in them to serialize just like ones with Python bools do.

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

pandas: 0.21.0
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: 0.7.1
xarray: None
IPython: 5.3.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung added the IO Msgpack, Dtype Conversions, and Bug labels Nov 20, 2017
@gfyoung
Member

gfyoung commented Nov 20, 2017

@PhyNerd: Your diagnosis looks correct to me. Given that, it should be pretty easy to patch, so a PR would be most welcome!

@jreback added this to the 0.21.1 milestone Nov 21, 2017
@jreback
Contributor

jreback commented Nov 21, 2017

You realize that storing non-integers commingled with integers is completely inefficient, since the column ends up as object dtype.
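To make the dtype point concrete, here is a small sketch that builds the Series from the code sample next to an all-integer one and compares their dtypes. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same values as the code sample: one numpy bool mixed in with integers.
mixed = pd.Series((np.bool_(1), 2, 3, 4, 5))

# All-integer data for comparison.
ints = pd.Series((1, 2, 3, 4, 5))

# The mixed Series falls back to generic Python objects,
# while the all-integer Series keeps a native integer dtype.
print(mixed.dtype, ints.dtype)
```

Because the mixed column holds Python objects, each element is serialized individually, which is why the packer ends up seeing a numpy.bool_ scalar at all.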

@PhyNerd
Contributor Author

PhyNerd commented Nov 21, 2017

Yes, I do.
The DataFrames I want to export often have missing data, where fields are None in columns that otherwise contain only np.bool_ values. In other cases users add mixed datatypes. There isn't much I can do about that.
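For columns like the ones described here, one possible stopgap on an affected version is to convert numpy booleans to Python bools before serializing. This is an assumption on my part, not something proposed in this thread, and the helper name to_python_bools is hypothetical.

```python
import numpy as np


def to_python_bools(values):
    """Hypothetical helper: replace numpy booleans with Python bools,
    leaving None and every other value untouched."""
    return [bool(v) if isinstance(v, np.bool_) else v for v in values]


# A column with missing data and a stray non-boolean value.
column = [np.bool_(True), None, np.bool_(False), 3]
cleaned = to_python_bools(column)
print([type(v).__name__ for v in cleaned])
```

The cleaned values could then be put back into the DataFrame column before calling to_msgpack, at the cost of an extra pass over the data.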
