support CategoricalIndex for read_msgpack #15487

Closed
abast opened this issue Feb 23, 2017 · 1 comment
Labels
Bug, Categorical (Categorical Data Type)
Comments

@abast
Contributor

abast commented Feb 23, 2017

The following code fails:

import pandas as pd
pdf = pd.DataFrame(dict(A=[1,1,1,2,2,2], B = [1,2,3,4,5,6]))
pdf['A'] = pdf['A'].astype('category')
pdf.set_index('A', inplace = True)
pdf.to_msgpack('/some/path')
pdf2 = pd.read_msgpack('/some/path')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-cab186b6bdcd> in <module>()
      4 pdf.set_index('A', inplace = True)
      5 pdf.to_msgpack('/some/path')
----> 6 pdf2 = pd.read_msgpack('/some/path')

/.../Anaconda2/lib/python2.7/site-packages/pandas/io/packers.pyc in read_msgpack(path_or_buf, encoding, iterator, **kwargs)
    200         if exists:
    201             with open(path_or_buf, 'rb') as fh:
--> 202                 return read(fh)
    203 
    204     # treat as a binary-like

/.../Anaconda2/lib/python2.7/site-packages/pandas/io/packers.pyc in read(fh)
    185 
    186     def read(fh):
--> 187         l = list(unpack(fh, encoding=encoding, **kwargs))
    188         if len(l) == 1:
    189             return l[0]

pandas/msgpack/_unpacker.pyx in pandas.msgpack._unpacker.Unpacker.__next__ (pandas/msgpack/_unpacker.cpp:5618)()

pandas/msgpack/_unpacker.pyx in pandas.msgpack._unpacker.Unpacker._unpack (pandas/msgpack/_unpacker.cpp:4602)()

/.../Anaconda2/lib/python2.7/site-packages/pandas/io/packers.pyc in decode(obj)
    557         data = unconvert(obj[u'data'], dtype,
    558                          obj.get(u'compress'))
--> 559         return globals()[obj[u'klass']](data, dtype=dtype, name=obj[u'name'])
    560     elif typ == u'range_index':
    561         return globals()[obj[u'klass']](obj[u'start'],

KeyError: u'CategoricalIndex'

Problem description

read_msgpack does not appear to support a CategoricalIndex, even though to_msgpack can save a DataFrame that has one.

Background: I am currently using the to_msgpack method to save a dask dataframe whose index is (something like) a time stamp and is not unique. I am very satisfied with the performance of to_msgpack overall, but when it comes to space efficiency, a categorical index would probably provide a significant improvement.

Or maybe it works, but I am using it wrong?

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.16.60-0.42.5-smp
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.2.2
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: 1.4.4
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

@jreback
Contributor

jreback commented Feb 23, 2017

this is almost trivial to add: just add the import (or maybe we should just look things up with getattr(pd, ...) rather than globals()[...]).

of course some tests would be good!
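
A minimal sketch of the difference between the two lookup styles mentioned above, using stand-in classes rather than pandas itself. The names `fake_pd`, `decoder_namespace`, `decode_with_namespace` and `decode_with_getattr` are hypothetical and exist only for this illustration. The dict lookup mimics `globals()[...]` in a module that never imported `CategoricalIndex`, so it raises the same kind of KeyError as in the traceback; the attribute lookup finds the class because the package namespace exposes it.

```python
import types


# Stand-in index classes for illustration; these are NOT the real pandas classes.
class Index:
    def __init__(self, data, name=None):
        self.data = list(data)
        self.name = name


class CategoricalIndex(Index):
    pass


# Hypothetical stand-in for the `pandas` package namespace, exposing both classes.
fake_pd = types.SimpleNamespace(Index=Index, CategoricalIndex=CategoricalIndex)

# Simulates the globals() of a decoder module that only imported `Index`.
decoder_namespace = {"Index": Index}


def decode_with_namespace(obj):
    """Resolve the class name in a fixed dict, like globals()[obj['klass']]."""
    return decoder_namespace[obj["klass"]](obj["data"], name=obj["name"])


def decode_with_getattr(obj):
    """Resolve the class name as an attribute of the package namespace."""
    return getattr(fake_pd, obj["klass"])(obj["data"], name=obj["name"])


# A serialized index as the decoder might receive it (assumed shape).
payload = {"klass": "CategoricalIndex", "data": [1, 1, 2], "name": "A"}

try:
    decode_with_namespace(payload)
    lookup_error = None
except KeyError as exc:
    # The class was never imported into the decoder's namespace.
    lookup_error = exc

# The attribute lookup succeeds without any extra import in the decoder.
restored = decode_with_getattr(payload)
```

One trade-off of the attribute lookup is that it resolves any name the package exposes, so a decoder handling untrusted input might prefer an explicit set of allowed class names.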

jreback added this to the Next Major Release milestone Feb 23, 2017
jreback modified the milestones: 0.20.0, Next Major Release Feb 24, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#15487

Author: Arco Bast

Closes pandas-dev#15493 from abast/CategoricalIndex_msgpack and squashes the following commits:

c1c68e4 [Arco Bast] corrections
3c1f2e7 [Arco Bast] whatsnew
215c2aa [Arco Bast] improve tests
cd9354f [Arco Bast] improve tests
7895c16 [Arco Bast] flake8
f3f492a [Arco Bast] fix test
91d85cb [Arco Bast] msgpack supports CategoricalIndex