import pandas as pd
pdf = pd.DataFrame(dict(A=[1,1,1,2,2,2], B = [1,2,3,4,5,6]))
pdf['A'] = pdf['A'].astype('category')
pdf.set_index('A', inplace = True)
pdf.to_msgpack('/some/path')
pdf2 = pd.read_msgpack('/some/path')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-cab186b6bdcd> in <module>()
4 pdf.set_index('A', inplace = True)
5 pdf.to_msgpack(/some/path')
----> 6 pdf2 = pd.read_msgpack('/some/path')
/.../Anaconda2/lib/python2.7/site-packages/pandas/io/packers.pyc in read_msgpack(path_or_buf, encoding, iterator, **kwargs)
200 if exists:
201 with open(path_or_buf, 'rb') as fh:
--> 202 return read(fh)
203
204 # treat as a binary-like
/.../Anaconda2/lib/python2.7/site-packages/pandas/io/packers.pyc in read(fh)
185
186 def read(fh):
--> 187 l = list(unpack(fh, encoding=encoding, **kwargs))
188 if len(l) == 1:
189 return l[0]
pandas/msgpack/_unpacker.pyx in pandas.msgpack._unpacker.Unpacker.__next__ (pandas/msgpack/_unpacker.cpp:5618)()
pandas/msgpack/_unpacker.pyx in pandas.msgpack._unpacker.Unpacker._unpack (pandas/msgpack/_unpacker.cpp:4602)()
/.../Anaconda2/lib/python2.7/site-packages/pandas/io/packers.pyc in decode(obj)
557 data = unconvert(obj[u'data'], dtype,
558 obj.get(u'compress'))
--> 559 return globals()[obj[u'klass']](data, dtype=dtype, name=obj[u'name'])
560 elif typ == u'range_index':
561 return globals()[obj[u'klass']](obj[u'start'],
KeyError: u'CategoricalIndex'
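The last frame of the traceback suggests that the reader resolves the index class by name (`globals()[obj[u'klass']]`) and that `CategoricalIndex` is not among the names it can find, while the writer records that name without complaint. The following is a minimal, standard-library-only sketch of that failure mode. The classes, the `KNOWN_CLASSES` registry, and the `encode`/`decode` helpers are hypothetical stand-ins for illustration and are not the pandas implementation.

```python
# Stand-in index classes; these are NOT the pandas classes.
class Index:
    def __init__(self, data, name=None):
        self.data = list(data)
        self.name = name


class CategoricalIndex(Index):
    pass


# Hypothetical registry of classes the reader can rebuild.
# CategoricalIndex is deliberately left out to mimic the failed lookup.
KNOWN_CLASSES = {"Index": Index}


def encode(index):
    # Writer side: records the class name of whatever it is given.
    return {"klass": type(index).__name__, "name": index.name, "data": index.data}


def decode(obj):
    # Reader side: resolves the class by name; an unknown name raises KeyError.
    cls = KNOWN_CLASSES[obj["klass"]]
    return cls(obj["data"], name=obj["name"])


# A plain index survives the round trip.
plain = decode(encode(Index([1, 2, 3], name="A")))

# A categorical index can be written, but reading it back fails.
payload = encode(CategoricalIndex([1, 1, 2], name="A"))
try:
    decode(payload)
    error = None
except KeyError as exc:
    error = exc
```

If this reading of the traceback is right, the asymmetry is between the two sides: writing succeeds because it only stores a class name, and reading fails because that name cannot be resolved.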
Problem description
read_msgpack does not appear to support a CategoricalIndex, even though to_msgpack will save a DataFrame that has one. The code above fails with the KeyError shown in the traceback.
Background: I am currently using to_msgpack to save a dask dataframe whose index is (something like) a timestamp. The index is not unique. Overall I am very satisfied with the performance of to_msgpack; however, when it comes to space efficiency, a categorical index would probably provide a significant improvement.
Or maybe it works, but I am using it wrong?
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.16.60-0.42.5-smp
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.2.2
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: 1.4.4
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None