read_hdf throws UnicodeDecodeError with Python 3.5 and 3.6 but not with Python 2.7 #17540


Closed
zoof opened this issue Sep 15, 2017 · 17 comments
Labels: IO HDF5 read_hdf, HDFStore


zoof commented Sep 15, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.read_hdf('data.h5')

Problem description

The HDF5 dataset was created with pandas' to_hdf in Python 2.7 and can be read back in Python 2.7. When I try to read it with Python 3.5 or Python 3.6, I get the following:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-2-53006689fd2c> in <module>()
----> 1 df = pd.read_hdf('data.h5')

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, **kwargs)
    356                                      'contains multiple datasets.')
    357             key = candidate_only_group._v_pathname
--> 358         return store.select(key, auto_close=auto_close, **kwargs)
    359     except:
    360         # if there is an error, close the store

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    720                            chunksize=chunksize, auto_close=auto_close)
    721 
--> 722         return it.get_result()
    723 
    724     def select_as_coordinates(

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1426 
   1427         # directly return the result
-> 1428         results = self.func(self.start, self.stop, where)
   1429         self.close()
   1430         return results

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
    713             return s.read(start=_start, stop=_stop,
    714                           where=_where,
--> 715                           columns=columns, **kwargs)
    716 
    717         # create the iterator

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
   2864             blk_items = self.read_index('block%d_items' % i)
   2865             values = self.read_array('block%d_values' % i,
-> 2866                                      start=_start, stop=_stop)
   2867             blk = make_block(values,
   2868                              placement=items.get_indexer(blk_items))

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
   2413         import tables
   2414         node = getattr(self.group, key)
-> 2415         data = node[start:stop]
   2416         attrs = node._v_attrs
   2417 

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
    673             start, stop, step = self._process_range(
    674                 key.start, key.stop, key.step)
--> 675             return self.read(start, stop, step)
    676         # Try with a boolean or point selection
    677         elif type(key) in (list, tuple) or isinstance(key, numpy.ndarray):

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in <listcomp>(.0)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/atom.py in fromarray(self, array)
   1226         if array.size == 0:
   1227             return None
-> 1228         return six.moves.cPickle.loads(array.tostring())

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)


Expected Output

In [1]: import pandas as pd
In [2]: df = pd.read_hdf('data.h5')

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
gfyoung added the IO HDF5 read_hdf, HDFStore label Sep 15, 2017

gfyoung commented Sep 15, 2017

@zoof : Thanks for reporting this. Strange that it's in this order and not vice-versa (support for unicode is much better in Python 3.x than in Python 2.x).

I see that you are using 0.20.1. Just for reference, can you try upgrading and see if that changes anything?

@jreback : I seem to recall a previous issue similar to this. Am I right about that?


zoof commented Sep 15, 2017

Updated to 0.20.3:

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None


jreback commented Sep 15, 2017

show what you wrote and how


zoof commented Sep 16, 2017

Sorry, basically the same as before:

In [1]: import pandas as pd

In [2]: pd.read_hdf('data.h5')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-2-21d8820a6af9> in <module>()
----> 1 pd.read_hdf('data.h5')

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    370                                      'contains multiple datasets.')
    371             key = candidate_only_group._v_pathname
--> 372         return store.select(key, auto_close=auto_close, **kwargs)
    373     except:
    374         # if there is an error, close the store

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    740                            chunksize=chunksize, auto_close=auto_close)
    741 
--> 742         return it.get_result()
    743 
    744     def select_as_coordinates(

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1447 
   1448         # directly return the result
-> 1449         results = self.func(self.start, self.stop, where)
   1450         self.close()
   1451         return results

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
    733             return s.read(start=_start, stop=_stop,
    734                           where=_where,
--> 735                           columns=columns, **kwargs)
    736 
    737         # create the iterator

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
   2885             blk_items = self.read_index('block%d_items' % i)
   2886             values = self.read_array('block%d_values' % i,
-> 2887                                      start=_start, stop=_stop)
   2888             blk = make_block(values,
   2889                              placement=items.get_indexer(blk_items))

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
   2434         import tables
   2435         node = getattr(self.group, key)
-> 2436         data = node[start:stop]
   2437         attrs = node._v_attrs
   2438 

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
    673             start, stop, step = self._process_range(
    674                 key.start, key.stop, key.step)
--> 675             return self.read(start, stop, step)
    676         # Try with a boolean or point selection
    677         elif type(key) in (list, tuple) or isinstance(key, numpy.ndarray):

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in <listcomp>(.0)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/atom.py in fromarray(self, array)
   1226         if array.size == 0:
   1227             return None
-> 1228         return six.moves.cPickle.loads(array.tostring())

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)


jreback commented Sep 16, 2017

you are not answering the question; show an example of WRITING


zoof commented Sep 16, 2017

I guess you want a sample dataset? I extracted a small sample from the troublesome series in the large HDF file: https://ufile.io/l94bf. This file also works with Python 2.7 but fails with Python 3.x.


jreback commented Sep 16, 2017

you need to show a complete example that includes writing and reading


zoof commented Sep 16, 2017

Like this? The data in each instance is the same, just different sources.

In [3]: pd.DataFrame(['Executive Director of HR',
 'Assistant Director of HR',
 'Instructor Chair of Paramedics',
 'Proctor Testing Center \xe2\x80\x93 PT',
 'Instructor \xe2\x80\x93 Welding (Automotive)',
 'Lab Tech \xe2\x80\x93 Automotive \xe2\x80\x93 PT',
 'Lab Tech Technology \xe2\x80\x93 PT',
 'Maintenance Tech',
 'Business Services Coordinator',
 'Scheduler']).to_hdf('data.h5','data')

In [4]: pd.read_hdf('data.h5')
Out[4]: 
                                     0
0             Executive Director of HR
1             Assistant Director of HR
2       Instructor Chair of Paramedics
3        Proctor Testing Center � PT
4  Instructor � Welding (Automotive)
5       Lab Tech � Automotive � PT
6           Lab Tech Technology � PT
7                     Maintenance Tech
8        Business Services Coordinator
9                            Scheduler

In [5]: pd.read_hdf('data16.h5')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-2a895d140f15> in <module>()
----> 1 pd.read_hdf('data16.h5')

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    370                                      'contains multiple datasets.')
    371             key = candidate_only_group._v_pathname
--> 372         return store.select(key, auto_close=auto_close, **kwargs)
    373     except:
    374         # if there is an error, close the store

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    740                            chunksize=chunksize, auto_close=auto_close)
    741 
--> 742         return it.get_result()
    743 
    744     def select_as_coordinates(

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1447 
   1448         # directly return the result
-> 1449         results = self.func(self.start, self.stop, where)
   1450         self.close()
   1451         return results

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
    733             return s.read(start=_start, stop=_stop,
    734                           where=_where,
--> 735                           columns=columns, **kwargs)
    736 
    737         # create the iterator

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, **kwargs)
   2751         kwargs = self.validate_read(kwargs)
   2752         index = self.read_index('index', **kwargs)
-> 2753         values = self.read_array('values', **kwargs)
   2754         return Series(values, index=index, name=self.name)
   2755 

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
   2434         import tables
   2435         node = getattr(self.group, key)
-> 2436         data = node[start:stop]
   2437         attrs = node._v_attrs
   2438 

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
    673             start, stop, step = self._process_range(
    674                 key.start, key.stop, key.step)
--> 675             return self.read(start, stop, step)
    676         # Try with a boolean or point selection
    677         elif type(key) in (list, tuple) or isinstance(key, numpy.ndarray):

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in <listcomp>(.0)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/atom.py in fromarray(self, array)
   1226         if array.size == 0:
   1227             return None
-> 1228         return six.moves.cPickle.loads(array.tostring())

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)


zoof commented Sep 18, 2017

Use 2.7

pd.DataFrame(['Executive Director of HR',
 'Assistant Director of HR',
 'Instructor Chair of Paramedics',
 'Proctor Testing Center \xe2\x80\x93 PT',
 'Instructor \xe2\x80\x93 Welding (Automotive)',
 'Lab Tech \xe2\x80\x93 Automotive \xe2\x80\x93 PT',
 'Lab Tech Technology \xe2\x80\x93 PT',
 'Maintenance Tech',
 'Business Services Coordinator',
 'Scheduler']).to_hdf('data.h5','data')

Try to read in 3.6

pd.read_hdf('data.h5')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-2-8ce48fe594b7> in <module>()
----> 1 pd.read_hdf('data.h5')

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    370                                      'contains multiple datasets.')
    371             key = candidate_only_group._v_pathname
--> 372         return store.select(key, auto_close=auto_close, **kwargs)
    373     except:
    374         # if there is an error, close the store

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    740                            chunksize=chunksize, auto_close=auto_close)
    741 
--> 742         return it.get_result()
    743 
    744     def select_as_coordinates(

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1447 
   1448         # directly return the result
-> 1449         results = self.func(self.start, self.stop, where)
   1450         self.close()
   1451         return results

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
    733             return s.read(start=_start, stop=_stop,
    734                           where=_where,
--> 735                           columns=columns, **kwargs)
    736 
    737         # create the iterator

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
   2885             blk_items = self.read_index('block%d_items' % i)
   2886             values = self.read_array('block%d_values' % i,
-> 2887                                      start=_start, stop=_stop)
   2888             blk = make_block(values,
   2889                              placement=items.get_indexer(blk_items))

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
   2434         import tables
   2435         node = getattr(self.group, key)
-> 2436         data = node[start:stop]
   2437         attrs = node._v_attrs
   2438 

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
    673             start, stop, step = self._process_range(
    674                 key.start, key.stop, key.step)
--> 675             return self.read(start, stop, step)
    676         # Try with a boolean or point selection
    677         elif type(key) in (list, tuple) or isinstance(key, numpy.ndarray):

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in <listcomp>(.0)
    813         atom = self.atom
    814         if not hasattr(atom, 'size'):  # it is a pseudo-atom
--> 815             outlistarr = [atom.fromarray(arr) for arr in listarr]
    816         else:
    817             # Convert the list to the right flavor

/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/atom.py in fromarray(self, array)
   1226         if array.size == 0:
   1227             return None
-> 1228         return six.moves.cPickle.loads(array.tostring())

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)


jreback commented Sep 19, 2017

This is not supported for fixed stores; try using format='table' when you save in 2.7.
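
A minimal sketch of that suggestion, reusing the example frame posted above (hedged; not verified against the reporter's original file):

# Python 2.7: write with format='table' instead of the default fixed format.
# The fixed format stores object columns as pickled arrays, which is what the
# cPickle.loads call in the traceback fails to decode under Python 3.
import pandas as pd

df = pd.DataFrame(['Executive Director of HR',
                   'Proctor Testing Center \xe2\x80\x93 PT'])
df.to_hdf('data_table.h5', 'data', format='table')

# Python 3.x: read back as usual
# df = pd.read_hdf('data_table.h5')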


jreback commented Sep 19, 2017

you can also see #11126, and try passing encoding='utf-8' in 2.7.
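
A hedged sketch of that suggestion (not from the thread; as the follow-up comments note, the round trip into Python 3 may still fail depending on the pandas version):

# Python 2.7: ask the table writer to store string data as UTF-8
import pandas as pd

df = pd.DataFrame([u'Instructor \u2013 Welding (Automotive)'])
df.to_hdf('data_utf8.h5', 'data', format='table', encoding='utf-8')

# Python 3.x: pass the same encoding back when reading
# df = pd.read_hdf('data_utf8.h5', 'data', encoding='utf-8')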


zoof commented Sep 19, 2017

The former works but the latter does not. I don't see why this is not a bug though since 2.7 can read the file produced without format='table' but 3.x cannot.


jreback commented Sep 19, 2017

it is simply not supported by the underlying infrastructure (e.g. in PyTables).

jreback closed this as completed Sep 19, 2017
jreback added this to the won't fix milestone Sep 19, 2017

zoof commented Sep 19, 2017

Just a postscript: format='table' only works for a single column of data. When trying to save the entire dataset in Python 2.7, I get:

TypeError: Cannot serialize the column [task_list] because
its data contents are [unicode] object dtype

When saving with encoding='utf-8', the file is written, but again it cannot be read in 3.x: TypeError: lookup() argument must be str, not numpy.bytes_
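
A hedged workaround sketch (not from the thread, and not verified against this dataset): encode the offending unicode column to UTF-8 bytes on the Python 2.7 side before writing, then decode it back after reading on Python 3. The column name task_list is taken from the error message above.

# --- Python 2.7 ---
import pandas as pd

df = pd.read_hdf('data.h5')                              # original file
df['task_list'] = df['task_list'].str.encode('utf-8')    # unicode -> bytes
df.to_hdf('data_table.h5', 'data', format='table')

# --- Python 3.x ---
import pandas as pd

df = pd.read_hdf('data_table.h5')
df['task_list'] = df['task_list'].str.decode('utf-8')    # bytes -> str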

@asanakoy

I have the same issue.
Why did you decide not to fix it?


asanakoy commented Jul 13, 2019

As a workaround I'm currently converting my Python 2.7 dataframes to JSON and then reading them in Python 3.6.

#####################
# Run this in py2.7
#####################
import pandas as pd

# read dataframe saved in py2.7
path = 'df.hdf5'  # path to dataframe saved in py2.7
df = pd.read_hdf(path)
# write it out as gzip-compressed JSON (to_json is a DataFrame method, not a
# top-level pandas function; compression applies when writing to a path)
df.to_json('df.json.gz', compression='gzip')

#####################
# Now run in py3.6
#####################
import pandas as pd

df = pd.read_json('df.json.gz', compression='gzip')


envhyf commented Oct 15, 2019

Just a postscript. format='table' only works for a single column of data. When trying to save the entire dataset in Python 2.7,

TypeError: Cannot serialize the column [task_list] because
its data contents are [unicode] object dtype

when saving using encoding='utf-8' the file is saved but again cannot be read in 3.x. TypeError: lookup() argument must be str, not numpy.bytes_

Hi, I ran into a similar issue. The dataframe was saved in Python 2.7 with format='table', encoding='utf-8'. However, when I read it in Python 3.7 with pd.read_hdf('xxx.hdf', key='xx', encoding='utf-8'), I get the error: lookup() argument must be str, not numpy.bytes_
