I've been having a very frustrating problem where I could save a DataFrame with unique labels to HDF5, and load it again, and find that the loaded version had non-unique labels. It turns out this is because pd.read_hdf is replacing all labels containing non-ASCII characters with the empty string. For example, a row with the label 'café' will have the label '' when it is loaded again.
I looked at the bytes that were written in the HDF5 file, and confirmed that the proper UTF-8 text was in there, so the bug is in read_hdf.
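For anyone who wants to repeat the byte-level check described above, here is a minimal sketch using only the standard library. The helper name `file_contains_label` and the file name in the usage note are hypothetical, not part of pandas or PyTables. It assumes the labels are stored uncompressed, so that their encoded bytes appear verbatim in the file; with compression enabled, a negative result would not prove anything.

```python
from pathlib import Path


def file_contains_label(path, label, encoding="utf-8"):
    """Return True if the encoded bytes of `label` occur anywhere in the file.

    Hypothetical helper: it does a raw byte search and knows nothing
    about the HDF5 layout, so it only works for uncompressed string data.
    """
    needle = label.encode(encoding)
    return needle in Path(path).read_bytes()
```

Calling something like `file_contains_label("frame.h5", "caf\u00e9")` on a file written by `DataFrame.to_hdf` would tell you whether the UTF-8 form of the label made it to disk. If it did, the labels are being lost on the reading side rather than the writing side.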
jreback changed the title to "DataFrame.to_hdf will write Unicode labels, but pd.read_hdf won't read them" on Feb 12, 2016
My apologies: this happened only because I was reading data saved with pandas 0.17 on a machine that was running 0.16. It seems that 0.17 does the right thing even when I don't specify an encoding at all.