Commit e675f82

BUG: read_pickle fallback to latin_1 upon a UnicodeDecodeError
When reading a pickle with MultiIndex columns generated in py27, `pickle_compat.load()` with `encoding=None` would throw a UnicodeDecodeError. Now, `read_pickle` catches that exception and falls back to using `latin-1` explicitly.
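The fallback can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not pandas' implementation: `load_with_latin1_fallback` and `PY2_STYLE_PICKLE` are hypothetical names, and plain `pickle.load` stands in for pandas' internal `pickle_compat.load`. The hand-built bytes mimic what Python 2 writes for an 8-bit `str` at protocol 2; Python 3's unpickler decodes such strings as ASCII by default, so a byte like `\xe9` raises `UnicodeDecodeError`.

```python
import io
import pickle
from typing import Any, BinaryIO


def load_with_latin1_fallback(fh: BinaryIO) -> Any:
    """Unpickle from ``fh``, retrying with latin-1 if the default decode fails.

    Hypothetical helper that mirrors the commit's fallback using only the
    standard library; it is not pandas' ``pickle_compat.load``.
    """
    try:
        # pickle's default encoding is "ASCII": Python 2 8-bit strings
        # containing bytes >= 0x80 raise UnicodeDecodeError here.
        return pickle.load(fh)
    except UnicodeDecodeError:
        # The failed attempt already consumed part of the stream, so rewind.
        # Assumes a seekable stream whose pickle starts at offset 0.
        fh.seek(0)
        return pickle.load(fh, encoding="latin-1")


# Hand-built protocol-2 pickle of the Python 2 str 'caf\xe9':
# PROTO 2, SHORT_BINSTRING of 4 bytes, STOP.
PY2_STYLE_PICKLE = b"\x80\x02U\x04caf\xe9."

result = load_with_latin1_fallback(io.BytesIO(PY2_STYLE_PICKLE))
assert result == "caf\u00e9"  # decoded as the Python 3 str 'café'
```

Pickles written by Python 3 never reach the retry, because the first `pickle.load` succeeds.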
1 parent ac3056f commit e675f82

4 files changed, +18 -1 lines changed

doc/source/whatsnew/v1.0.2.rst

+1
```diff
@@ -38,6 +38,7 @@ Bug fixes
 
 - Using ``pd.NA`` with :meth:`DataFrame.to_json` now correctly outputs a null value instead of an empty object (:issue:`31615`)
 - Fixed bug in parquet roundtrip with nullable unsigned integer dtypes (:issue:`31896`).
+- Fixed bug where :meth:`pandas.io.pickle.read_pickle` raised a ``UnicodeDecodeError`` when reading a py27 pickle with MultiIndex columns (:issue:`31988`).
 
 
 
```

pandas/io/pickle.py

+5 -1
```diff
@@ -183,7 +183,11 @@ def read_pickle(
             # e.g.
             # "No module named 'pandas.core.sparse.series'"
             # "Can't get attribute '__nat_unpickle' on <module 'pandas._libs.tslib"
-            return pc.load(f, encoding=None)
+            try:
+                return pc.load(f, encoding=None)
+            except UnicodeDecodeError:
+                # e.g. can occur for files written in py27; see GH#28645 and GH#31988
+                return pc.load(f, encoding="latin-1")
     except UnicodeDecodeError:
         # e.g. can occur for files written in py27; see GH#28645
         return pc.load(f, encoding="latin-1")
```
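Latin-1 suits the retry because it maps each byte value 0x00 to 0xFF to the code point with the same number, so decoding cannot raise and the original bytes stay recoverable with `encode("latin-1")`. The Python `pickle` documentation likewise recommends `encoding='latin1'` for NumPy arrays and `datetime` objects pickled by Python 2. The short check below (the helper `decodes_cleanly` is hypothetical) contrasts latin-1 with ASCII, the unpickler's default:

```python
# Every possible byte value, 0x00 through 0xFF.
ALL_BYTES = bytes(range(256))


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if ``data`` decodes under ``encoding`` without raising."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# ASCII, pickle's default for Python 2 strings, rejects any byte >= 0x80.
print(decodes_cleanly(ALL_BYTES, "ascii"))  # False
# latin-1 maps byte N to code point U+00NN, so every byte is accepted...
print(decodes_cleanly(ALL_BYTES, "latin-1"))  # True
# ...and re-encoding recovers the original bytes exactly.
print(ALL_BYTES.decode("latin-1").encode("latin-1") == ALL_BYTES)  # True
```

The trade-off is that text Python 2 stored as UTF-8 bytes comes back garbled rather than raising (for example `b"\xc3\xa9"` decodes to `"Ã©"`, not `"é"`), so the fallback favours loading the file at all over exact text recovery.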
1.36 KB
Binary file not shown.

pandas/tests/io/test_pickle.py

+12
```diff
@@ -501,3 +501,15 @@ def test_read_pickle_with_subclass():
 
     tm.assert_series_equal(result[0], expected[0])
     assert isinstance(result[1], MyTz)
+
+
+def test_read_py27_pickle_with_MultiIndex_column(datapath):
+    # pickle file with MultiIndex column written with py27
+    # should be readable without raising UnicodeDecodeError
+    # see GH#31988
+    path = datapath("io", "data", "pickle", "test_mi_py27.pkl")
+    df = pd.read_pickle(path)
+
+    # just test the columns are correct since the values are random
+    expected = pd.MultiIndex.from_arrays([["a", "b", "c"], ["A", "B", "C"]])
+    tm.assert_index_equal(df.columns, expected)
```
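To check whether a fixture such as `test_mi_py27.pkl` actually contains Python 2 8-bit strings, the standard library's `pickletools.genops` can list its opcodes. The sketch below is a heuristic, assuming a binary-protocol pickle: Python 2 writes `str` with the `STRING`, `BINSTRING` or `SHORT_BINSTRING` opcodes, which the Python 3 pickler does not emit. `has_py2_byte_strings` is a hypothetical helper, not part of pandas or the test suite.

```python
import pickle
import pickletools

# Opcodes that carry raw 8-bit strings. Python 2 uses them for `str`;
# the Python 3 pickler does not emit them.
PY2_STRING_OPCODES = {"STRING", "BINSTRING", "SHORT_BINSTRING"}


def has_py2_byte_strings(data: bytes) -> bool:
    """Heuristic: True if the pickle stream contains a Python 2 8-bit string opcode."""
    return any(
        opcode.name in PY2_STRING_OPCODES
        for opcode, _arg, _pos in pickletools.genops(data)
    )


# Hand-built protocol-2 pickle of the Python 2 str 'caf\xe9' (PROTO, SHORT_BINSTRING, STOP).
py2_style = b"\x80\x02U\x04caf\xe9."
# The same text pickled by Python 3 is stored as a unicode string instead.
py3_style = pickle.dumps("caf\u00e9", protocol=2)

print(has_py2_byte_strings(py2_style))  # True
print(has_py2_byte_strings(py3_style))  # False
```

For a file on disk, pass the bytes read in binary mode, e.g. `open(path, "rb").read()`.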
