COMPAT: remove Categorical pickle compat with Pandas < 0.16 #27538

Merged: 2 commits, Sep 19, 2019
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -99,6 +99,8 @@ Removal of prior version deprecations/changes
- :meth:`pandas.Series.str.cat` does not accept list-likes *within* list-likes anymore (:issue:`27611`)
- Removed the previously deprecated :meth:`ExtensionArray._formatting_values`. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)
- Removed the previously deprecated ``IntervalIndex.from_intervals`` in favor of the :class:`IntervalIndex` constructor (:issue:`19263`)
- Ability to read pickles containing :class:`Categorical` instances created with pre-0.16 version of pandas has been removed (:issue:`27538`)
Member:

The whatsnew for v0.25.0 already states that we dropped support for pickles created prior to 0.20.3, so this isn't strictly necessary, though I guess it doesn't hurt to include.

Contributor Author (@topper-123), Aug 28, 2019:


IMO it is ok to include this, given that something won't work that previously did work...

-

.. _whatsnew_1000.performance:

19 changes: 1 addition & 18 deletions pandas/core/arrays/categorical.py
@@ -1350,24 +1350,7 @@ def __setstate__(self, state):
         if not isinstance(state, dict):
             raise Exception("invalid pickle state")

-        # Provide compatibility with pre-0.15.0 Categoricals.
-        if "_categories" not in state and "_levels" in state:
-            state["_categories"] = self.dtype.validate_categories(state.pop("_levels"))
-        if "_codes" not in state and "labels" in state:
-            state["_codes"] = coerce_indexer_dtype(
-                state.pop("labels"), state["_categories"]
-            )
-
-        # 0.16.0 ordered change
-        if "_ordered" not in state:
-
-            # >=15.0 < 0.16.0
-            if "ordered" in state:
-                state["_ordered"] = state.pop("ordered")
-            else:
-                state["_ordered"] = False
-
-        # 0.21.0 CategoricalDtype change
Contributor:

Why is there a 0.25 pickle file included? This is covered by the legacy pickle files, no?

Contributor Author:

The test test_read_fspath_all checks whether reading from custom file paths (CustomFSPath) returns the expected value. It seems reasonable that a pickle is included for that testing. Adding 0.25 to the name is not strictly necessary, but it can be handy information to have.

Probably this test could instead test if temp files are read correctly back in, but that would be for another PR, IMO.

Contributor:

we already have round trip categorical pickle tests

Contributor Author:

I didn't mean testing round-tripping itself, but using a round trip in this test rather than keeping a file in /data.

+        # compat with pre 0.21.0 CategoricalDtype change
         if "_dtype" not in state:
             state["_dtype"] = CategoricalDtype(state["_categories"], state["_ordered"])
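
For readers unfamiliar with the pattern being removed here, the following is a minimal sketch of how a `__setstate__` hook can upgrade older pickle layouts. It is not pandas' implementation: `ToyCategorical` and `restore` are hypothetical names, plain lists stand in for the real index and code arrays, and only the key renames visible in the deleted lines (`_levels` to `_categories`, `labels` to `_codes`, `ordered` to `_ordered`) are mirrored.

```python
class ToyCategorical:
    """Hypothetical stand-in for a class whose pickled attribute names changed."""

    def __init__(self, codes, categories, ordered=False):
        self._codes = list(codes)
        self._categories = list(categories)
        self._ordered = ordered

    def __setstate__(self, state):
        if not isinstance(state, dict):
            raise TypeError("invalid pickle state")
        state = dict(state)  # work on a copy so the caller's dict is untouched

        # Oldest layout: categories stored as "_levels", codes as "labels".
        if "_categories" not in state and "_levels" in state:
            state["_categories"] = list(state.pop("_levels"))
        if "_codes" not in state and "labels" in state:
            state["_codes"] = list(state.pop("labels"))

        # Intermediate layout: "ordered" without the leading underscore,
        # or missing entirely (treated as unordered).
        if "_ordered" not in state:
            state["_ordered"] = state.pop("ordered", False)

        self.__dict__.update(state)


def restore(state):
    """Rebuild an object from a state dict the way unpickling would."""
    obj = ToyCategorical.__new__(ToyCategorical)  # skip __init__, as pickle does
    obj.__setstate__(state)
    return obj


legacy = restore({"labels": [0, 1, 2], "_levels": ["a", "b", "c", "d"]})
current = restore(
    {"_codes": [0, 1, 2], "_categories": ["a", "b", "c", "d"], "_ordered": False}
)
```

Each compatibility branch of this kind has to be kept and tested for as long as the old layout is supported, which is the maintenance cost this PR removes for the pre-0.16 layouts.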

Binary file added pandas/tests/io/data/categorical.0.25.0.pickle
Binary file not shown.
94 changes: 0 additions & 94 deletions pandas/tests/io/data/categorical_0_14_1.pickle

This file was deleted.

Binary file removed pandas/tests/io/data/categorical_0_15_2.pickle
Binary file not shown.
2 changes: 1 addition & 1 deletion pandas/tests/io/test_common.py
@@ -222,7 +222,7 @@ def test_read_expands_user_home_dir(
             (pd.read_sas, "os", ("io", "sas", "data", "test1.sas7bdat")),
             (pd.read_json, "os", ("io", "json", "data", "tsframe_v012.json")),
             (pd.read_msgpack, "os", ("io", "msgpack", "data", "frame.mp")),
-            (pd.read_pickle, "os", ("io", "data", "categorical_0_14_1.pickle")),
+            (pd.read_pickle, "os", ("io", "data", "categorical.0.25.0.pickle")),
         ],
     )
     def test_read_fspath_all(self, reader, module, path, datapath):
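
The alternative raised in the review discussion (round-tripping through a temporary file instead of keeping a pickle in /data) could look roughly like the sketch below. It is an illustration under stated assumptions, not the pandas test: `PathLikeWrapper` and `round_trip_via_fspath` are hypothetical names, and the standard library's `pickle` stands in for `pd.to_pickle` / `pd.read_pickle` so that the example has no third-party dependencies.

```python
import os
import pickle
import tempfile


class PathLikeWrapper:
    """Hypothetical path-like object; implements the os.PathLike protocol."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        return self.path


def round_trip_via_fspath(obj):
    """Write obj to a temporary pickle and read it back through a path-like."""
    with tempfile.TemporaryDirectory() as tmpdir:
        target = PathLikeWrapper(os.path.join(tmpdir, "obj.pickle"))
        # open() accepts any os.PathLike, so no str() conversion is needed.
        with open(target, "wb") as fh:
            pickle.dump(obj, fh)
        with open(target, "rb") as fh:
            return pickle.load(fh)


payload = {"codes": [0, 1, 2], "categories": ["a", "b", "c", "d"]}
result = round_trip_via_fspath(payload)
```

A test written this way always exercises the current on-disk format, so it needs no checked-in binary, but it also cannot detect a regression in reading files written by an earlier release.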
32 changes: 0 additions & 32 deletions pandas/tests/io/test_pickle.py
@@ -194,38 +194,6 @@ def python_unpickler(path):
compare_element(result, expected, typ)


-def test_pickle_v0_14_1(datapath):
-
-    cat = pd.Categorical(
-        values=["a", "b", "c"], ordered=False, categories=["a", "b", "c", "d"]
-    )
-    pickle_path = datapath("io", "data", "categorical_0_14_1.pickle")
-    # This code was executed once on v0.14.1 to generate the pickle:
-    #
-    # cat = Categorical(labels=np.arange(3), levels=['a', 'b', 'c', 'd'],
-    #                   name='foobar')
-    # with open(pickle_path, 'wb') as f: pickle.dump(cat, f)
-    #
-    tm.assert_categorical_equal(cat, pd.read_pickle(pickle_path))
-
-
-def test_pickle_v0_15_2(datapath):
-    # ordered -> _ordered
-    # GH 9347
-
-    cat = pd.Categorical(
-        values=["a", "b", "c"], ordered=False, categories=["a", "b", "c", "d"]
-    )
-    pickle_path = datapath("io", "data", "categorical_0_15_2.pickle")
-    # This code was executed once on v0.15.2 to generate the pickle:
-    #
-    # cat = Categorical(labels=np.arange(3), levels=['a', 'b', 'c', 'd'],
-    #                   name='foobar')
-    # with open(pickle_path, 'wb') as f: pickle.dump(cat, f)
-    #
-    tm.assert_categorical_equal(cat, pd.read_pickle(pickle_path))
-
-
 def test_pickle_path_pathlib():
     df = tm.makeDataFrame()
     result = tm.round_trip_pathlib(df.to_pickle, pd.read_pickle)