Removed Panel Kludge from Pickle/Msgpack tests #27082

Merged: 14 commits, Jun 27, 2019
15 changes: 5 additions & 10 deletions doc/source/user_guide/io.rst
@@ -3319,16 +3319,7 @@ any pickled pandas object (or any other pickled object) from file:

.. warning::

Several internal refactoring have been done while still preserving
compatibility with pickles created with older versions of pandas. However,
for such cases, pickled ``DataFrames``, ``Series`` etc, must be read with
``pd.read_pickle``, rather than ``pickle.load``.

See `here <https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0130-refactoring>`__
and `here <https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0150-refactoring>`__
for some examples of compatibility-breaking changes. See
`this question <https://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed>`__
for a detailed explanation.
:func:`read_pickle` is only guaranteed backwards compatible back to pandas version 0.20.3

.. _io.pickle.compression:
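
Note on the warning text removed in this hunk: it told readers to load old pickles with ``pd.read_pickle`` rather than ``pickle.load``, because internal refactoring had moved classes around. One general way a loader can keep old files readable after such moves is to redirect class lookups during unpickling. The following is a minimal, stdlib-only sketch of that technique and is not pandas' actual implementation; `RENAMED`, `RenamingUnpickler`, `load_compat`, `expose`, `Table`, `legacy_pkg`, and `current_pkg` are hypothetical names used only for illustration.

```python
import io
import pickle
import sys
import types

# Hypothetical rename table: (old module, old name) -> (new module, new name).
RENAMED = {("legacy_pkg", "Table"): ("current_pkg", "Table")}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects class lookups for relocated classes."""

    def find_class(self, module, name):
        module, name = RENAMED.get((module, name), (module, name))
        return super().find_class(module, name)


def load_compat(payload):
    """Load pickle bytes, applying the rename table during class lookup."""
    return RenamingUnpickler(io.BytesIO(payload)).load()


class Table:
    def __init__(self, rows):
        self.rows = rows


def expose(cls, module_name):
    """Make cls importable as <module_name>.<class name> (demo helper only)."""
    module = types.ModuleType(module_name)
    setattr(module, cls.__name__, cls)
    cls.__module__ = module_name
    cls.__qualname__ = cls.__name__
    sys.modules[module_name] = module


# "Old release": the class lives in legacy_pkg when the pickle is written.
expose(Table, "legacy_pkg")
payload = pickle.dumps(Table([1, 2, 3]))

# "New release": the class has moved and the old module no longer exists.
del sys.modules["legacy_pkg"]
expose(Table, "current_pkg")

try:
    pickle.loads(payload)
    plain_load_failed = False
except ImportError:
    # The plain loader cannot import the module name stored in the pickle.
    plain_load_failed = True

restored = load_compat(payload)
print(plain_load_failed, restored.rows)
```

A rename table like this only covers the relocations someone has recorded and tested, which is consistent with documenting a fixed oldest supported version instead of promising compatibility with every past release.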

@@ -3406,6 +3397,10 @@ both on the writing (serialization), and reading (deserialization).
optimizations in the io of the ``msgpack`` data. Since this is marked
as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

.. warning::

:func:`read_msgpack` is only guaranteed backwards compatible back to pandas version 0.20.3

.. ipython:: python
df = pd.DataFrame(np.random.rand(5, 2), columns=list('AB'))
5 changes: 5 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -13,6 +13,11 @@ What's New in 0.25.0 (April XX, 2019)
`Panel` has been fully removed. For N-D labeled data structures, please
use `xarray <https://xarray.pydata.org/en/stable/>`_

.. warning::
[Review comment, Contributor] Also add this note in io.rst (or similar; it doesn't need the issue number).


:func:`read_pickle` and :func:`read_msgpack` are only guaranteed backwards compatible back to
pandas version 0.20.3 (:issue:`27082`)

{{ header }}

These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog
5 changes: 5 additions & 0 deletions pandas/io/packers.py
@@ -134,6 +134,11 @@ def read_msgpack(path_or_buf, encoding='utf-8', iterator=False, **kwargs):
Returns
-------
obj : same type as object stored in file
Notes
-----
read_msgpack is only guaranteed to be backwards compatible to pandas
0.20.3.
"""
path_or_buf, _, _, should_close = get_filepath_or_buffer(path_or_buf)
if iterator:
4 changes: 4 additions & 0 deletions pandas/io/pickle.py
@@ -116,6 +116,10 @@ def read_pickle(path, compression='infer'):
read_sql : Read SQL query or database table into a DataFrame.
read_parquet : Load a parquet object, returning a DataFrame.
Notes
-----
read_pickle is only guaranteed to be backwards compatible to pandas 0.20.3.
Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
Binary file not shown. (repeated for 83 binary files)
45 changes: 11 additions & 34 deletions pandas/tests/io/generate_legacy_storage_files.py
@@ -4,21 +4,21 @@
self-contained to write legacy storage (pickle/msgpack) files
To use this script. Create an environment where you want
generate pickles, say its for 0.18.1, with your pandas clone
generate pickles, say its for 0.20.3, with your pandas clone
in ~/pandas
. activate pandas_0.18.1
. activate pandas_0.20.3
cd ~/
$ python pandas/pandas/tests/io/generate_legacy_storage_files.py \
pandas/pandas/tests/io/data/legacy_pickle/0.18.1/ pickle
This script generates a storage file for the current arch, system,
and python version
pandas version: 0.18.1
pandas version: 0.20.3
output dir : pandas/pandas/tests/io/data/legacy_pickle/0.18.1/
storage format: pickle
created pickle file: 0.18.1_x86_64_darwin_3.5.2.pickle
created pickle file: 0.20.3_x86_64_darwin_3.5.2.pickle
The idea here is you are using the *current* version of the
generate_legacy_storage_files with an *older* version of pandas to
@@ -45,7 +45,7 @@

import pandas
from pandas import (
Categorical, DataFrame, Index, MultiIndex, NaT, Period, Series,
Categorical, DataFrame, Index, MultiIndex, NaT, Period, RangeIndex, Series,
SparseDataFrame, SparseSeries, Timestamp, bdate_range, date_range,
period_range, timedelta_range, to_msgpack)

@@ -118,9 +118,7 @@ def create_data():
uint=Index(np.arange(10, dtype=np.uint64)),
timedelta=timedelta_range('00:00:00', freq='30T', periods=10))

if _loose_version >= LooseVersion('0.18'):
from pandas import RangeIndex
index['range'] = RangeIndex(10)
index['range'] = RangeIndex(10)

if _loose_version >= LooseVersion('0.21'):
from pandas import interval_range
@@ -191,14 +189,9 @@ def create_data():
nat=NaT,
tz=Timestamp('2011-01-01', tz='US/Eastern'))

if _loose_version < LooseVersion('0.19.2'):
timestamp['freq'] = Timestamp('2011-01-01', offset='D')
timestamp['both'] = Timestamp('2011-01-01', tz='Asia/Tokyo',
offset='M')
else:
timestamp['freq'] = Timestamp('2011-01-01', freq='D')
timestamp['both'] = Timestamp('2011-01-01', tz='Asia/Tokyo',
freq='M')
timestamp['freq'] = Timestamp('2011-01-01', freq='D')
timestamp['both'] = Timestamp('2011-01-01', tz='Asia/Tokyo',
freq='M')

off = {'DateOffset': DateOffset(years=1),
'DateOffset_h_ns': DateOffset(hour=6, nanoseconds=5824),
@@ -239,14 +232,6 @@ def create_data():
def create_pickle_data():
data = create_data()

# Pre-0.14.1 versions generated non-unpicklable mixed-type frames and
# panels if their columns/items were non-unique.
if _loose_version < LooseVersion('0.14.1'):
del data['frame']['mixed_dup']
del data['panel']['mixed_dup']
if _loose_version < LooseVersion('0.17.0'):
del data['series']['period']
del data['scalars']['period']
return data


@@ -256,14 +241,6 @@ def _u(x):

def create_msgpack_data():
data = create_data()
if _loose_version < LooseVersion('0.17.0'):
del data['frame']['mixed_dup']
del data['panel']['mixed_dup']
del data['frame']['dup']
del data['panel']['dup']
if _loose_version < LooseVersion('0.18.0'):
del data['series']['dt_tz']
del data['frame']['dt_mixed_tzs']
# Not supported
del data['sp_series']
del data['sp_frame']
@@ -272,7 +249,8 @@ def create_msgpack_data():
del data['frame']['cat_onecol']
del data['frame']['cat_and_float']
del data['scalars']['period']
if _loose_version < LooseVersion('0.23.0'):
if _loose_version >= LooseVersion('0.21') and (
_loose_version < LooseVersion('0.23.0')):
del data['index']['interval']
del data['offsets']
return _u(data)
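
Note on the revised condition above: the diff shows the `'interval'` index entry being created only when the generating pandas is at least 0.21, so for a 0.20.3 environment an unguarded `del data['index']['interval']` would presumably raise `KeyError`. The new check is a half-open version range. A minimal sketch of that range test follows, using plain integer tuples instead of `LooseVersion`; `parse_version` and `in_half_open_range` are hypothetical helpers, and the sketch assumes purely numeric, dot-separated version strings.

```python
def parse_version(text):
    """Turn '0.21.1' into (0, 21, 1); assumes numeric dot-separated parts."""
    return tuple(int(part) for part in text.split("."))


def in_half_open_range(version, low, high):
    """Return True when low <= version < high, compared component-wise."""
    return parse_version(low) <= parse_version(version) < parse_version(high)


# Versions for which the 'interval' entry exists but must be dropped
# before writing msgpack data, per the condition in the diff.
for candidate in ["0.20.3", "0.21.0", "0.22.0", "0.23.0"]:
    print(candidate, in_half_open_range(candidate, "0.21", "0.23.0"))
```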
@@ -285,7 +263,6 @@ def platform_name():

def write_legacy_pickles(output_dir):

# make sure we are < 0.13 compat (in py3)
version = pandas.__version__

print("This script generates a storage file for the current arch, system, "
40 changes: 3 additions & 37 deletions pandas/tests/io/test_packers.py
@@ -1,5 +1,4 @@
import datetime
from distutils.version import LooseVersion
import glob
from io import BytesIO
import os
@@ -84,7 +83,6 @@ def check_arbitrary(a, b):
assert(a == b)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
class TestPackers:

def setup_method(self, method):
@@ -99,7 +97,6 @@ def encode_decode(self, x, compress=None, **kwargs):
return read_msgpack(p, **kwargs)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
class TestAPI(TestPackers):

def test_string_io(self):
@@ -463,7 +460,6 @@ def test_basic(self):
assert_categorical_equal(i, i_rec)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
class TestNDFrame(TestPackers):

def setup_method(self, method):
@@ -842,7 +838,6 @@ def legacy_packer(request, datapath):
return datapath(request.param)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
@pytest.mark.filterwarnings("ignore:Sparse:FutureWarning")
class TestMsgpack:
"""
@@ -858,31 +853,19 @@ class TestMsgpack:
minimum_structure = {'series': ['float', 'int', 'mixed',
'ts', 'mi', 'dup'],
'frame': ['float', 'int', 'mixed', 'mi'],
'panel': ['float'],
'index': ['int', 'date', 'period'],
'mi': ['reg2']}

def check_min_structure(self, data, version):
for typ, v in self.minimum_structure.items():
if typ == "panel":
# FIXME: kludge; get this key out of the legacy file
continue

assert typ in data, '"{0}" not found in unpacked data'.format(typ)
for kind in v:
msg = '"{0}" not found in data["{1}"]'.format(kind, typ)
assert kind in data[typ], msg

def compare(self, current_data, all_data, vf, version):
# GH12277 encoding default used to be latin-1, now utf-8
if LooseVersion(version) < LooseVersion('0.18.0'):
data = read_msgpack(vf, encoding='latin-1')
else:
data = read_msgpack(vf)

if "panel" in data:
# FIXME: kludge; get the key out of the stored file
del data["panel"]
data = read_msgpack(vf)

self.check_min_structure(data, version)
for typ, dv in data.items():
@@ -909,33 +892,16 @@ def compare(self, current_data, all_data, vf, version):
return data

def compare_series_dt_tz(self, result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_series_equal(result, expected)
else:
tm.assert_series_equal(result, expected)
tm.assert_series_equal(result, expected)

def compare_frame_dt_mixed_tzs(self, result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_frame_equal(result, expected)
else:
tm.assert_frame_equal(result, expected)
tm.assert_frame_equal(result, expected)

def test_msgpacks_legacy(self, current_packers_data, all_packers_data,
legacy_packer, datapath):

version = os.path.basename(os.path.dirname(legacy_packer))

# GH12142 0.17 files packed in P2 can't be read in P3
if (version.startswith('0.17.') and
legacy_packer.split('.')[-4][-1] == '2'):
msg = "Files packed in Py2 can't be read in Py3 ({})"
pytest.skip(msg.format(version))
try:
with catch_warnings(record=True):
self.compare(current_packers_data, all_packers_data,
62 changes: 7 additions & 55 deletions pandas/tests/io/test_pickle.py
@@ -11,7 +11,6 @@
3. Move the created pickle to "data/legacy_pickle/<version>" directory.
"""
import bz2
from distutils.version import LooseVersion
import glob
import gzip
import lzma
@@ -69,18 +68,8 @@ def compare(data, vf, version):

m = globals()
for typ, dv in data.items():
if typ == "panel":
# FIXME: kludge; get this key out of the legacy file
continue

for dt, result in dv.items():
try:
expected = data[typ][dt]
except (KeyError):
if version in ('0.10.1', '0.11.0') and dt == 'reg':
break
else:
raise
expected = data[typ][dt]

# use a specific comparator
# if available
@@ -92,12 +81,7 @@ def compare(data, vf, version):


def compare_sp_series_ts(res, exp, typ, version):
# SparseTimeSeries integrated into SparseSeries in 0.12.0
# and deprecated in 0.17.0
if version and LooseVersion(version) <= LooseVersion("0.12.0"):
tm.assert_sp_series_equal(res, exp, check_series_type=False)
else:
tm.assert_sp_series_equal(res, exp)
tm.assert_sp_series_equal(res, exp)


def compare_series_ts(result, expected, typ, version):
@@ -121,47 +105,19 @@ def compare_series_ts(result, expected, typ, version):


def compare_series_dt_tz(result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_series_equal(result, expected)
else:
tm.assert_series_equal(result, expected)
tm.assert_series_equal(result, expected)


def compare_series_cat(result, expected, typ, version):
# Categorical dtype is added in 0.15.0
# ordered is changed in 0.16.0
if LooseVersion(version) < LooseVersion('0.15.0'):
tm.assert_series_equal(result, expected, check_dtype=False,
check_categorical=False)
elif LooseVersion(version) < LooseVersion('0.16.0'):
tm.assert_series_equal(result, expected, check_categorical=False)
else:
tm.assert_series_equal(result, expected)
tm.assert_series_equal(result, expected)


def compare_frame_dt_mixed_tzs(result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_frame_equal(result, expected)
else:
tm.assert_frame_equal(result, expected)
tm.assert_frame_equal(result, expected)


def compare_frame_cat_onecol(result, expected, typ, version):
# Categorical dtype is added in 0.15.0
# ordered is changed in 0.16.0
if LooseVersion(version) < LooseVersion('0.15.0'):
tm.assert_frame_equal(result, expected, check_dtype=False,
check_categorical=False)
elif LooseVersion(version) < LooseVersion('0.16.0'):
tm.assert_frame_equal(result, expected, check_categorical=False)
else:
tm.assert_frame_equal(result, expected)
tm.assert_frame_equal(result, expected)


def compare_frame_cat_and_float(result, expected, typ, version):
@@ -177,11 +133,7 @@ def compare_index_period(result, expected, typ, version):


def compare_sp_frame_float(result, expected, typ, version):
if LooseVersion(version) <= LooseVersion('0.18.1'):
tm.assert_sp_frame_equal(result, expected, exact_indices=False,
check_dtype=False)
else:
tm.assert_sp_frame_equal(result, expected)
tm.assert_sp_frame_equal(result, expected)
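
Note on the `compare_*` helpers in this file: the `compare` hunk mentions using "a specific comparator if available", and the helper names follow a `compare_<typ>_<dt>` pattern. A minimal sketch of that kind of name-based dispatch with a fallback is shown below. It uses an explicit registry dict rather than `globals()`, and `COMPARATORS`, `pick_comparator`, the fallback `compare_element`, and the stand-in comparator body are assumptions for illustration, not code taken from the diff.

```python
def compare_element(result, expected, typ, version=None):
    """Fallback comparator (hypothetical): plain equality."""
    assert result == expected, f"{typ}: {result!r} != {expected!r}"


def compare_series_cat(result, expected, typ, version=None):
    """Stand-in for a type-specific comparator; compares element lists."""
    assert list(result) == list(expected)


# Registry keyed by function name, mirroring a lookup by constructed name.
COMPARATORS = {"compare_series_cat": compare_series_cat}


def pick_comparator(typ, dt):
    """Return the specific comparator for (typ, dt), else the fallback."""
    return COMPARATORS.get(f"compare_{typ}_{dt}", compare_element)


# A specific comparator exists for series/cat; frame/float uses the fallback.
pick_comparator("series", "cat")(("a", "b"), ["a", "b"], "series")
pick_comparator("frame", "float")(1.5, 1.5, "frame")
```

With every version branch removed from the helpers, most of them reduce to a single assertion, so a dispatch like this mainly matters for the few types that still need a non-default comparison.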


files = glob.glob(os.path.join(os.path.dirname(__file__), "data",