
Removed Panel Kludge from Pickle/Msgpack tests #27082


Merged 14 commits on Jun 27, 2019
5 changes: 5 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -13,6 +13,11 @@ What's New in 0.25.0 (April XX, 2019)
`Panel` has been fully removed. For N-D labeled data structures, please
use `xarray <https://xarray.pydata.org/en/stable/>`_

.. warning::
Review comment (Contributor): also add this note in io.rst (or similar, doesn't need the issue number)


``read_pickle`` and ``read_msgpack`` are only guaranteed to be backwards compatible back to
Review comment (Contributor): can you add the :func:`read_pickle` and same for msgpack

pandas version 0.20.3
Review comment (Contributor): can you add the issue number reference here


{{ header }}

These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog
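The compatibility guarantee added above is about pandas' serialized object layout, but the same general idea exists at the level of Python's own pickle protocols: a reader can only load protocols it understands, so writers targeting older readers pin a low protocol. A minimal stdlib-only sketch of that round trip (no pandas objects involved, protocol number chosen for illustration):

```python
import pickle

# An object pickled with an older protocol can always be read by a
# newer Python; the reverse is not guaranteed.  Pinning a low protocol
# on the writing side is the pickle-level analogue of the
# "compatible back to 0.20.3" guarantee described above.
data = {"foo": list(range(5)), "bar": list(range(5, 10))}

payload = pickle.dumps(data, protocol=2)  # old, widely readable protocol
restored = pickle.loads(payload)

assert restored == data
assert pickle.HIGHEST_PROTOCOL >= 2
```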
5 changes: 5 additions & 0 deletions pandas/io/packers.py
@@ -134,6 +134,11 @@ def read_msgpack(path_or_buf, encoding='utf-8', iterator=False, **kwargs):
Returns
-------
obj : same type as object stored in file

Notes
-----
read_msgpack is only guaranteed to be backwards compatible to pandas
0.20.3.
"""
path_or_buf, _, _, should_close = get_filepath_or_buffer(path_or_buf)
if iterator:
4 changes: 4 additions & 0 deletions pandas/io/pickle.py
@@ -116,6 +116,10 @@ def read_pickle(path, compression='infer'):
read_sql : Read SQL query or database table into a DataFrame.
read_parquet : Load a parquet object, returning a DataFrame.

Notes
-----
read_pickle is only guaranteed to be backwards compatible to pandas 0.20.3.

Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
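The docstring example above is cut off by the diff view; a round-trip sketch in the same spirit (the temporary path and file name here are illustrative, not taken from the original docstring):

```python
import os
import tempfile

import pandas as pd

original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# Round-trip through a temporary pickle file; read_pickle infers
# compression from the file extension by default (compression='infer').
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummy.pkl")
    original_df.to_pickle(path)
    unpickled_df = pd.read_pickle(path)

pd.testing.assert_frame_equal(original_df, unpickled_df)
```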
Binary file not shown.
47 changes: 12 additions & 35 deletions pandas/tests/io/generate_legacy_storage_files.py
@@ -4,21 +4,21 @@
self-contained to write legacy storage (pickle/msgpack) files

To use this script, create an environment where you want to
generate pickles, say for 0.20.3, with your pandas clone
in ~/pandas

. activate pandas_0.18.1
. activate pandas_0.20.3
cd ~/

$ python pandas/pandas/tests/io/generate_legacy_storage_files.py \
pandas/pandas/tests/io/data/legacy_pickle/0.20.3/ pickle

This script generates a storage file for the current arch, system,
and python version
pandas version: 0.18.1
pandas version: 0.20.3
output dir : pandas/pandas/tests/io/data/legacy_pickle/0.20.3/
storage format: pickle
created pickle file: 0.18.1_x86_64_darwin_3.5.2.pickle
created pickle file: 0.20.3_x86_64_darwin_3.5.2.pickle

The idea here is you are using the *current* version of the
generate_legacy_storage_files with an *older* version of pandas to
@@ -45,8 +45,8 @@

import pandas
from pandas import (
Categorical, DataFrame, Index, MultiIndex, NaT, Period, Series,
SparseDataFrame, SparseSeries, Timestamp, bdate_range, date_range,
Categorical, DataFrame, Index, MultiIndex, NaT, Period, RangeIndex,
Series, SparseDataFrame, SparseSeries, Timestamp, bdate_range, date_range,
period_range, timedelta_range, to_msgpack)

from pandas.tseries.offsets import (
@@ -118,9 +118,7 @@ def create_data():
uint=Index(np.arange(10, dtype=np.uint64)),
timedelta=timedelta_range('00:00:00', freq='30T', periods=10))

if _loose_version >= LooseVersion('0.18'):
from pandas import RangeIndex
index['range'] = RangeIndex(10)
index['range'] = RangeIndex(10)

if _loose_version >= LooseVersion('0.21'):
from pandas import interval_range
@@ -191,14 +189,9 @@ def create_data():
nat=NaT,
tz=Timestamp('2011-01-01', tz='US/Eastern'))

if _loose_version < LooseVersion('0.19.2'):
timestamp['freq'] = Timestamp('2011-01-01', offset='D')
timestamp['both'] = Timestamp('2011-01-01', tz='Asia/Tokyo',
offset='M')
else:
timestamp['freq'] = Timestamp('2011-01-01', freq='D')
timestamp['both'] = Timestamp('2011-01-01', tz='Asia/Tokyo',
freq='M')
timestamp['freq'] = Timestamp('2011-01-01', freq='D')
timestamp['both'] = Timestamp('2011-01-01', tz='Asia/Tokyo',
freq='M')

off = {'DateOffset': DateOffset(years=1),
'DateOffset_h_ns': DateOffset(hour=6, nanoseconds=5824),
@@ -239,14 +232,6 @@ def create_data():
def create_pickle_data():
data = create_data()

# Pre-0.14.1 versions generated non-unpicklable mixed-type frames and
# panels if their columns/items were non-unique.
if _loose_version < LooseVersion('0.14.1'):
del data['frame']['mixed_dup']
del data['panel']['mixed_dup']
if _loose_version < LooseVersion('0.17.0'):
del data['series']['period']
del data['scalars']['period']
return data


@@ -256,14 +241,6 @@ def _u(x):

def create_msgpack_data():
data = create_data()
if _loose_version < LooseVersion('0.17.0'):
del data['frame']['mixed_dup']
del data['panel']['mixed_dup']
del data['frame']['dup']
del data['panel']['dup']
if _loose_version < LooseVersion('0.18.0'):
del data['series']['dt_tz']
del data['frame']['dt_mixed_tzs']
# Not supported
del data['sp_series']
del data['sp_frame']
@@ -272,7 +249,8 @@ def create_msgpack_data():
del data['frame']['cat_onecol']
del data['frame']['cat_and_float']
del data['scalars']['period']
if _loose_version < LooseVersion('0.23.0'):
if _loose_version >= LooseVersion('0.21') and (
_loose_version < LooseVersion('0.23.0')):
del data['index']['interval']
del data['offsets']
return _u(data)
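The version gates kept in this script use `distutils.version.LooseVersion`, which is deprecated (and `distutils` itself is removed in Python 3.12). A minimal sketch of the same comparison, assuming plain numeric version strings like those gated on here:

```python
def parse_version(v):
    # "0.20.3" -> (0, 20, 3); handles purely numeric components only,
    # which is enough for the pandas versions gated on in this script
    return tuple(int(part) for part in v.split("."))

_loose_version = parse_version("0.22.0")

# same shape as the interval gate in create_msgpack_data above
if parse_version("0.21.0") <= _loose_version < parse_version("0.23.0"):
    interval_supported = True
else:
    interval_supported = False

assert interval_supported
assert parse_version("0.20.3") < parse_version("0.21.0")
```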
@@ -285,7 +263,6 @@ def platform_name():

def write_legacy_pickles(output_dir):

# make sure we are < 0.13 compat (in py3)
version = pandas.__version__

print("This script generates a storage file for the current arch, system, "
40 changes: 3 additions & 37 deletions pandas/tests/io/test_packers.py
@@ -1,5 +1,4 @@
import datetime
from distutils.version import LooseVersion
import glob
from io import BytesIO
import os
@@ -84,7 +83,6 @@ def check_arbitrary(a, b):
assert(a == b)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
class TestPackers:

def setup_method(self, method):
@@ -99,7 +97,6 @@ def encode_decode(self, x, compress=None, **kwargs):
return read_msgpack(p, **kwargs)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
class TestAPI(TestPackers):

def test_string_io(self):
@@ -463,7 +460,6 @@ def test_basic(self):
assert_categorical_equal(i, i_rec)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
class TestNDFrame(TestPackers):

def setup_method(self, method):
@@ -842,7 +838,6 @@ def legacy_packer(request, datapath):
return datapath(request.param)


@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
@pytest.mark.filterwarnings("ignore:Sparse:FutureWarning")
class TestMsgpack:
"""
@@ -858,31 +853,19 @@ class TestMsgpack:
minimum_structure = {'series': ['float', 'int', 'mixed',
'ts', 'mi', 'dup'],
'frame': ['float', 'int', 'mixed', 'mi'],
'panel': ['float'],
'index': ['int', 'date', 'period'],
'mi': ['reg2']}

def check_min_structure(self, data, version):
for typ, v in self.minimum_structure.items():
if typ == "panel":
# FIXME: kludge; get this key out of the legacy file
continue

assert typ in data, '"{0}" not found in unpacked data'.format(typ)
for kind in v:
msg = '"{0}" not found in data["{1}"]'.format(kind, typ)
assert kind in data[typ], msg
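Stripped of the pandas specifics, `check_min_structure` above walks a dict-of-dicts of expected keys and asserts each one survived the round trip; extra keys in the unpacked data are allowed. A self-contained sketch of the pattern:

```python
def check_min_structure(data, minimum_structure):
    # assert every expected type and kind is present in the unpacked data
    for typ, kinds in minimum_structure.items():
        assert typ in data, '"{0}" not found in unpacked data'.format(typ)
        for kind in kinds:
            msg = '"{0}" not found in data["{1}"]'.format(kind, typ)
            assert kind in data[typ], msg

minimum = {'series': ['float', 'int'], 'frame': ['float']}
unpacked = {'series': {'float': 1.0, 'int': 2, 'extra': 3},
            'frame': {'float': 4.0}}
check_min_structure(unpacked, minimum)  # passes: extra keys are allowed
```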

def compare(self, current_data, all_data, vf, version):
# GH12277 encoding default used to be latin-1, now utf-8
if LooseVersion(version) < LooseVersion('0.18.0'):
data = read_msgpack(vf, encoding='latin-1')
else:
data = read_msgpack(vf)

if "panel" in data:
# FIXME: kludge; get the key out of the stored file
del data["panel"]
data = read_msgpack(vf)

self.check_min_structure(data, version)
for typ, dv in data.items():
@@ -909,33 +892,16 @@ def compare(self, current_data, all_data, vf, version):
return data

def compare_series_dt_tz(self, result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_series_equal(result, expected)
else:
tm.assert_series_equal(result, expected)
tm.assert_series_equal(result, expected)

def compare_frame_dt_mixed_tzs(self, result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_frame_equal(result, expected)
else:
tm.assert_frame_equal(result, expected)
tm.assert_frame_equal(result, expected)

def test_msgpacks_legacy(self, current_packers_data, all_packers_data,
legacy_packer, datapath):

version = os.path.basename(os.path.dirname(legacy_packer))

# GH12142 0.17 files packed in P2 can't be read in P3
if (version.startswith('0.17.') and
legacy_packer.split('.')[-4][-1] == '2'):
msg = "Files packed in Py2 can't be read in Py3 ({})"
pytest.skip(msg.format(version))
try:
with catch_warnings(record=True):
self.compare(current_packers_data, all_packers_data,
62 changes: 7 changes & 55 deletions pandas/tests/io/test_pickle.py
@@ -11,7 +11,6 @@
3. Move the created pickle to "data/legacy_pickle/<version>" directory.
"""
import bz2
from distutils.version import LooseVersion
import glob
import gzip
import lzma
@@ -69,18 +68,8 @@ def compare(data, vf, version):

m = globals()
for typ, dv in data.items():
if typ == "panel":
# FIXME: kludge; get this key out of the legacy file
continue

for dt, result in dv.items():
try:
expected = data[typ][dt]
except (KeyError):
if version in ('0.10.1', '0.11.0') and dt == 'reg':
break
else:
raise
expected = data[typ][dt]

# use a specific comparator
# if available
@@ -92,12 +81,7 @@


def compare_sp_series_ts(res, exp, typ, version):
# SparseTimeSeries integrated into SparseSeries in 0.12.0
# and deprecated in 0.17.0
if version and LooseVersion(version) <= LooseVersion("0.12.0"):
tm.assert_sp_series_equal(res, exp, check_series_type=False)
else:
tm.assert_sp_series_equal(res, exp)
tm.assert_sp_series_equal(res, exp)


def compare_series_ts(result, expected, typ, version):
@@ -121,47 +105,19 @@ def compare_series_ts(result, expected, typ, version):


def compare_series_dt_tz(result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_series_equal(result, expected)
else:
tm.assert_series_equal(result, expected)
tm.assert_series_equal(result, expected)


def compare_series_cat(result, expected, typ, version):
# Categorical dtype is added in 0.15.0
# ordered is changed in 0.16.0
if LooseVersion(version) < LooseVersion('0.15.0'):
tm.assert_series_equal(result, expected, check_dtype=False,
check_categorical=False)
elif LooseVersion(version) < LooseVersion('0.16.0'):
tm.assert_series_equal(result, expected, check_categorical=False)
else:
tm.assert_series_equal(result, expected)
tm.assert_series_equal(result, expected)


def compare_frame_dt_mixed_tzs(result, expected, typ, version):
# 8260
# dtype is object < 0.17.0
if LooseVersion(version) < LooseVersion('0.17.0'):
expected = expected.astype(object)
tm.assert_frame_equal(result, expected)
else:
tm.assert_frame_equal(result, expected)
tm.assert_frame_equal(result, expected)


def compare_frame_cat_onecol(result, expected, typ, version):
# Categorical dtype is added in 0.15.0
# ordered is changed in 0.16.0
if LooseVersion(version) < LooseVersion('0.15.0'):
tm.assert_frame_equal(result, expected, check_dtype=False,
check_categorical=False)
elif LooseVersion(version) < LooseVersion('0.16.0'):
tm.assert_frame_equal(result, expected, check_categorical=False)
else:
tm.assert_frame_equal(result, expected)
tm.assert_frame_equal(result, expected)


def compare_frame_cat_and_float(result, expected, typ, version):
@@ -177,11 +133,7 @@ def compare_index_period(result, expected, typ, version):


def compare_sp_frame_float(result, expected, typ, version):
if LooseVersion(version) <= LooseVersion('0.18.1'):
tm.assert_sp_frame_equal(result, expected, exact_indices=False,
check_dtype=False)
else:
tm.assert_sp_frame_equal(result, expected)
tm.assert_sp_frame_equal(result, expected)
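The `compare_*` helpers above are looked up by name inside `compare` (`m = globals()` earlier in the file) so that each data kind can get a specific comparator, with plain equality as the implicit fallback. The dispatch pattern, reduced to its core (helper names here are illustrative):

```python
def compare_series_float(result, expected):
    # specific comparator: tolerant float comparison
    assert abs(result - expected) < 1e-9

def compare_default(result, expected):
    # fallback when no specific comparator is defined
    assert result == expected

def dispatch_compare(result, expected, typ, dt):
    # look up a comparator such as compare_series_float by name,
    # mirroring the globals()-based lookup in test_pickle.py's compare()
    comparator = globals().get(
        "compare_{typ}_{dt}".format(typ=typ, dt=dt), compare_default)
    comparator(result, expected)

dispatch_compare(1.0000000001, 1.0, "series", "float")  # specific comparator
dispatch_compare("x", "x", "frame", "cat")              # falls back to default
```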


files = glob.glob(os.path.join(os.path.dirname(__file__), "data",