BUG: pickle incompatibility with pandas-2.0 from pandas-1.5 with numeric index #53300

Closed
3 tasks done
mwtoews opened this issue May 19, 2023 · 11 comments
Labels
Bug; IO Pickle (read_pickle, to_pickle); Needs Triage (Issue that has not been reviewed by a pandas team member)

Comments

@mwtoews
Contributor

mwtoews commented May 19, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Create a simple series with a numeric index with pandas-1.5.3:

import pandas as pd
import pickle

s = pd.Series([3.3, 4.4], index=[2, 4])
with open('s-1.5.3.pkl', 'wb') as f:
    pickle.dump(s, f)

Or get it here: s-1.5.3.pkl.gz

then try to unpickle it with pandas-2.0:

import pickle

with open('s-1.5.3.pkl', 'rb') as f:
    s2 = pickle.load(f)

raises:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

Issue Description

Some pickle objects created with pandas<2 cannot be loaded with pandas-2.0, since there is no longer a pandas.core.indexes.numeric module (removed with #51139).
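
The failure follows from how pickle stores classes: the stream records the module path and class name to import, not the class definition. The following is a minimal stdlib-only sketch of that behaviour. The module name `legacy_numeric` and its `Int64Index` class are hypothetical stand-ins, not pandas itself.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that only exists in the "old" release.
legacy = types.ModuleType("legacy_numeric")
Int64Index = type("Int64Index", (list,), {"__module__": "legacy_numeric"})
legacy.Int64Index = Int64Index
sys.modules["legacy_numeric"] = legacy

payload = pickle.dumps(Int64Index([2, 4]))

# The stream records where to import the class from, not the class body.
print(b"legacy_numeric" in payload, b"Int64Index" in payload)

# Simulate the upgrade: the module is gone.
del sys.modules["legacy_numeric"]

error_message = None
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    error_message = str(exc)
print(error_message)  # No module named 'legacy_numeric'
```

Any loader that resolves the recorded path with a plain import hits the same error once the module has been removed.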

See also Stack Overflow Q/A for this issue (not mine).

Expected Behavior

There is a pandas.compat.pickle_compat module; could it be used to read the pickle in this scenario? Curiously enough, it references pandas.core.indexes.numeric, which was removed for 2.0.

Installed Versions

>>> pd.show_versions()
/tmp/py310/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit           : 37ea63d540fd27274cad6585082c91b1283f963d
python           : 3.10.11.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-148-generic
Version          : #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_AU.UTF-8
LOCALE           : en_AU.UTF-8

pandas           : 2.0.1
numpy            : 1.24.3
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 65.5.0
pip              : 23.1.2
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 8.13.2
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None
@mwtoews mwtoews added the Bug and Needs Triage labels May 19, 2023
@mwtoews
Contributor Author

mwtoews commented May 19, 2023

Possibly a hack, but a workaround is to re-create pandas/core/indexes/numeric.py with the following content:

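# Shim for pandas/core/indexes/numeric.py: restores the class names that
# pickles written with pandas<2 refer to.
from pandas.core.indexes.base import Index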

class NumericIndex(Index):
    pass

class IntegerIndex(NumericIndex):
    pass

class Int64Index(IntegerIndex):
    pass

class UInt64Index(IntegerIndex):
    pass

class Float64Index(NumericIndex):
    pass

This resolves the issue, and print(s2.index) then shows Index([2, 4], dtype='int64')
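
A similar shim can be registered at runtime instead of adding a file to site-packages, because the import system consults `sys.modules` before searching the disk. This is a sketch using stand-ins only. `install_shim`, `legacy_numeric` and the two classes are hypothetical names chosen for illustration.

```python
import pickle
import sys
import types


def install_shim(module_name, classes):
    """Register an in-memory module exposing `classes` under `module_name`.

    `classes` maps attribute names to class objects (hypothetical helper).
    """
    shim = types.ModuleType(module_name)
    for attr, cls in classes.items():
        setattr(shim, attr, cls)
    sys.modules[module_name] = shim
    return shim


# --- "old" environment: the class lives in legacy_numeric ---
Int64Index = type("Int64Index", (list,), {"__module__": "legacy_numeric"})
install_shim("legacy_numeric", {"Int64Index": Int64Index})
payload = pickle.dumps(Int64Index([2, 4]))
del sys.modules["legacy_numeric"]  # simulate the module being removed


# --- "new" environment: only a general-purpose class remains ---
class Index(list):
    pass


install_shim("legacy_numeric", {"Int64Index": Index})
restored = pickle.loads(payload)
print(type(restored).__name__, list(restored))  # Index [2, 4]
```

For the case in this issue, the call would presumably be `install_shim("pandas.core.indexes.numeric", {...})` after `import pandas`, with each legacy name mapped to a subclass of `Index` as in the file above. That pandas-specific call is an untested assumption, and the shim must be installed before anything is unpickled.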

@topper-123
Contributor

We support pickling through the to_pickle method. Do you still get the error when doing it like below instead:

s = pd.Series([3.3, 4.4], index=[2, 4])
with open('s-1.5.3.pkl', 'wb') as f:
    s.to_pickle(f)

?

@topper-123 topper-123 added the IO Pickle (read_pickle, to_pickle) label May 19, 2023
@bashtage
Contributor

Is pickle even supposed to support cross-major.minor compatibility?

@mwtoews
Contributor Author

mwtoews commented May 19, 2023

We support pickling through the to_pickle method. Do you still get the error when doing it like below instead:

The binary structure written by to_pickle in a previous pandas version looks the same (it still references the private API pandas.core.indexes.numeric), so it has the same issue as the file written with the pickle module.

But this reminds me that pandas.read_pickle exists, and it works effortlessly with the latest pandas to read older pandas pickles (however they were written):

print(pd.__version__)  # 2.0.1
print(pd.read_pickle("/tmp/s-1.5.3.pkl"))
# 2    3.3
# 4    4.4
# dtype: float64

the doc also neatly answers the compatibility question:

read_pickle is only guaranteed to be backwards compatible to pandas 0.20.3 provided the object was serialized with to_pickle.

So yes, I'd expect pandas.read_pickle to read some older pickled objects, e.g. between versions 2.x and 1.x, but not earlier than 0.20.3.

@bashtage
Contributor

So yes, I'd expect pandas.read_pickle to read some older pickled objects, e.g. between versions 2.x and 1.x, but not earlier than 0.20.3.

From the code above, it appears to work. If so, can you close this?

@mwtoews mwtoews closed this as completed May 19, 2023
@mwtoews
Contributor Author

mwtoews commented May 22, 2023

Re-opening, because the issue actually manifested from a custom class with DataFrame attributes, which cannot use read_pickle/to_pickle methods.

Here is a revised example. First, define a custom class in myclass.py:

class MyClass:
    def __init__(self, thing, df):
        self.thing = thing
        self.df = df
    def __iter__(self):
        yield "thing", self.thing
        yield "df", self.df

then using pandas-1.5.3:

import pandas as pd
import pickle
from myclass import MyClass

mc = MyClass(1.23, pd.DataFrame({"s": [3.3, 4.4]}, index=[2, 4]))

with open("mc-1.5.3.pkl", "wb") as f:
    pickle.dump(mc, f)

with open("mc-1.5.3.pkl", "rb") as f:
    mc2 = pickle.load(f)

Using pickle.load(f) works fine with pandas<2, but raises the same ModuleNotFoundError as before with pandas>=2.

The only solution that I see is to use the private API, here with pandas-2.0.1:

import pandas as pd
from myclass import MyClass

with open("mc-1.5.3.pkl", "rb") as f:
    mc2 = pd.compat.pickle_compat.load(f)
dict(mc2)
# {'thing': 1.23,
#  'df':      s
#  2  3.3
#  4  4.4}

Is there any other way to unpickle a custom class using the public API?
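
Independent of pandas, the standard library documents one public hook for this situation: overriding `pickle.Unpickler.find_class` to redirect classes whose import location has changed. The following is a minimal sketch of that technique. The `RELOCATED` table, the `legacy_numeric` module and the helper names are hypothetical. For the pandas case the key would be `("pandas.core.indexes.numeric", "Int64Index")`, and the right target class is an assumption that would need checking.

```python
import io
import pickle
import sys
import types

# Hypothetical relocation table: (old module, old name) -> (new module, new name).
RELOCATED = {("legacy_numeric", "Int64Index"): ("builtins", "list")}


class RelocatingUnpickler(pickle.Unpickler):
    """Unpickler that redirects classes whose import location has changed."""

    def find_class(self, module, name):
        module, name = RELOCATED.get((module, name), (module, name))
        return super().find_class(module, name)


def load_relocated(fh):
    """Load one pickled object from a binary file handle."""
    return RelocatingUnpickler(fh).load()


# --- demo with stand-ins (no pandas involved) ---
legacy = types.ModuleType("legacy_numeric")
Int64Index = type("Int64Index", (list,), {"__module__": "legacy_numeric"})
legacy.Int64Index = Int64Index
sys.modules["legacy_numeric"] = legacy
payload = pickle.dumps({"thing": 1.23, "index": Int64Index([2, 4])})
del sys.modules["legacy_numeric"]  # simulate the module being removed

restored = load_relocated(io.BytesIO(payload))
print(restored)  # {'thing': 1.23, 'index': [2, 4]}
```

Because the redirection happens per class lookup, it also applies to objects nested inside a custom container, which is the situation described in this comment.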

@mwtoews mwtoews reopened this May 22, 2023
@mwtoews
Contributor Author

mwtoews commented May 22, 2023

whoops, my bad! It seems the public API works fine with this one too:

import pandas as pd
from myclass import MyClass

mc2 = pd.read_pickle("mc-1.5.3.pkl")
dict(mc2)
# {'thing': 1.23,
#  'df':      s
#  2  3.3
#  4  4.4}

the pandas.read_pickle docstring has a useful summary "Load pickled pandas object (or any object) from file."

@mwtoews mwtoews closed this as completed May 22, 2023
@topper-123
Contributor

Ok, great.

We only support reading/writing pickles through the read_pickle/to_pickle interface. We need an interface, because pickles are fickle...

@MichalRIcar

MichalRIcar commented Jul 28, 2023

Hi,

Since this is a fairly new topic, and potentially crucial for anyone with a significant amount of data stored as pickled pandas objects, and to help Google surface a match, I would like to highlight that the only solution that worked when migrating my data library from pandas < 2 to pandas >= 2 was

pd.compat.pickle_compat.load(X)

Other discussions and Stack Overflow answers offer only "pd.read_pickle(X)", which didn't work in my case.

Best,
Michal

@behrenhoff
Contributor

I have written a substantial number of files using joblib.dump({"dictionary_with_dfs": df}, "file.pkl"). Any idea how to deal with those? pd.read_pickle does not work, and joblib.load(...) runs into the pandas.core.indexes.numeric issue.

Is this for pandas or joblib to solve? Or do I have to walk my directories, load all files with joblib and dump them back with pickle? I was using joblib mainly for convenience reasons.
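
The walk-and-re-dump option mentioned here could look roughly like the sketch below. `migrate_tree`, `load_pickle` and `dump_pickle` are hypothetical helpers, and the defaults use the stdlib `pickle` module so the example is self-contained. In practice one would presumably pass `joblib.load` as `load`, and it would have to run in an environment that can still read the files (for example pandas<2).

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Default reader: plain stdlib pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


def dump_pickle(obj, path):
    """Default writer: plain stdlib pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def migrate_tree(root, load=load_pickle, dump=dump_pickle, pattern="*.pkl"):
    """Re-save every file matching `pattern` under `root`.

    Each object is written to a temporary sibling file first and then moved
    over the original, so a failed dump does not destroy the source file.
    Returns the list of paths that were rewritten.
    """
    migrated = []
    for path in sorted(Path(root).rglob(pattern)):
        obj = load(path)
        tmp = path.with_name(path.name + ".tmp")
        dump(obj, tmp)
        tmp.replace(path)
        migrated.append(path)
    return migrated
```

Re-dumping only changes the container format. Whether the result loads under pandas>=2 still depends on the reader used there, as discussed earlier in this thread.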

@brobr

brobr commented Sep 28, 2023

Hi, the same problem occurs with pickle files that were created with Python's shelve module. It was used to store fairly complex data structures (sets of dictionaries with named dataframes).

These files are opened by: with shelve.open(str(file_to_load)) as db:
After upgrading from pandas-1.5 to 2.1 the error is raised:

 File "/usr/lib64/python3.9/shelve.py", line 114, in __getitem__
      value = Unpickler(f).load()
 ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

Replacing that line with the proposed with pd.read_pickle(str(file_to_load)) as db: gives

 File "/usr/lib64/python3.9/site-packages/pandas/io/pickle.py", line 206, in read_pickle
 return pickle.load(handles.handle)
 _pickle.UnpicklingError: invalid load key, '\xcf'.

The alternative with pd.compat.pickle_compat.load(str(file_to_load)) as db: gives

 File "/usr/lib64/python3.9/site-packages/pandas/compat/pickle_compat.py", line 217, in load
 fh.seek(0)
 AttributeError: 'str' object has no attribute 'seek'

The hack above (creating a file numeric.py and placing a symlink to it in /usr/lib64/python3.9/site-packages/pandas/core/indexes/) restored the old behaviour.
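
The two failed attempts above appear to have different causes. A shelve file is a dbm database whose values are individual pickles, so it cannot be parsed as a single pickle stream. The second traceback shows that pickle_compat.load calls fh.seek(0), so it expects an open file handle rather than a path string. A possible alternative to the symlink is sketched below: a shelf subclass that decodes each stored value with a caller-supplied loader. `CompatShelf` and its `loader` parameter are hypothetical. Under pandas>=2 the loader would presumably be `pd.compat.pickle_compat.load`, the private API used earlier in this thread, and that combination is untested here.

```python
import io
import pickle
import shelve


class CompatShelf(shelve.DbfilenameShelf):
    """Shelf whose values are decoded by a caller-supplied loader.

    `loader` receives a seekable binary file handle holding one pickled
    value and must return the decoded object (hypothetical interface).
    """

    def __init__(self, filename, loader=pickle.load, flag="r"):
        super().__init__(filename, flag=flag)
        self._loader = loader

    def __getitem__(self, key):
        # Same lookup as the stdlib Shelf, minus the writeback cache.
        raw = self.dict[key.encode(self.keyencoding)]
        return self._loader(io.BytesIO(raw))
```

Usage would follow the original pattern, for example `with CompatShelf(str(file_to_load), loader=some_loader) as db:`, where `some_loader` stands for whichever compatible loader is available.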

penguinpee added a commit to penguinpee/AstroVascPy that referenced this issue May 24, 2024
Going by pandas-dev/pandas#53300 this should
be backwards compatible with older versions of pandas down to 0.20.3.
tristan0x pushed a commit to BlueBrain/AstroVascPy that referenced this issue May 24, 2024
Going by pandas-dev/pandas#53300 this should
be backwards compatible with older versions of pandas down to 0.20.3.
6 participants