Cannot unpickle data frame made with 0.19.2 after upgrade to 0.20.1 #16474

mhooreman · 2017-05-24T07:58:24Z

Hello,

Problem description

When we create a data frame with pandas ≤ 0.19.2 and pickle it (using pickle.dump), it is not possible to unpickle it using pandas 0.20.1.

# Using pandas 0.19.2
import pandas as pd
import pickle as pkl
data = pd.DataFrame({'x': [1, 2]})
with open("data_pd_0.19.2.pkl", "wb") as fh:  # close (and flush) the file after dumping
    pkl.dump(data, fh)

# After upgrade to pandas 0.20.1
import pandas as pd
import pickle as pkl
pkl.load(open("data_pd_0.19.2.pkl", "rb"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas.indexes'

First analysis

It seems that pandas.indexes has been refactored to pandas.core.indexes.
I don't know whether there are other incompatible changes of this kind.

Proposal

It would be great to have:

A deprecation warning when unpickling an old data frame
Support for loading old data frames, with automatic conversion to the new format, so that we can upgrade by re-pickling the unpickled data frames

Thanks a lot for your help,
Best regards.


jreback · 2017-05-24T09:58:56Z

The big red box in the docs makes it clear that pd.read_pickle is the pickle reader and that it makes things backward compatible. Further, the whatsnew notes have quite a large section on what changed here.

Sure, a direct call to pickle.loads will work, but this is not guaranteed across versions.
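A minimal sketch of the writer/reader pairing recommended above: write with DataFrame.to_pickle and read back with pd.read_pickle instead of calling pickle.load directly. The temporary path is a placeholder, and this round-trip only exercises the currently installed pandas; on its own it does not show that a file written by an older release will load.

```python
import os
import tempfile

import pandas as pd

# Placeholder location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "frame.pkl")

df = pd.DataFrame({"x": [1, 2]})
df.to_pickle(path)               # pandas' pickle writer
restored = pd.read_pickle(path)  # pandas' pickle reader, with its compatibility handling

print(restored.equals(df))  # True
```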

matjazk · 2017-06-01T17:38:12Z

Going from pandas 0.18.1 to 0.20.1, I encountered the same problem when loading with joblib. joblib.load fails with exactly the same error:
ImportError: No module named 'pandas.indexes'

After fixing this (see the first workaround below), there is another error:
AttributeError: module 'pandas.core.base' has no attribute 'FrozenNDArray'

After workaround 2, the files load. In my case, this seems to be more of a question for the joblib devs.

Two (ugly) workarounds:

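import sys

# Workaround 1: make the old import path 'pandas.indexes' resolve to pandas.core.indexes
import pandas.core.indexes
sys.modules['pandas.indexes'] = pandas.core.indexes

# Workaround 2: re-expose FrozenNDArray at its old location, pandas.core.base
import pandas.core.base
import pandas.core.indexes.frozen
setattr(sys.modules['pandas.core.base'], 'FrozenNDArray',
        pandas.core.indexes.frozen.FrozenNDArray)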
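An alternative to patching sys.modules is to remap old locations inside a pickle.Unpickler subclass by overriding find_class, which the pickle module documents as the hook for controlling how globals are resolved. This is a minimal sketch: the two default mappings contain only the moves mentioned in this thread (pandas.indexes to pandas.core.indexes, and FrozenNDArray from pandas.core.base to pandas.core.indexes.frozen), target the 0.20.x-era layout, and are assumptions rather than a complete list. RenamingUnpickler and load_renamed are hypothetical names. It only helps code that calls pickle directly; joblib.load goes through joblib's own unpickler, so this does not change what joblib.load does. pd.read_pickle has its own compatibility handling; this is a standalone sketch, not that code.

```python
import pickle

# Assumed renames, taken only from the two moves discussed in this thread.
MODULE_RENAMES = {"pandas.indexes": "pandas.core.indexes"}
CLASS_RENAMES = {
    ("pandas.core.base", "FrozenNDArray"): ("pandas.core.indexes.frozen", "FrozenNDArray"),
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites old (module, name) locations before lookup."""

    def __init__(self, file, module_renames=None, class_renames=None, **kwargs):
        super().__init__(file, **kwargs)
        self.module_renames = MODULE_RENAMES if module_renames is None else module_renames
        self.class_renames = CLASS_RENAMES if class_renames is None else class_renames

    def find_class(self, module, name):
        # Exact (module, name) moves take priority.
        if (module, name) in self.class_renames:
            module, name = self.class_renames[(module, name)]
        else:
            # Prefix moves cover the package and its submodules,
            # e.g. "pandas.indexes.base" -> "pandas.core.indexes.base".
            for old, new in self.module_renames.items():
                if module == old or module.startswith(old + "."):
                    module = new + module[len(old):]
                    break
        return super().find_class(module, name)


def load_renamed(path):
    """Load a pickle file, applying the default renames above."""
    with open(path, "rb") as fh:
        return RenamingUnpickler(fh).load()
```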

jreback · 2017-06-01T17:48:05Z

See the above, and simply use pd.read_pickle.

matjazk · 2017-06-01T19:41:42Z

I would if I could. But I have a complex object (consisting of NumPy objects, pandas Series and DataFrames, dictionaries, ...) stored in a compressed joblib archive, so pd.read_pickle is of no use to me. As I said, this might be useful for the joblib developers, as for now it is impossible to load any joblib archive created with pandas < 0.20. I first had to downgrade pandas, and now I'm using the above workarounds.

jorisvandenbossche · 2017-06-01T20:32:02Z

@matjazk Would you like to open an issue at joblib for this?

matjazk · 2017-06-01T20:46:43Z

Already did, and passed on @jreback's suggestion.

mhooreman · 2017-06-08T14:25:21Z

Thanks. pd.read_pickle works, but, for your information, it is extremely slow (see the benchmark).
I've written a script that calls pd.read_pickle and then pd.to_pickle on every file.
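A conversion script along those lines could look roughly like the sketch below. The helper name repickle_directory, the `*.pkl` glob pattern and the assumption of uncompressed pickles are all hypothetical; it writes each new pickle to a temporary sibling file and then swaps it in, and keeping a backup of the originals is still a good idea.

```python
import os
from pathlib import Path

import pandas as pd


def repickle_directory(directory, pattern="*.pkl"):
    """Hypothetical helper: re-save matching pickles with the installed pandas.

    Each file is read with pd.read_pickle and written back with pd.to_pickle.
    The new pickle goes to a temporary sibling file first and then replaces
    the original, so a failed write does not leave a half-written file.
    Assumes uncompressed pickles (compression=None on write).
    """
    converted = []
    # sorted() materializes the file list before any temporary files appear.
    for path in sorted(Path(directory).glob(pattern)):
        obj = pd.read_pickle(path)
        tmp = path.with_name(path.name + ".tmp")
        pd.to_pickle(obj, tmp, compression=None)
        os.replace(tmp, path)
        converted.append(path)
    return converted
```

Usage would be something like `repickle_directory("old_pickles")`, where the directory name is a placeholder. After conversion, later loads read the current format and no longer need the compatibility fallback.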

jorisvandenbossche · 2017-06-08T15:50:17Z

@mhooreman the timings of "reading old" look suspiciously consistent with "writing". Are you sure you timed the correct thing?

mhooreman · 2017-06-08T17:30:21Z

I need to double-check, but I'm sure about the performance difference when reading: I ran into performance issues and converted the files to fix them.


jreback · 2017-06-08T19:06:22Z

@mhooreman Of course it's slower: it's falling back to the Python-based unpickler, which is much more flexible. So you can have either speed or correctness; you get to choose.

TheodoreZhao · 2017-07-02T05:36:27Z

I had the same problem when unpickling data with pandas 0.20.2. I used df.to_pickle() to pickle my DataFrame with pandas 0.19.2, but failed to unpickle it using pandas.read_pickle() with pandas 0.20.2. I got the error message:

ImportError: No module named 'pandas.indexes'

pandas.read_pickle() and pickle.load() both generate this error message.

jorisvandenbossche · 2017-08-22T08:42:11Z

@TheodoreZhao If you have this error with read_pickle as well, please open a new issue with a reproducible example.

seandickert · 2018-02-21T06:22:59Z

@jreback Similar to @matjazk's case, pd.read_pickle doesn't help if you're using pickle.loads to load a string (retrieved from some store other than the filesystem). Can pd.read_pickle be updated to handle a file-like object rather than just a path?

jreback · 2018-02-21T11:10:11Z

It's an open issue: #5924

If you want to submit a PR to do this, it's not difficult.
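For the in-memory bytes case raised above: as far as I know, later pandas releases added file-like object support to pd.read_pickle (its first parameter is now named filepath_or_buffer), so bytes fetched from a non-filesystem store can be wrapped in io.BytesIO. This sketch assumes such a pandas version; it will not work on 0.20.x. The payload here is a stand-in created locally.

```python
import io
import pickle

import pandas as pd

# Stand-in for bytes retrieved from some non-filesystem store.
payload = pickle.dumps(pd.DataFrame({"x": [1, 2]}))

# Wrap the bytes in a binary buffer and hand it to pandas' reader.
df = pd.read_pickle(io.BytesIO(payload))
print(df["x"].tolist())  # [1, 2]
```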

jreback closed this as completed May 24, 2017

jreback added the IO Data (IO issues that don't fit into a more specific label) and Usage Question labels May 24, 2017

jreback added this to the No action milestone May 24, 2017

jorisvandenbossche mentioned this issue Aug 22, 2017

Regression : Pandas 0.20.1 vs 0.19.1 #17306 (closed)

vsomnath mentioned this issue Feb 19, 2019

Tox21 tutorial not working with standard conda installation deepchem/deepchem#1510 (closed)

zzuziak mentioned this issue Jul 27, 2021

replace method for opening pickle files lewagon/nbresult#3 (closed)
