read_pickle error for multi-index: 'FrozenList' does not support mutable operations. #4788

Closed
wcbeard opened this issue Sep 9, 2013 · 24 comments · Fixed by #4791
Labels: Bug, Internals (related to non-user accessible pandas implementation)

Comments

@wcbeard
Contributor

wcbeard commented Sep 9, 2013

I'm in a bit of a pickle here. If I try to save and read back a multi-indexed dataframe, I get this error (and in some situations that I can't reliably reproduce, I get a TypeError: Required argument 'shape' (pos 1) not found error instead).

The gist with the full traceback is here.

In [3]: import numpy as np

In [4]: import pandas as pd

In [5]: np.random.seed(10)

In [6]: a = np.random.randint(0, 20, (6, 5))

In [7]: df = pd.DataFrame(a).set_index([2,3,4])

In [8]: df
Out[8]:
           0   1
2  3  4
15 0  17   9   4
8  9  0   16  17
4  19 16  10   8
11 11 1    4  15
14 17 19   8   4
13 19 13  13   5

In [9]: df.to_pickle('~/Desktop/dummy.df')

In [10]: df2 = pd.read_pickle('~/Desktop/dummy.df')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: 'FrozenList' does not support mutable operations.

When I reset_index it saves fine. I'm on Mac OSX, numpy 1.7, pandas 0.12.0-361-g53eec08.

In the meantime, if there's no quick fix, anyone know another way to save multi-indexes to file? csv doesn't look like it can preserve them. I could do something like appending _index to the columns and un-rename them after reading but would prefer something less hacky.

@jreback
Contributor

jreback commented Sep 9, 2013

try with current master; this fixed the pickle issue: 725b195

@jreback
Contributor

jreback commented Sep 9, 2013

you can save a mi for both index/columns via csv since 0.12, see here: http://pandas.pydata.org/pandas-docs/dev/io.html#reading-columns-with-a-multiindex

here for just a mi on the index (this has been supported for a while): http://pandas.pydata.org/pandas-docs/dev/io.html#reading-an-index-with-a-multiindex
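
As a rough illustration of the CSV route for a MultiIndex on the index, here is a minimal sketch. The frame, its string column names, and the in-memory buffer are assumptions made for the example (string names avoid the integer-to-string label change that CSV headers would otherwise introduce); the index columns have to be named explicitly when reading back.

```python
import io

import pandas as pd

# Small illustrative frame; the values and column names are made up.
df = pd.DataFrame(
    {
        "a": [9, 16, 10],
        "b": [4, 17, 8],
        "c": [15, 8, 4],
        "d": [0, 9, 19],
        "e": [17, 0, 16],
    }
).set_index(["c", "d", "e"])

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# CSV does not record which columns were index levels,
# so they must be passed back in via index_col.
restored = pd.read_csv(buf, index_col=["c", "d", "e"])
```

Without the index_col argument, the three levels would come back as ordinary columns.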

@wcbeard
Contributor Author

wcbeard commented Sep 9, 2013

pd.version.version tells me I'm on 0.12.0-361-g53eec08. Is that not the most recent commit? Is it possible it's really a different version installed? (I git pull'd today and did pip uninstall pandas + pip install . in the local repo).

@jtratner
Contributor

jtratner commented Sep 9, 2013

That said, I would not be surprised if the new MultiIndex setup needed to define some of the pickle magic methods, since we changed up its internal representation. Are there tests for roundtripping MI through pickle?

@jreback
Contributor

jreback commented Sep 9, 2013

@jtratner should be from 0.10 on....I saved a pickled version from each (and 0.13)....

@d10genes do you see that commit I referenced (just git log and search for pickle); i think it was 3-4 days ago

@d10genes that error is a fall thru; I think there is another error

can you debug thru pdb and step thru read_pickle?

basically it tries the original pickle, then the fallback, then a version with an encoding, then fallback with encoding

are you using python 2.7 (or a 3.x)?
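
The try-then-fall-back pattern described above can be sketched roughly as follows. This is not pandas' actual read_pickle code: load_with_fallbacks and the two loaders are hypothetical names, and the sketch targets Python 3, where pickle.load accepts an encoding argument. It also shows why the last error in such a chain can hide the first, more informative one, so the sketch collects all of them.

```python
import io
import pickle


def load_with_fallbacks(data, loaders):
    """Try each (name, loader) pair in order; return the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return loader(io.BytesIO(data))
        except Exception as exc:  # broad on purpose: any failure moves on
            errors.append((name, exc))
    # Report every failure, not only the one from the final fallback.
    summary = "; ".join("%s: %r" % (name, exc) for name, exc in errors)
    raise ValueError("all loaders failed: " + summary)


# Hypothetical chain: plain load first, then a load with an explicit encoding.
loaders = [
    ("plain", pickle.load),
    ("latin1", lambda fh: pickle.load(fh, encoding="latin1")),
]

payload = pickle.dumps({"a": [1, 2, 3]})
result = load_with_fallbacks(payload, loaders)
```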

@jreback
Contributor

jreback commented Sep 9, 2013

@d10genes can you post a sample code?

@wcbeard
Contributor Author

wcbeard commented Sep 9, 2013

@jreback This shows up in git log

commit 725b1951249a795fe01896dff4ce46bd9206021f
Merge: 2267fe4 0436809
Author: jreback <[email protected]>
Date:   Fri Sep 6 16:06:27 2013 -0700

    Merge pull request #4755 from jreback/pickle_compat

    BUG: TimeSeries compat from < 0.13

@wcbeard
Contributor Author

wcbeard commented Sep 9, 2013

As far as sample code, how do you want it different from the OP?

@jreback
Contributor

jreback commented Sep 9, 2013

@d10genes sorry...you put it up already....hold on

@jreback
Contributor

jreback commented Sep 9, 2013

I have a test for a mi and a frame with several kinds of index...but of course not a frame with a mi....let me fix.

thanks for the report! (some of this pickle code was pretty tricky and was trying to cover all the bases!)

@wcbeard
Contributor Author

wcbeard commented Sep 9, 2013

Ok, thanks. And out of curiosity, would the underlying code for pickling treat Series and DataFrames differently? Going off of my above code, it looks like series MI pickling sets off that other error:

In [13]: s = df[0]

In [14]: s
Out[14]:
2   3   4
15  0   17     9
8   9   0     16
4   19  16    10
11  11  1      4
14  17  19     8
13  19  13    13
Name: 0, dtype: int64
In [16]: s.to_pickle('~/Desktop/s.df')
In [17]: s2 = pd.read_pickle('~/Desktop/s.df')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: Required argument 'shape' (pos 1) not found

@jreback
Contributor

jreback commented Sep 9, 2013

no...the problem is assigning the index to the pandas object, same code for all (but the deserialization of the MI is not 'defined' correctly)

@jtratner
Contributor

jtratner commented Sep 9, 2013

@jreback if you want me to look at it, I can... I have a hunch. (but you usually figure these things out quickly anyways :))

@jreback
Contributor

jreback commented Sep 9, 2013

i have the test case set and will look in a few

@jreback
Contributor

jreback commented Sep 9, 2013

@jtratner

ok..this branch, last commit shows the test case: https://github.com/jreback/pandas/tree/pickle_fix

a bunch of things are commented out...but this will fail on a disabled method in FrozenList

pickle is trying to extend the list....

a possible solution is to have a context manager for certain classes (e.g. FrozenList), or just a try:except:finally, which re-enables all/certain methods and then re-disables them.....
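
A minimal sketch of that context-manager idea, under stated assumptions: the FrozenList below is an illustrative stand-in rather than pandas' class, and the _frozen switch and the thawed helper are hypothetical names. The class-level flag is also not thread-safe, which is one reason to prefer a simpler fix if one exists.

```python
import pickle
from contextlib import contextmanager


class FrozenList(list):
    """Illustrative stand-in, not pandas' actual FrozenList."""

    _frozen = True  # hypothetical class-level switch

    def _guard(self):
        if type(self)._frozen:
            raise TypeError(
                "%r does not support mutable operations." % type(self).__name__
            )

    def append(self, item):
        self._guard()
        list.append(self, item)

    def extend(self, items):
        self._guard()
        list.extend(self, items)


@contextmanager
def thawed(cls):
    """Temporarily re-enable mutation on cls, then disable it again."""
    cls._frozen = False
    try:
        yield
    finally:
        cls._frozen = True


data = pickle.dumps(FrozenList([1, 2, 3]))

# Unpickling a list subclass refills it through extend/append,
# so mutation has to be allowed while loading.
with thawed(FrozenList):
    restored = pickle.loads(data)
```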

@jtratner
Contributor

jtratner commented Sep 9, 2013

@jreback yeah, I was noticing that... would it work to just define __reduce__() like this?

def __reduce__(self):
    return self.__class__, (list(self),)
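
To see why this helps, here is a self-contained sketch on Python 3 using a stand-in class (not pandas' actual FrozenList; the class names below are hypothetical). By default, pickle rebuilds a list subclass by creating an empty instance and then refilling it through extend or append, which the frozen class forbids. With __reduce__, the object is instead rebuilt by calling the class with a plain list, so no mutating method is involved.

```python
import pickle


class FrozenList(list):
    """Illustrative stand-in for an immutable list."""

    def _disabled(self, *args, **kwargs):
        raise TypeError(
            "%r does not support mutable operations." % type(self).__name__
        )

    append = extend = insert = pop = remove = _disabled
    __setitem__ = __delitem__ = _disabled


class ReducibleFrozenList(FrozenList):
    """Same class plus the __reduce__ proposed above."""

    def __reduce__(self):
        # Rebuild via the constructor: cls(list_of_items).
        return self.__class__, (list(self),)


# Default pickling: loading fails when pickle tries to refill the list.
try:
    pickle.loads(pickle.dumps(FrozenList([1, 2, 3])))
    default_roundtrip_failed = False
except TypeError:
    default_roundtrip_failed = True

# With __reduce__: the round trip goes through the constructor instead.
restored = pickle.loads(pickle.dumps(ReducibleFrozenList([1, 2, 3])))
```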

@jtratner
Contributor

jtratner commented Sep 9, 2013

btw - how come pickle.loads(pickle.dumps(df)) works, but not round tripping with read_pickle and to_pickle?

@jtratner
Contributor

jtratner commented Sep 9, 2013

@jreback yep, that resolves it - much simpler.

@jtratner
Contributor

jtratner commented Sep 9, 2013

only issue is whether we have to support legacy pickles from the time between the FrozenList addition and now.

@wcbeard
Contributor Author

wcbeard commented Sep 10, 2013

@jreback I can step through tomorrow morning if it'd still be helpful. And I'm using 2.7

@jreback
Contributor

jreback commented Sep 10, 2013

@d10genes thanks
I believe the PR #4791 fixes this
certain properties of the multi index were changed for 0.13 to make it immutable (it was supposed to be before, but it could be changed), and the resulting object didn't pickle properly

that said, glad you caught this; I was able to put some additional tests in place to ensure compat (which actually is a big deal as 0.13 changes a lot internally), including some hoops that needed jumping for Series (which is now not a subclass of ndarray)

in any event will be merging soon (prob tomorrow) so keep a look out, @jtratner is fixing a couple of more things

also pls try out the csv features I mentioned above if u can

@jreback
Contributor

jreback commented Sep 10, 2013

@d10genes all merged in, master should work for you now...thanks!

@wcbeard
Contributor Author

wcbeard commented Sep 10, 2013

Awesome, that fixed it. Thanks a lot, I really appreciate it.

And regarding the csv features, I did try them out and they seemed to work, but I don't think it's possible with CSV to automatically encode which columns are indices (looks like pandas needs you to use the column names as read_csv parameters).

Pickling seems to take care of it all though. Thanks!

@jreback
Contributor

jreback commented Sep 10, 2013

@d10genes great....yep...have to specify column names, unfortunately csv is not a roundtripable format (w/o some user parameters).
you might also try HDFStore, supports multi-indexes and is roundtripable as the meta data is stored
