BUG: compat_pickle should not modify global namespace #5661

jreback · 2013-12-07T19:17:52Z

It turns out pandas was modifying the Python pickle module just by being imported.

When subclassing, a mutable attribute has to be copied before it is modified.

http://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed
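A minimal sketch of the pitfall described above, using made-up classes rather than the actual pandas or pickle code. `Base`, `Leaky`, `Safe` and the contents of `dispatch` are hypothetical; they only mirror the idea of a class-level dispatch table that a subclass wants to change.

```python
import copy


class Base(object):
    # Class-level mutable attribute, shared by any subclass that does not rebind it.
    dispatch = {"reduce": "original"}


class Leaky(Base):
    # Inherits the very same dict object as Base.
    pass


class Safe(Base):
    # Rebinds the attribute to a shallow copy, so edits stay local to Safe.
    dispatch = copy.copy(Base.dispatch)


# Editing the copy leaves the parent untouched.
Safe.dispatch["reduce"] = "patched"
safe_leaves_base_alone = Base.dispatch["reduce"] == "original"

# Editing through the non-copying subclass changes the parent as well.
Leaky.dispatch["reduce"] = "patched"
leaky_changes_base = Base.dispatch["reduce"] == "patched"
```

Both flags end up true: only the subclass that copied the dict first avoids changing the parent class for everyone else.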

acorbe · 2013-12-08T01:13:02Z

Hi,

I updated my question on SO (http://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed). I applied the patch, although things are not working yet.

Thanks for your support.

jreback · 2013-12-08T01:18:30Z

Please provide a link to your file; 100 MB is no big deal. Did you pickle this in 0.12? Give me an example of what you are doing.

acorbe · 2013-12-08T10:33:55Z

Hi
pandas version (from canopy package manager)

Size: 7.32 MB
Version: 0.12.0
Build: 2
Dependencies:
 numpy 1.7.1
 python_dateutil
 pytz 2011n

  md5: 7dd4385bed058e6ac15b0841b312ae35

this is a link to the file https://www.dropbox.com/s/6g4ej7ru5244e35/pickle_L1cor_fd.pic

I am not quite sure this file alone will do the job.

What is pickled is a list of dicts. One or more dict entries are DataFrames; then there is more.

I can try to extract some minimal working thing from my code in case it is necessary.

As I wrote on SO, after changing the code I have this issue

In [4]: pickle.load(open('pickle_L1cor_s1.pic','rb'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-88719f8f9506> in <module>()
----> 1 pickle.load(open('pickle_L1cor_s1.pic','rb'))

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load(file)
   1376
   1377 def load(file):
-> 1378     return Unpickler(file).load()
   1379
   1380 def loads(str):

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load_reduce(self)
   1131         args = stack.pop()
   1132         func = stack[-1]
-> 1133         value = func(*args)
   1134         stack[-1] = value
   1135     dispatch[REDUCE] = load_reduce

TypeError: _reconstruct: First argument must be a sub-type of ndarray

Let me know.
Thanks

jreback · 2013-12-08T15:26:38Z

I updated the SO question. You can just call pd.read_pickle(file_name) and it will work.

ruidc · 2014-01-13T16:00:06Z

couldn't/shouldn't the unpickler handle both versions internally to keep external compatibility?

jreback · 2014-01-13T16:16:17Z

it does (it tries the unmodified version first then with modifications if that fails)
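The try-first, fall-back-second order described above can be sketched roughly as follows. This is not the pandas implementation: `load_with_fallback` and the `compat_load` argument are hypothetical names, and the sketch assumes Python 3, where pickled data is `bytes`.

```python
import io
import pickle


def load_with_fallback(data, compat_load):
    """Try the stock unpickler first; use compat_load only if that fails.

    compat_load is a hypothetical callable that takes a binary file-like
    object and returns the unpickled value.
    """
    try:
        return pickle.loads(data)
    except Exception:
        # The stock unpickler could not handle the payload, so retry
        # with the compatibility loader on a fresh in-memory stream.
        return compat_load(io.BytesIO(data))
```

Keeping the unmodified path first means pickles that already load correctly never touch the compatibility code.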

ruidc · 2014-01-13T16:44:06Z

but there are still problems when using just pickle.loads(data) with a 0.12 pickle:

TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.series.Series'>, (0,), 'b'))

we shouldn't have to now go to a file (or file-like) object and use read_pickle.

jreback · 2014-01-13T16:45:22Z

no way around that unless I modify the global namespace; just use pd.read_pickle instead; see the linked issue

jreback · 2014-01-13T16:46:12Z

see here as well (and it's in the 0.13 whatsnew): http://pandas.pydata.org/pandas-docs/dev/io.html#io-pickle

ruidc · 2014-01-13T16:48:40Z

Oh, I saw it, just thought there should be a way to avoid it.

sadruddin · 2014-01-13T16:59:20Z

To put it another way, how should one now read a pickle contained in a string? A common case for that is a pickle stored in a BLOB in a database, does one really have to first save the data in a file to be able to use pd.read_pickle?

jreback · 2014-01-13T17:03:34Z

pickle didn't read from a file previously; you can just wrap it in a StringIO. You could make an issue for this if you want (to enable that enhancement).

e.g.

pd.read_pickle(StringIO(data))
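As a rough illustration of the wrapping idea, here is a standard-library-only sketch. It assumes Python 3, where pickled data is `bytes` and `io.BytesIO` plays the role of `StringIO`; `blob` is a stand-in for data fetched from a database. `pickle.load` is used as the file-oriented reader, since whether `pd.read_pickle` accepts such a buffer depends on the pandas version.

```python
import io
import pickle

# Stand-in for bytes fetched from a database BLOB column.
blob = pickle.dumps([1, 2, 3])

# A file-like wrapper lets a file-oriented reader consume in-memory bytes.
buffer = io.BytesIO(blob)
from_buffer = pickle.load(buffer)

# pickle.loads takes the bytes directly and yields an equal object.
from_bytes = pickle.loads(blob)
```

No temporary file on disk is needed in either case.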

ruidc · 2014-01-13T17:23:19Z

That is what pickle.loads does internally. I'm more concerned about having to change our widespread existing pickle.loads() calls to pd.read_pickle. I had assumed there would be a call to the pandas Unpickler somewhere in this process, but there doesn't appear to be a way out without touching the global namespace.

jreback · 2014-01-13T17:28:41Z

you could patch it internally if you want, essentially reversing this fix (see compat/pickle_compat.py).
It's basically a 1-line change (just remove the copy.copy).

ruidc · 2014-01-13T18:05:15Z

or perhaps a pandas option to turn it back on? what were the negative consequences of having it that way?

jreback · 2014-01-13T18:08:37Z

we could add an option (would have to happen in 0.13.1 though). There was a 'bug' because of how I was calling it.

More generally it's 'bad practice' to monkey patch another module at run-time.

will create an issue for an option and for pickle read from a string (that is sort of a separate issue though)

ruidc · 2014-01-14T13:24:12Z

Having that done in read_pickle is not much benefit, the real issue for me is pickle.load/s breaking for old pickles.

I tried your suggestion of commenting out the copy.copy and it had no impact:

  File "C:\Python27\lib\pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\Dev\src\pandas\pandas\compat\pickle_compat.py", line 29, in load_reduce
    value = func(*args)
TypeError: _reconstruct: First argument must be a sub-type of ndarray

I don't understand all the mechanics but saw your discussion with Wes on http://grokbase.com/t/python/pandas-dev/134n2f0xw2/pickle-is-evil that these are recurring issues with pickle stability and limitations in the way python does it. As it's numpy complaining, is there no way to improve the situation from there?

Otherwise we will probably be forced to update our pickled objects when this pops up.

jreback · 2014-01-14T13:30:00Z

yep....saving pickles is not a particularly efficient / good idea, but I know people do it. I always use HDF.

why can't you use read_pickle? this will read old pickles, if not pls post a reproducible example

ruidc · 2014-01-14T13:50:02Z

well, that's what I'm trying to do at the moment:

  File "C:\Dev\intranet\rdc_test\test_unpickle.py", line 13, in corestone_loads
    return pandas.read_pickle(StringIO.StringIO(s))
  File "C:\Dev\src\pandas\pandas\io\pickle.py", line 49, in read_pickle
    return try_read(path)
  File "C:\Dev\src\pandas\pandas\io\pickle.py", line 45, in try_read
    with open(path, 'rb') as fh:
TypeError: coercing to Unicode: need string or buffer, instance found

where s is the pickled object string.
To elaborate, we have pickles which may include pandas objects, and use cPickle.dumps and cPickle.loads to save them to and load them from a database, so I am attempting to replace our calls to cPickle.loads with a wrapper that tests for this and calls read_pickle.

ruidc · 2014-01-14T13:58:09Z

...perhaps we can have a read_pickle equivalent that takes a file-like object? Or is this what you intended in Issue #5924 ?

jreback · 2014-01-14T14:01:34Z

yes....that is what #5924 is for....it's pretty easy to do though

read_pickle(StringIO.StringIO(string))

ruidc · 2014-01-14T14:31:17Z

well, in case this helps anyone else, we'll end up monkey patching cPickle.loads instead in sitecustomize.py:

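import cStringIO
import pandas
print pandas.__version__
from pandas.compat import pickle_compat
import cPickle

# Keep a reference to the unpatched function so the wrapper can call it.
original_loads = cPickle.loads

def our_loads(s):
    try:
        return original_loads(s)
    except TypeError as e:
        # Only fall back for the specific error raised by 0.12 pickles.
        if "_reconstruct: First argument must be a sub-type of ndarray" in str(e):
            return pickle_compat.load(cStringIO.StringIO(s), compat=True)
        else:
            raise

# Replace cPickle.loads for every caller in this interpreter.
cPickle.loads = our_loads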

in combination with raising a warning, to at least buy us time to replace our 0.12 pickles
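The warning part is not shown in the snippet above. One way it might look, as a Python 3 sketch with hypothetical names (`with_legacy_warning`, `primary`, `fallback`) and a made-up warning message; unlike the message check above, it falls back on any `TypeError`:

```python
import warnings


def with_legacy_warning(primary, fallback):
    """Return a loader that calls primary, and on TypeError warns and uses fallback."""
    def loads(data):
        try:
            return primary(data)
        except TypeError:
            # Flag the old payload so it can be found and re-pickled later.
            warnings.warn(
                "legacy pickle format detected; please re-pickle this object",
                DeprecationWarning,
                stacklevel=2,
            )
            return fallback(data)
    return loads
```

Passing the two loaders in as arguments keeps the sketch independent of any particular pickle module; `stacklevel=2` points the warning at the caller rather than at the wrapper.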

jreback · 2014-01-14T14:44:33Z

ok...looks reasonable

jreback · 2014-01-15T16:48:03Z

@ruidc not sure if you saw this new feature in 0.13: http://pandas.pydata.org/pandas-docs/dev/io.html#io-msgpack. Pretty competitive with pickle and easier on compat.

ruidc · 2014-01-15T18:31:02Z

Thanks. I think we'll wait for the stability this time around:)

jreback · 2014-01-15T18:32:31Z

@ruidc haha! though that's the selling point of msgpack...that it's backwards compatible (e.g. it doesn't care about sub-classing for example)...

ruidc · 2014-01-15T19:31:20Z

sounds great to me, will just wait until it's no longer experimental in pandas - can you tell i've been bitten from sailing too close to the wind lately? it's good to see pandas making great progress and using best-in-breed solutions though.

Commit fc05ab9: BUG: compat_pickle should not modify global namespace

jreback added a commit that referenced this pull request Dec 8, 2013: Merge pull request #5661 from jreback/pickle_fix (98e48ca)

jreback merged commit 98e48ca into pandas-dev:master Dec 8, 2013

This was referenced Jan 13, 2014:

API: create option to allow pandas to patch pickle.load/loads at runtime for compat #5923 (Closed)

API: allow read_pickle to read from strings (and not just files) #5924 (Closed)
