-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: compat_pickle should not modify global namespace #5661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi, I updated my question on SO (http://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed). I applied the patch, although things are not working yet. Thanks for your support. |
pls provide a link to your file. 100mb is no big deal. did u pickle this is 0.12? give me an example of what u r doing |
Hi
this is a link to the file https://www.dropbox.com/s/6g4ej7ru5244e35/pickle_L1cor_fd.pic I am not quite sure this job alone can do the job. What is pickled is a list of dicts, One or more dicts entries are DataFram; then there is more. I can try to extract some minimal working thing from my code in case it is necessary. As I wrote on SO, after changing the code I have this issue
Let me know. |
BUG: compat_pickle should not modify global namespace
I updated the SO question. You can just
|
couldn't/shouldn't the unpickler handle both versions internally to keep external compatibility? |
it does (it tries the unmodified version first then with modifications if that fails) |
but there's still problems when using just pickle.loads(data) with a 0.12 pickle:
we shouldn't have to now go to a file (or file-like) object and use read_pickle. |
no way around that unless I modify the global namespace; just use |
see here as well: (and its in 0.13 whatsnew) http://pandas.pydata.org/pandas-docs/dev/io.html#io-pickle |
Oh, I saw it, just thought there should be a way to avoid it. |
To put it another way, how should one now read a pickle contained in a string? A common case for that is a pickle stored in a BLOB in a database, does one really have to first save the data in a file to be able to use pd.read_pickle? |
pickle didn't before read from a file previously, you can just wrap it in a e.g.
|
which is what pickle.loads does internally - I'm more concerned about having to change our widespread existing pickle.loads() calls to pd.read_pickle, thinking that there would be a call to the pandas Unpickler somewhere in this process but there doesn't appear to be a way out without touching global namespace. |
you could patch it internally if you want, esentially reverse this fix (see |
or perhaps a pandas option to turn it back on? what were the negative consequences of having it that way? |
we could add an option (would have to happen in 0.13.1 though). Their was a 'bug' because of how I was calling it. More generally its 'bad practice' to monkey patch another module at run-time. will create an issue for an option and for pickle read from a string (that is sort of a separate issue though) |
Having that done in read_pickle is not much benefit, the real issue for me is pickle.load/s breaking for old pickles. I tried your suggestion of commenting out the copy.copy and it had no impact:
I don't understand all the mechanics but saw your discussion with Wes on http://grokbase.com/t/python/pandas-dev/134n2f0xw2/pickle-is-evil that these are recurring issues with pickle stability and limitations in the way python does it. As it's numpy complaining, is there no way to improve the situation from there? Otherwise we will probably be forced to update our pickled objects when this pops up. |
yep....saving pickles is not a particularly efficient / good idea, but I know people do it. I always use HDF. why can't you use |
well, that's what I'm trying to do at the moment:
where s is the pickled object string. |
...perhaps we can have a read_pickle equivalent that takes a file-like object? Or is this what you intended in Issue #5924 ? |
yes....that is what #5924 is for....its pretty easy to do though
|
well, in case this helps anyone else, we'll end up monkey patching cPickle.loads instead in sitecustomize.py:
in combination with raising a warning to at least to buy us time to replace our 0.12 pickles |
ok...looks reasonable |
@ruidc not sure if you saw this new feature in 0.13: http://pandas.pydata.org/pandas-docs/dev/io.html#io-msgpack. Pretty competetive with pickle and easier on compat. |
Thanks. I think we'll wait for the stability this time around:) . |
@ruidc haha! though that's the selling point of msgpack...that its backwards compatible (e.g. it doesn't care about sub-classing for example)... |
sounds great to me, will just wait until it's no longer experimental in pandas - can you tell i've been bitten from sailing too close to the wind lately? it's good to see pandas making great progress and using best-in-breed solutions though. From: jreback [email protected] @ruidc haha! though that's the selling point of msgpack...that its backwards compatible (e.g. it doesn't care about sub-classing for example)... |
turns out was modifying the python pickle just by importing pandas
when sub classing have to copy a mutable property before modifying
http://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed