compat pickles don't have '_id' #8431

Closed
dalejung opened this issue Oct 1, 2014 · 7 comments · Fixed by #8454
Labels
Compat: pandas objects compatibility with Numpy or Python functions

@dalejung
Contributor

dalejung commented Oct 1, 2014

So I'm running into an issue with creating a Panel out of legacy DataFrame pickles. The issue is that _id is set to None and the current Index.is_ logic treats any two Indexes with unset _id as the same.

I thought the issue was with the unpickling, but it looks like there was already an attempt to address this in is_:

def is_(self, other):
    return self._id is getattr(other, '_id', Ellipsis)

I'm guessing whoever wrote this thought that getattr(other, '_id', Ellipsis) would return Ellipsis if _id was not set. But the attribute exists with the value None, so the default never kicks in.
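To illustrate the getattr behaviour described here, a minimal sketch follows. ToyIndex and NoIdAtAll are hypothetical stand-ins, not pandas classes, and the class-level `_id = None` default is an assumption about how an unset identity appears on a restored object.

```python
class ToyIndex:
    # Assumed class-level default, standing in for an Index whose _id was never set.
    _id = None

    def is_(self, other):
        # Same check as the one quoted above.
        return self._id is getattr(other, '_id', Ellipsis)


class NoIdAtAll:
    """Hypothetical object that has no _id attribute whatsoever."""


a, b = ToyIndex(), ToyIndex()

# The attribute exists (with value None), so the Ellipsis default is never used
# and the comparison reduces to `None is None`.
unset_pair_matches = a.is_(b)

# Only when the attribute is missing entirely does getattr fall back to Ellipsis.
missing_attr_matches = a.is_(NoIdAtAll())

print(unset_pair_matches, missing_attr_matches)
```

The Ellipsis default only guards against objects that lack `_id` altogether; it does nothing for two objects that both carry `_id = None`.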

So I'm seeing two options:

  1. Make unpickling make a new _id if it's not found in state
  2. Make any Index.is_ always return False if either one has an unset id.

Maybe both?
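The two options could be sketched roughly as below. This is a toy model under stated assumptions, not the pandas implementation: ToyIndex, its `_reset_identity` helper, and `restore_legacy` are illustrative names, and unpickling is mimicked by calling `__new__` and `__setstate__` directly with a state dict that lacks `_id`.

```python
class ToyIndex:
    _id = None  # assumed default seen by objects restored without an _id

    def __init__(self, values):
        self.values = list(values)
        self._reset_identity()

    def _reset_identity(self):
        # A fresh object() is identical only to itself.
        self._id = object()

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Option 1: legacy state carries no '_id', so create one on restore.
        if state.get('_id') is None:
            self._reset_identity()

    def is_(self, other):
        # Option 2: an unset identity never matches anything.
        if self._id is None:
            return False
        return self._id is getattr(other, '_id', Ellipsis)


def restore_legacy(values):
    """Mimic unpickling: bypass __init__, then apply a state dict lacking '_id'."""
    obj = ToyIndex.__new__(ToyIndex)
    obj.__setstate__({'values': list(values)})
    return obj


x = restore_legacy(range(27))
y = restore_legacy(range(30))

# An object created without __init__ or __setstate__ still has the unset default.
bare = ToyIndex.__new__(ToyIndex)
```

With both guards in place, `x.is_(y)` is False because each restored object gets its own identity, and `bare.is_(bare)` is False because an unset identity is never treated as a match.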

@jreback
Contributor

jreback commented Oct 1, 2014

you will have to provide a concrete example here of where this breaks, plus sample pickles

@jreback
Contributor

jreback commented Oct 2, 2014

@dalejung ???? releasing 0.15.0 rc soon; if this is a problem, would like to address it before then.

can you show some code to reproduce?

@dalejung
Contributor Author

dalejung commented Oct 3, 2014

A self-contained example is here:
http://nbviewer.ipython.org/gist/dalejung/e6d7fd95e2bef2cda81b
I attached the pickle files even though you can generate them yourself via dump.py.

The important part is pickling a legacy pandas object. The pickle API doesn't matter, as _id isn't explicitly pickled either way and just comes along for the ride.

Man, it'd be nice if github could show ipython notebooks inline.

@jreback
Contributor

jreback commented Oct 3, 2014

@dalejung your 0.12 pickles unpickle just fine (for me)
the 0.14.1 ones give some weird error (unpickle protocol 4), so not sure what you are doing
I can unpickle 0.14.1 just fine.

So it's still unclear what actually doesn't work

@jreback
Contributor

jreback commented Oct 3, 2014

these all read for me in py3.4 with the latest version of master.

so if you have more info pls post.

jreback closed this as completed Oct 3, 2014
jreback added the Compat (pandas objects compatibility with Numpy or Python functions) and Can't Repro labels Oct 3, 2014

@dalejung
Contributor Author

dalejung commented Oct 3, 2014

@jreback not sure we're on the same page. The issue isn't reading the pickle. The 0.12.0 pickles don't have the _id on their Indexes because that did not exist back then.

This is an issue because if

assert s1.index._id is None
assert s2.index._id is None

# then
s1.index.is_(s2.index) is True

So for something like this:

# indexes are not identical
assert s1.index.size != s2.index.size
df = pd.DataFrame({'s1' : s1, 's2' : s2}) # errors

/python/externals/pandas/pandas/core/internals.py in _stack_arrays(tuples, dtype)
   3697     stacked = np.empty(shape, dtype=dtype)
   3698     for i, arr in enumerate(arrays):
-> 3699         stacked[i] = _asarray_compat(arr)
   3700 
   3701     return stacked, placement

ValueError: could not broadcast input array from shape (27) into shape (30)

You get a shape mismatch because pandas skips the union of the two indexes: it already thinks they are identical. Normally it would've reindexed the data to the same shape.
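The failure mode can be reproduced in miniature without pandas. The sketch below is a hypothetical pure-Python model of the shortcut, not the actual pandas internals: `stack_columns` and its `identical` predicate are made-up names, and missing labels are filled with None rather than NaN.

```python
def stack_columns(columns, identical):
    """columns: list of (index, values) pairs; identical: predicate on two indexes."""
    first_index = columns[0][0]
    if all(identical(first_index, idx) for idx, _ in columns[1:]):
        # Shortcut: the indexes are believed identical, so values are used as-is.
        target = first_index
        rows = [list(vals) for _, vals in columns]
    else:
        # Normal path: union the indexes and reindex each column, filling gaps.
        target = sorted(set().union(*(idx for idx, _ in columns)))
        rows = [[dict(zip(idx, vals)).get(label) for label in target]
                for idx, vals in columns]
    if any(len(row) != len(target) for row in rows):
        raise ValueError("column lengths differ; cannot stack")
    return target, rows


s1 = (list(range(27)), [1.0] * 27)
s2 = (list(range(30)), [2.0] * 30)

# Buggy check: any two indexes with an unset identity compare as identical.
try:
    stack_columns([s1, s2], identical=lambda a, b: True)
    shortcut_failed = False
except ValueError:
    shortcut_failed = True

# Correct check: distinct objects are not identical, so the union path runs.
target, rows = stack_columns([s1, s2], identical=lambda a, b: a is b)
```

When the identity check wrongly succeeds, the 27-element and 30-element columns reach the stacking step unaligned and the length check fails; with a correct check, both columns are reindexed to the 30-label union first.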

@jreback
Contributor

jreback commented Oct 3, 2014

@dalejung ok, easy enough. The identity was not being reset on legacy pickles. It didn't matter for current pickles because they go thru __new__ which already resets.

thanks for the report
