BUG: Unpacking PY2 msgpack in PY3 #12142

Closed
kawochen opened this issue Jan 26, 2016 · 4 comments
Labels
Unicode Unicode strings

@kawochen
Contributor

In #10686, I should have made all the strings in encode Unicode strings. As it stands, 'abc' packed in P2 becomes (or rather remains as) b'abc' when unpacked in P3. I think this is the desired behavior (bytes remain bytes and text remains text), but it causes errors in decode because, for example, the key 'typ' (== u'typ' in P2) is expected while the key actually present is b'typ' (== 'typ' in P2).

Reading in the other direction is fine because P2 is more tolerant of these things.

To reproduce this,

(P2) python generate_legacy_storage_files.py your_dir msgpack
(P3) pandas.read_msgpack(the_file_just_created)
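
For illustration only (this is not taken from the pandas source), here is a minimal sketch of why the lookup fails on P3. The dict contents and the helper name `lookup_typ` are hypothetical; the dict merely stands in for what P3 sees after unpacking a file whose keys were packed as P2 `str`.

```python
# Hypothetical stand-in for a dict unpacked in P3 from a file packed in P2:
# the keys and values arrive as bytes because P2 packed plain str objects.
unpacked = {b'typ': b'series', b'name': b'abc'}


def lookup_typ(obj):
    """Look up the text key 'typ', the way text-keyed decode logic would."""
    return obj['typ']


try:
    lookup_typ(unpacked)
    outcome = 'found'
except KeyError:
    # In P3, 'typ' and b'typ' are different keys, so the lookup misses.
    outcome = 'KeyError'

print(outcome)
```

The same lookup succeeds in P2, where 'typ' and u'typ' compare and hash equal, which is why reading in the other direction works.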
@jreback jreback added the Unicode and Msgpack labels Jan 26, 2016
@jreback jreback added this to the Next Major Release milestone Jan 26, 2016
@jreback
Contributor

jreback commented Jan 26, 2016

OK, I don't see a versioning scheme in the actual packed file. Is it possible to add one, so that we can then handle things conditionally?

@kawochen
Contributor Author

I think the cleanest solution would be to sprinkle encode and decode with u prefixes. That wouldn't rescue the files already packed with 0.17 in PY2 (they still couldn't be unpacked in PY3). I think the end result would be the same as adding some version info.

| packed (version) | packed (Python) | unpacked |
|---|---|---|
| pre-0.17 | PY2 | any |
| pre-0.17 | PY3 | any |
| 0.17 | PY2 | PY2 |
| 0.17 | PY3 | any |

The other option would be to test for bytes vs strings in decode (which is called for all dicts), in which case files packed with 0.17 in PY2 can be unpacked by future pandas in PY3. This would be a bit unwieldy as you would need to test for both string 'abc' and bytes b'abc' pretty much everywhere.

| packed (version) | packed (Python) | unpacked |
|---|---|---|
| pre-0.17 | PY2 | any |
| pre-0.17 | PY3 | any |
| 0.17 | PY2 | PY2 (<=0.17), any (>=0.18.?) |
| 0.17 | PY3 | any |
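
A rough sketch of what the second option could look like, under stated assumptions: instead of testing for both 'abc' and b'abc' at every lookup, it decodes bytes keys to text once at the top of decode. This is a variant of the idea described above, not the fix that was merged, and the helper names `normalize_keys` and `get_typ` and the sample dicts are hypothetical.

```python
def normalize_keys(obj, encoding='utf-8'):
    """Return a copy of a dict with bytes keys decoded to text.

    Values are left untouched so that user data packed as bytes stays bytes.
    """
    return {
        (key.decode(encoding) if isinstance(key, bytes) else key): value
        for key, value in obj.items()
    }


def get_typ(obj, encoding='utf-8'):
    """Fetch the 'typ' marker as text, whether it was packed as bytes or text."""
    typ = normalize_keys(obj, encoding).get('typ')
    if isinstance(typ, bytes):
        typ = typ.decode(encoding)
    return typ


# Hypothetical examples of what PY3 might see after unpacking.
packed_in_py2 = {b'typ': b'series', b'data': b'abc'}  # bytes keys (0.17, PY2)
packed_in_py3 = {'typ': 'series', 'data': 'abc'}      # text keys

print(get_typ(packed_in_py2), get_typ(packed_in_py3))
```

Normalizing only the keys and the type marker keeps the "bytes remain bytes" behavior for user data, at the cost of one extra pass over every dict that decode receives.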

@kawochen
Contributor Author

kawochen commented Feb 1, 2016

What do people think? Would like to fix in 0.18.0.

@jreback jreback modified the milestones: 0.18.0, Next Major Release Feb 12, 2016
@jreback
Contributor

jreback commented Feb 17, 2016

closed by #12129
