savez method for DataFrame, Series: porting data between python2 and python3 #3151
Comments
pandas uses 2to3, and tests are cross-python by using
That shouldn't be a problem. I don't think pickling across Python versions has been raised as an issue before, so thanks for that. (edit: #686)
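A minimal sketch of the usual pickle workaround, using only the standard `pickle` module. The helper names `dump_py2_compatible` and `load_maybe_py2` are hypothetical, not pandas API. It writes with protocol 2, the highest protocol Python 2 can read, and loads on Python 3 with `encoding="latin1"` so that Python 2 `str` payloads (such as raw NumPy buffers) decode. It does not remove the pickle's dependence on pandas' internal class layout, so such files can still break across pandas versions.

```python
import io
import pickle

import pandas as pd


def dump_py2_compatible(obj, fh):
    # Hypothetical helper: protocol 2 is the highest pickle protocol
    # that Python 2 understands.
    pickle.dump(obj, fh, protocol=2)


def load_maybe_py2(fh):
    # Hypothetical helper: encoding="latin1" lets Python 3 decode the
    # 8-bit str objects that a Python 2 pickle may contain.
    return pickle.load(fh, encoding="latin1")


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
buf = io.BytesIO()
dump_py2_compatible(df, buf)
buf.seek(0)
restored = load_maybe_py2(buf)
print(restored.equals(df))
```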
Have you considered HDF5 via PyTables (http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables)? It offers all of the savez-type functionality, is faster, has compression options, and optionally offers the table format as another storage option. The only downside is a couple of additional dependencies.
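A minimal sketch of the HDF5 route, assuming PyTables (the `tables` package) is installed; the block checks for it and skips the round trip otherwise. `DataFrame.to_hdf` and `pd.read_hdf` are pandas functions; the helper name `hdf_roundtrip`, the file name and the zlib compression settings are illustrative choices only.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# PyTables is optional for pandas; to_hdf/read_hdf need it installed.
HAVE_TABLES = importlib.util.find_spec("tables") is not None


def hdf_roundtrip(df, path, key="df"):
    # Hypothetical helper. format="table" writes the queryable PyTables
    # Table layout; complevel/complib turn on compression.
    df.to_hdf(path, key=key, mode="w", format="table",
              complevel=9, complib="zlib")
    return pd.read_hdf(path, key)


if HAVE_TABLES:
    df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
    with tempfile.TemporaryDirectory() as tmp:
        restored = hdf_roundtrip(df, os.path.join(tmp, "data.h5"))
    print(restored.equals(df))
```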
See this PyTables issue, which provides a savez/PyTables comparison: Here is the HDFStore export capability to R table format: Here (see 6) is something that could be useful: #2391. I could see adding an
Is PyTables necessary for running pandas? I thought it was optional.
Also, npz and npy are not new serialization formats. They are part of NumPy, which pandas is built upon. Serializing the object and serializing the data are two fundamentally different things.
PyTables is optional, but highly recommended, especially when dealing with data of any non-trivial size. You can simply do this:
I believe what you are talking about is supporting this method officially. I have no problem with it, but it's essentially deprecated, as it's a NumPy-only format. Just because something is in NumPy does not mean pandas should support it, after all. Just my 2c.
Reminder that we need to implement a binary data format, using msgpack or some such, that is not dependent on pickle (and preferably not dependent on too many internal details of pandas objects).
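The comment suggests msgpack, which is a third-party package. This minimal sketch swaps in JSON via `DataFrame.to_json(orient="split")` and `pd.read_json` to illustrate the same idea: a pickle-free format that records only the index, columns and values, not pandas internals. The helper names `to_portable_text` and `from_portable_text` are hypothetical. Floats are written with at most 15 decimal places (the maximum `to_json` allows), so not every float round-trips bit-for-bit, which is one argument for a binary format.

```python
from io import StringIO

import pandas as pd


def to_portable_text(df):
    # Hypothetical helper: orient="split" emits plain JSON lists for
    # "index", "columns" and "data"; no pickle or class paths involved.
    return df.to_json(orient="split", double_precision=15)


def from_portable_text(text):
    # Hypothetical helper: wrap the string in StringIO because passing
    # literal JSON strings to read_json is deprecated in recent pandas.
    return pd.read_json(StringIO(text), orient="split")


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]},
                  index=[10, 20, 30])
restored = from_portable_text(to_portable_text(df))
```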
Had a go at getting PyTables working with Python 3 ... still a lot of work, I think. It looks like PyTables depends on numexpr, which is not yet ported to Python 3. I've had moderate success hacking away at these kinds of conversions, but I don't really know what I'm doing, which makes me less than an ideal contributor. HDFStore looks like a great option, but it would be much better if it were a required dependency of pandas. On the other hand, this would make pandas harder to install.
Looks like both numexpr and PyTables are going to support Python 3 very soon in any event (the branches are merged). The dependency doesn't matter; the user can install it if they want. In fact, for 0.11 we made numexpr a highly recommended dependency in order to use it internally (but all that means is doc warnings!). Another really good option is read_csv/to_csv; they are quite fast.
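A minimal CSV round trip with `to_csv`/`read_csv`, using an in-memory buffer and an index named `key` purely for illustration. `index_col` tells `read_csv` which column to turn back into the index. CSV is text, so dtypes are re-inferred on read: plain integers and floats survive, but datetimes or categoricals come back as strings unless you pass options such as `parse_dates`.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]},
                  index=pd.Index([10, 20, 30], name="key"))

buf = StringIO()
df.to_csv(buf)                                 # header row plus the index column
buf.seek(0)
restored = pd.read_csv(buf, index_col="key")   # rebuild the index from that column
```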
FWIW, numexpr and PyTables are maintained by the same team, so they should be released together.
Sounds great! I'll try to find the dev branches and try it out ... |
PyTables 3.0.0 and numexpr (ne) 2.1 solve this problem to a large extent.
I find that I am sometimes working between Python 2 and Python 3 installs, and using pickle to pass data between them is problematic. I am looking into adding some simple functions like:
pandas.loadnpz
pandas.obj.savenpz (where obj would be DataFrame, Series, Panel etc ...)
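A minimal sketch of what the proposed functions could look like, assuming a frame with a single numeric dtype. `savenpz` and `loadnpz` are the hypothetical names from the list above, written here as module-level functions rather than methods; `np.savez` and `np.load` are real NumPy calls. Index and column labels are stored as unicode strings, so non-string labels come back as strings.

```python
import io

import numpy as np
import pandas as pd


def savenpz(df, path_or_buf):
    # Hypothetical helper: store values and labels as plain NumPy arrays.
    # Labels are cast to fixed-width unicode so no object arrays (and so
    # no pickle) end up in the archive.
    np.savez(
        path_or_buf,
        values=df.to_numpy(),
        index=np.asarray(df.index, dtype=str),
        columns=np.asarray(df.columns, dtype=str),
    )


def loadnpz(path_or_buf):
    # Hypothetical helper: allow_pickle=False makes np.load refuse object
    # arrays, so a successful load shows the file did not rely on pickle.
    with np.load(path_or_buf, allow_pickle=False) as npz:
        return pd.DataFrame(npz["values"], index=npz["index"],
                            columns=npz["columns"])


df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["x", "y"],
                  columns=["a", "b"])
buf = io.BytesIO()
savenpz(df, buf)
buf.seek(0)
restored = loadnpz(buf)
```

Because nothing in the archive is an object array, the `allow_pickle=False` load doubles as a cheap check that the file is readable without pickle. Mixed or object-dtype frames would need one array per column instead of a single `values` array.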
Any opinions on this?
Is it already there and I haven't found it?
Is there a natural (efficient) place to do this? It seems the save/load methods are fairly generic and are attached to all PandasObjects. Maybe the NDFrame level would make the most sense?
Also, supposing there is interest (or at least no objection), is there any way to add a test for this under the current framework, since the full functionality involves both Python 2.* and Python 3.* pandas?