Add JSON export option for DataFrame #631
Comments
I actually need to do this for a current project I'm working on. I'll get started on tackling this if it is an open issue. I will probably be using the gviz API as a reference (http://code.google.com/p/google-visualization-python/).
By all means go right ahead. @mikedewar may also be interested for his project https://github.com/mikedewar/D3py
Would be happy to see this exist! In fact I made a gist a while ago to do it: https://gist.github.com/1486027 Please feel free to use it as a starting point! It could probably do with a bit more consideration in terms of multiple levels of keys and other stuff about data frames that I don't know about yet.
Now if we want to be truly hardcore (and why wouldn't we be?) we should fork UltraJSON and make it DataFrame-specific to get the best performance.
An interesting idea. I'll have to examine the code, although my experience with C is somewhat limited. I may need to do some serious review, but it will be excellent practice nonetheless. Also, I had a look at the google-visualization-python API and I like the use of a "table description" that you can pass to it to define the desired structure of the JSON string. This provides a great deal of flexibility that would be really useful, and would make using the string in something like Google Charts really easy.
Hi all, I've done some preliminary work in this direction. In my fork of ujson I've added some basic support for numpy. Right now it just handles some of the basic numpy scalars and 1D arrays. The implementation isn't perfect (I'm a bit concerned with casting everything) but it seems to work OK. The goal is to eventually add support for numpy N-dimensional arrays (possibly with a max limit on N) and pandas data types, specifically Series and DataFrame. It's my first time dealing with the Python and NumPy C APIs, so any comments are welcome!
Encoding support for DataFrame, Series and Index is now committed, as well as proper support for encoding numpy arrays. Still not sure how to properly handle decoding; right now I'm just passing the decoded dict / list to the relevant data type's constructor. I decided to encode the DataFrame index and column labels separately (it suits my purposes and I think it's more efficient to work on the underlying numpy arrays). So you end up with something like:
>>> df = DataFrame([[1,2,3], [4,5,6]], columns=['x', 'y', 'z'], index=['a', 'b'])
>>> ujson.encode(df)
'{"columns":["x","y","z"],"index":["a","b"],"data":[[1,2,3],[4,5,6]]}'
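For readers without the ujson fork at hand, the layout described above (column labels, index labels and values stored as three separate arrays) can be sketched with the standard-library `json` module. This is only an illustration of the output shape, not the fork's C implementation; the helper name `encode_split` is hypothetical, and plain lists stand in for the DataFrame.

```python
import json

def encode_split(columns, index, data):
    """Hypothetical helper: emit labels and values as three separate arrays."""
    payload = {
        "columns": list(columns),
        "index": list(index),
        "data": [list(row) for row in data],
    }
    # Compact separators match the whitespace-free output shown in the thread.
    return json.dumps(payload, separators=(",", ":"))

encoded = encode_split(["x", "y", "z"], ["a", "b"], [[1, 2, 3], [4, 5, 6]])
print(encoded)

# Decoding yields a plain dict whose keys line up with the keyword
# arguments of a DataFrame-style constructor (columns, index, data).
decoded = json.loads(encoded)
print(sorted(decoded))
```

Because the decoded keys are exactly `columns`, `index` and `data`, the dict can be passed straight to a constructor that accepts those keyword arguments, which is the decoding approach the comment describes.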
I think what's needed for @mikedewar's needs and others would be:
when you deserialize that and pass it to DataFrame, you get back the same DataFrame:
However, this doesn't give you the row index, but that's not a big deal for the particular use case (feeding a DataFrame into d3 or something else).
Ok, I was initially going to match the output of the
>>> DataFrame(**ujson.loads('{"columns":["x","y","z"],"index":["a","b"],"data":[[1,2,3],[4,5,6]]}'))
   x  y  z
a  1  2  3
b  4  5  6
That said I don't think it would be too difficult to add an option to produce output like you mentioned. How about a
>>> df = DataFrame([[1,2,3], [4,5,6]], columns=['x', 'y', 'z'], index=['a', 'b'])
>>> ujson.dumps(df, labelled=True)
'{"x":{"a":1,"b":4},"y":{"a":2,"b":5},"z":{"a":3,"b":6}}'
Or is it absolutely necessary to suppress the index labels?
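The labelled layout proposed above nests each column as an object keyed by index label. A minimal pure-Python sketch of that shape, assuming plain lists in place of the DataFrame (the name `encode_labelled` is hypothetical and this is not how the fork implements the option):

```python
import json

def encode_labelled(columns, index, data):
    """Hypothetical helper: column label -> {index label -> value}."""
    nested = {
        col: {label: row[j] for label, row in zip(index, data)}
        for j, col in enumerate(columns)
    }
    return json.dumps(nested, separators=(",", ":"))

labelled = encode_labelled(["x", "y", "z"], ["a", "b"], [[1, 2, 3], [4, 5, 6]])
print(labelled)
```

Every value is addressed by a (column, index) pair, so the index labels survive encoding, at the cost of repeating them once per column.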
I'm thinking it might be preferable to ship the relevant ultrajson code in pandas and use it to implement
Agreed, it would make sense for it to be included in pandas. I think all the ujson code is required (as it will still have to deal with basic types), albeit tailored for numpy and pandas types. I can fork and attempt to introduce it into pandas if you point me in the right direction. ujson is composed of several different C files; I'm not sure where to put them or how to include them in the build process.
You would want to put it in a subdirectory of pandas/src and co-opt the extension configuration from the UltraJSON
I've finally got around to revisiting this. I've added support to my fork of ujson for different output formats when encoding pandas data types:
In [4]: df = DataFrame([[1,2,3], [4,5,6]], index=['a', 'b'], columns=['x', 'y', 'z'])
In [5]: ujson.encode(df, format="headers")
Out[5]: '{"columns":["x","y","z"],"index":["a","b"],"data":[[1,2,3],[4,5,6]]}'
In [6]: ujson.encode(df, format="records")
Out[6]: '[{"x":1,"y":2,"z":3},{"x":4,"y":5,"z":6}]'
In [7]: ujson.encode(df, format="indexed")
Out[7]: '{"a":{"x":1,"y":2,"z":3},"b":{"x":4,"y":5,"z":6}}'
In [8]: ujson.encode(df, format="column_indexed")
Out[8]: '{"x":{"a":1,"b":4},"y":{"a":2,"b":5},"z":{"a":3,"b":6}}'
I've added similar support for Series and Index (although some of the formats don't suit them, it tries to handle them sensibly):
In [9]: s = Series([10, 20, 30, 40, 50, 60], name="myseries", index=[6,7,8,9,10,15])
In [10]: ujson.encode(s, format="headers")
Out[10]: '{"name":"myseries","index":[6,7,8,9,10,15],"data":[10,20,30,40,50,60]}'
In [11]: ujson.encode(s, format="records")
Out[11]: '[10,20,30,40,50,60]'
In [12]: ujson.encode(s, format="indexed")
Out[12]: '{"6":10,"7":20,"8":30,"9":40,"10":50,"15":60}'
In [13]: ujson.encode(s, format="column_indexed")
Out[13]: '{"6":10,"7":20,"8":30,"9":40,"10":50,"15":60}'
In [14]: i = Index([23, 45, 18, 98, 43, 11], name="myindex")
In [15]: ujson.encode(i, format="headers")
Out[15]: '{"name":"myindex","data":[23,45,18,98,43,11]}'
In [16]: ujson.encode(i, format="records")
Out[16]: '[23,45,18,98,43,11]'
In [17]: ujson.encode(i, format="indexed")
Out[17]: '[23,45,18,98,43,11]'
In [18]: ujson.encode(i, format="column_indexed")
Out[18]: '[23,45,18,98,43,11]'
My next step is to integrate this into pandas but I'd welcome any comments. Are there values for the
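To make the four DataFrame layouts above easier to compare, here is a rough pure-Python sketch that reproduces the same output shapes from plain lists using the standard-library `json` module. The function names are hypothetical and the code only mirrors the format names used in this comment; it says nothing about how the C encoder in the fork is structured.

```python
import json

def encode_frame(columns, index, data, fmt="headers"):
    """Hypothetical helper mirroring the four layouts shown in the thread."""
    if fmt == "headers":
        # Labels and values kept as three separate arrays.
        payload = {"columns": columns, "index": index, "data": data}
    elif fmt == "records":
        # One object per row; index labels are dropped.
        payload = [dict(zip(columns, row)) for row in data]
    elif fmt == "indexed":
        # Index label -> {column label -> value}.
        payload = {label: dict(zip(columns, row)) for label, row in zip(index, data)}
    elif fmt == "column_indexed":
        # Column label -> {index label -> value}.
        payload = {
            col: {label: row[j] for label, row in zip(index, data)}
            for j, col in enumerate(columns)
        }
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    return json.dumps(payload, separators=(",", ":"))

def encode_series_indexed(index, values):
    """Hypothetical helper: index label -> value for a 1-D series."""
    return json.dumps(dict(zip(index, values)), separators=(",", ":"))

cols, idx, rows = ["x", "y", "z"], ["a", "b"], [[1, 2, 3], [4, 5, 6]]
for name in ("headers", "records", "indexed", "column_indexed"):
    print(name, encode_frame(cols, idx, rows, name))

series_json = encode_series_indexed([6, 7, 8, 9, 10, 15], [10, 20, 30, 40, 50, 60])
print(series_json)
```

Note that JSON object keys are always strings, which is why the integer index labels of the Series come out quoted in the "indexed" output; a decoder has to convert them back if the original integer index matters.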
Hm, I'll think about the API. What you propose looks pretty good and you could just go for that for now, adding a
ujson is pure C, with no Python files except for setup.py and some test classes. I think all of it is required though (apart from its test code and metafiles) so it can properly handle whatever type happens to be in the DataFrame etc.
Right, so you would just need to set it up to build as a submodule inside pandas, wire it up with the new object instance methods, and write appropriate tests. If you do some of the heavy lifting to set this up and make a pull request, I can integrate and round things out in a few weeks.
Hi Wes, I've improved the performance a bit and made some other tweaks and improvements. Most notably, I've added support for direct decoding to numpy arrays, which gets rid of the list to numpy array conversion step. I've updated the README on my fork with more information and some simple benchmarks: https://github.com/Komnomnomnom/ultrajson. Although there were a couple of surprises, I'm pretty happy with the overall performance. Integrating with pandas and the pandas build was a lot more straightforward than I expected. I should send through a pull request later on today (I'll attach it to this issue if I can). Oh, and I've changed the format argument to 'orient'; it seems to fit better with other DataFrame methods, and format clashes with a Python built-in. I also added the 'values' format, which only encodes the DataFrame values array, ignoring column and index labels.
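As a small illustration of the last two points, the sketch below uses `orient` as the keyword name and implements only the 'values' layout, which drops both sets of labels. It is a standard-library stand-in, not the fork's API: the function name `encode` is hypothetical, and no other `orient` values are assumed here because the comment does not list them.

```python
import json

def encode(data, orient="values"):
    """Hypothetical helper: only the 'values' layout from the thread is sketched."""
    if orient != "values":
        raise NotImplementedError(f"orient={orient!r} is not covered by this sketch")
    # Only the 2-D values array is emitted; column and index labels are ignored.
    return json.dumps([list(row) for row in data], separators=(",", ":"))

values_json = encode([[1, 2, 3], [4, 5, 6]], orient="values")
print(values_json)

# Naming the parameter `orient` leaves the built-in format() unshadowed
# inside the function body, so it can still be called as usual.
hex_text = format(255, "x")
print(hex_text)
```

The 'values' output cannot be turned back into the original labelled frame on its own, so it suits consumers that only need the raw matrix.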
All issues related to DataFrame.to_json() seem to be closed, but in version 0.8.1 there is no DataFrame.to_json() method.
It's not part of pandas for now due to issues with MinGW. It's in a
MinGW issues are Windows related, I suppose.
It's pydata/pandasjson