DataFrame with 'list of dicts' behaviour proposal #526
Comments
Let me guess... you've got lists of JSON objects? =P This works, for example:
But I agree with you that the constructor should be able to figure out a list of dicts without having to type so much. I'll look at your impl and cook up something similar and as fast as possible.
My data does mostly come in from JSON, and I have to transform it. I Eventually, I want to make my exploratory stuff as simple as possible, Note: If the default 'use all columns that appear in any' is desired
It's worth thinking about if this is something you want to actually Thanks for reviewing the idea! (and sorry that the GL On Thu, Dec 22, 2011 at 2:07 PM, Wes McKinney
I think 'use all columns that appear in any' is the right default behavior unless a set of columns is explicitly passed (in which case obviously just use those). This would probably also be a good time to review all the dict-creation routines and set up some vbench action for them too (http://pandas.sourceforge.net/vbench.html). I'm kind of performance obsessed (!) if that hasn't come through yet, so I suspect I can come up with a pretty performant way of processing the data into the right form. As far as giving privilege to the first element of a list... well, if a user passes a list of differently-typed objects, that is most likely going to blow up. In practice that is pretty rare, so I'm willing to live with it.
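To make that default concrete, here is a minimal plain-Python sketch of the column-selection rule. It is only an illustration, not the code that went into pandas: `resolve_columns` and its parameters are hypothetical names, and ordering columns by first appearance is an assumption, since the thread does not say how they should be ordered.

```python
def resolve_columns(records, columns=None):
    """Pick column names for a list of dicts (hypothetical helper).

    If `columns` is passed explicitly, use exactly those. Otherwise use
    every key that appears in any record, in first-seen order (the
    ordering is an assumption made for this sketch).
    """
    if columns is not None:
        return list(columns)
    seen = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for rec in records:
        for key in rec:
            seen.setdefault(key, None)
    return list(seen)


records = [{"a": 1}, {"b": 2, "a": 3}]
print(resolve_columns(records))                 # union of keys
print(resolve_columns(records, columns=["a"]))  # explicit columns win
```

Passing `columns` explicitly also gives a cheap way to skip the scan over every record when the schema is already known.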
Let me know if you want design or code review on any of it! I will be (eventually, I want to write bridge code to use DataFrames in Orange as well) GL On Thu, Dec 22, 2011 at 2:32 PM, Wes McKinney
Cool. I think that would be very valuable (on both fronts). I'd be happy to have JSON-related tools in pandas; I'm eventually going to need to hook DataFrame up to JS data visualization in the browser.
I implemented this in the above commit. I guess you piqued my interest :) By the way, that implementation (which uses Cython routines) is roughly 6x faster than the one in the gist above. The Cython routine I have that implements
beats it by about 35%. Though I do love the simple elegance of itertools and generators.
Sketch of proposed behaviour... make 'list of dicts' create a (potentially) 'ragged' array, with autoguessed column names, and sensible default values, when the keys don't exist in all dicts.
Current behaviour:
In [215]: pandas.DataFrame([dict(a=1),dict(a=2)],columns=['a'])
Out[215]:
a
0 {'a': 1}
1 {'a': 2}
(I happen to find this very surprising/useless behaviour!)
(one) Proposed behaviour...
I have a straw implementation at: https://gist.github.com/1511578
(there is a lot to comment on!... should it use the set of keys? Do we need more args? Documentation? Is this just a recipe?)
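For discussion, here is a rough pure-Python sketch of the 'ragged' behaviour described above. It is not the gist linked above and not what pandas does internally: `records_to_columns`, the `default` parameter, and the choice to use the sorted set of keys are all assumptions made for illustration.

```python
def records_to_columns(records, columns=None, default=None):
    """Turn a list of dicts into a dict of equal-length column lists.

    Hypothetical helper: keys missing from a record are filled with
    `default`. When `columns` is not given, the sorted set of all keys
    is used (one possible answer to the "set of keys?" question).
    """
    if columns is None:
        # set().union(*records) collects the keys of every dict
        columns = sorted(set().union(*records))
    return {col: [rec.get(col, default) for rec in records] for col in columns}


ragged = [dict(a=1), dict(a=2, b=3)]
table = records_to_columns(ragged)
for name, values in table.items():
    print(name, values)
```

The result is a dict of equal-length lists, which is a shape the DataFrame constructor already accepts, so a recipe like this could sit in front of the constructor without changing it.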