
Support creating a DataFrame from a list of dictionaries providing an index column #1744


Closed
blais opened this issue Aug 8, 2012 · 2 comments

Contributor

blais commented Aug 8, 2012

Hey Wes,

Here's a use case that I think is not covered by Pandas.
It's a use case for creation of a DataFrame object from a
list of dicts.

I extract "documents" (dicts) from a MongoDB database.
From these dicts, one of the keys is meant to be used as
the index.

While I can do something like

df = DataFrame(documents, columns=['order_id', 'time', 'quantity'])

the index that is added is an arbitrary one.
I'd like to use 'order_id' as my index, in this example.
The only way for me to do that at the moment is to create
the index explicitly, e.g. like this:

df = DataFrame(documents, columns=['order_id', 'time', 'quantity'],
               index=[o['order_id'] for o in documents])

This is a bit of a PIA. It would be nice if one could just
specify the index key/column name, e.g.

df = DataFrame(documents, columns=['order_id', 'time', 'quantity'], indexcol='order_id')

That's my use case almost everywhere.
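
For illustration, the behavior requested above can be approximated with a small wrapper. This is only a sketch: `frame_from_documents` and its `indexcol` parameter are hypothetical names, not part of pandas, and the sample documents are made up to stand in for MongoDB results.

```python
import pandas as pd


def frame_from_documents(documents, columns, indexcol):
    """Hypothetical helper: build a DataFrame and use one key as its index."""
    # Build the frame with the default integer index first,
    # then move the chosen column into the index.
    return pd.DataFrame(documents, columns=columns).set_index(indexcol)


# Made-up sample documents (assumption: each dict has an 'order_id' key).
documents = [
    {'order_id': 101, 'time': '09:30', 'quantity': 5},
    {'order_id': 102, 'time': '09:31', 'quantity': 2},
]

df = frame_from_documents(documents, ['order_id', 'time', 'quantity'], 'order_id')
print(df)
```

Unlike the explicit `index=[...]` workaround, this leaves 'order_id' only in the index rather than in both the index and the columns.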

Contributor

lodagro commented Aug 8, 2012

You can use a different constructor, DataFrame.from_records:

In [36]: def create_dict(order_id):
   ....:      return {'order_id': order_id, 'quantity': np.random.randint(1, 10), 'price': np.random.randint(1, 10)}

In [37]: documents = [create_dict(i) for i in range(10)]

In [38]: documents.append({'order_id': 10, 'quantity': 5})   # demo missing data

In [39]: df = pandas.DataFrame.from_records(documents).set_index('order_id')

In [40]: df
Out[40]:
          price  quantity
order_id
0             2         2
1             2         3
2             1         8
3             2         1
4             3         7
5             9         5
6             7         4
7             7         5
8             7         8
9             6         3
10          NaN         5

In [41]: df = pandas.DataFrame.from_records(documents, columns=['order_id', 'quantity', 'price'], index='order_id')

In [42]: df
Out[42]:
    quantity  price
0          2      2
1          3      2
2          8      1
3          1      2
4          7      3
5          5      9
6          4      7
7          5      7
8          8      7
9          3      6
10         5    NaN

The last one does not seem to set the index name.

wesm closed this as completed in dfe7a55 on Sep 18, 2012
Member

wesm commented Sep 18, 2012

I fixed DataFrame.from_records so that it sets the index name or names. @blais, you should use DataFrame.from_records for this purpose.
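
A minimal sketch of how this can be checked, assuming a recent pandas release; the sample documents are made up, and the last one omits 'price' to mirror the missing-data demo earlier in the thread.

```python
import pandas as pd

# Made-up documents; the last one has no 'price' key.
documents = [
    {'order_id': 0, 'quantity': 2, 'price': 2},
    {'order_id': 1, 'quantity': 3, 'price': 2},
    {'order_id': 10, 'quantity': 5},
]
columns = ['order_id', 'quantity', 'price']

# A single key as the index: the index should carry the name 'order_id'.
single = pd.DataFrame.from_records(documents, columns=columns, index='order_id')
print(single.index.name)

# Several keys as the index: the result is a MultiIndex with one name per key.
multi = pd.DataFrame.from_records(documents, columns=columns,
                                  index=['order_id', 'quantity'])
print(list(multi.index.names))
```

In both cases the keys used for the index are dropped from the columns, so no separate set_index call is needed.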
