Support creating a DataFrame from a list of dictionaries providing an index column #1744

blais · 2012-08-08T10:30:03Z

Hey Wes,

Here's a use case that I think is not covered by Pandas.
It's a use case for creation of a DataFrame object from a
list of dicts.

I extract "documents" (dicts) from a MongoDB database.
One key in each of these dicts is meant to be used as
the index.

While I can do something like

df = DataFrame(documents, columns=['order_id', 'time', 'quantity'])

the index that is added is an arbitrary one.
I'd like to use 'order_id' as my index, in this example.
The only way for me to do that at the moment is to create
the index explicitly, e.g. like this:

df = DataFrame(documents, columns=['order_id', 'time', 'quantity'],
               index=[o['order_id'] for o in documents])

This is a bit of a PIA. It would be nice if one could just
specify the index key/column name, e.g.

df = DataFrame(documents, columns=['order_id', 'time', 'quantity'], indexcol='order_id')

That's my use case almost everywhere.
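
For concreteness, the explicit-index workaround described above can be reproduced with a few made-up documents. The field values here are purely illustrative and do not come from the original report. The sketch suggests one more wrinkle: the key ends up both in the index and as a regular column, and the index itself stays unnamed.

```python
import pandas as pd

# Made-up documents standing in for the MongoDB query results.
documents = [
    {'order_id': 101, 'time': '2012-08-08T10:00:00', 'quantity': 5},
    {'order_id': 102, 'time': '2012-08-08T10:05:00', 'quantity': 3},
    {'order_id': 103, 'time': '2012-08-08T10:10:00', 'quantity': 8},
]

# Workaround: build the index by hand from the same key.
df = pd.DataFrame(documents, columns=['order_id', 'time', 'quantity'],
                  index=[o['order_id'] for o in documents])

print(df.index.tolist())    # index values taken from 'order_id'
print(df.columns.tolist())  # 'order_id' is still present as a column
print(df.index.name)        # the hand-built index carries no name
```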

lodagro · 2012-08-08T11:54:08Z

You can use a different constructor, DataFrame.from_records:

In [36]: def create_dict(order_id):
   ....:      return {'order_id': order_id, 'quantity': np.random.randint(1, 10), 'price': np.random.randint(1, 10)}

In [37]: documents = [create_dict(i) for i in range(10)]

In [38]: documents.append({'order_id': 10, 'quantity': 5})   # demo missing data

In [39]: df = pandas.DataFrame.from_records(documents).set_index('order_id')

In [40]: df
Out[40]:
          price  quantity
order_id
0             2         2
1             2         3
2             1         8
3             2         1
4             3         7
5             9         5
6             7         4
7             7         5
8             7         8
9             6         3
10          NaN         5

In [41]: df = pandas.DataFrame.from_records(documents, columns=['order_id', 'quantity', 'price'], index='order_id')

In [42]: df
Out[42]:
    quantity  price
0          2      2
1          3      2
2          8      1
3          1      2
4          7      3
5          5      9
6          4      7
7          5      7
8          8      7
9          3      6
10         5    NaN

The last call does not seem to set the index name.
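
A self-contained version of the two routes shown in the session, using small fixed documents instead of random values so the result is reproducible. This is only a sketch run against a recent pandas release; column ordering for the first route and the index name for the second may differ from the 2012 output above, so printing them is a quick way to see what your version does.

```python
import pandas as pd

# Fixed stand-ins for the randomly generated documents in the session.
documents = [
    {'order_id': 0, 'quantity': 2, 'price': 2},
    {'order_id': 1, 'quantity': 3, 'price': 2},
    {'order_id': 10, 'quantity': 5},  # no 'price' key: demo missing data
]

# Route 1: build the frame, then promote the column to the index.
via_set_index = pd.DataFrame.from_records(documents).set_index('order_id')

# Route 2: let from_records pick the index column directly.
via_index_arg = pd.DataFrame.from_records(
    documents, columns=['order_id', 'quantity', 'price'], index='order_id')

print(via_set_index.index.name)
print(via_index_arg.index.name)
print(via_index_arg)
```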

wesm · 2012-09-18T20:34:13Z

I fixed DataFrame.from_records so that it sets the index name or names. @blais, you should use DataFrame.from_records for this purpose.
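
A short check of that behavior, as a sketch against a recent pandas release with made-up documents: passing a single key should name the index after it, and passing a list of keys should produce a MultiIndex carrying both names.

```python
import pandas as pd

# Made-up documents; the values are illustrative only.
documents = [
    {'order_id': 1, 'time': '10:00', 'quantity': 5},
    {'order_id': 2, 'time': '10:05', 'quantity': 3},
]

# Single index column: the index takes the key as its name.
single = pd.DataFrame.from_records(documents, index='order_id')
print(single.index.name)

# Several index columns: a MultiIndex with one name per key.
multi = pd.DataFrame.from_records(documents, index=['order_id', 'time'])
print(list(multi.index.names))
print(list(multi.columns))  # index keys are no longer regular columns
```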

wesm closed this as completed in dfe7a55 Sep 18, 2012
