Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I would like to build a dataframe row by row. This is a common use case: data is often ingested or calculated row-wise, which does not fit well with pandas' columnar memory model. The current solution (building from a list of lists) does not let you specify a dtype for each column, only a single dtype for the whole dataframe.
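For illustration, the limitation is visible with the existing constructor: its `dtype` argument accepts one dtype and applies it to every column, so an integer column is forced to the same dtype as a float column. The column names and values below are made up for the example.

```python
import pandas as pd

rows = [[1, 2.5], [2, 3.5]]

# The `dtype` argument takes a single dtype, which is applied to every column.
df = pd.DataFrame(rows, columns=["count", "ratio"], dtype="float32")
print(df.dtypes)  # both "count" and "ratio" are float32
```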
Feature Description
Add a new class, DataFrameBuilder (working title), whose constructor takes a list of column names and, optionally, a list of dtypes for those columns. Rows can then be added one at a time in a memory-efficient way (i.e. without realising intermediate dataframes). Once all rows are added, a .build() method would realise the final dataframe with the correct column dtypes.
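A minimal sketch of the proposed class might look like this. The method names `add_row` and `build`, and the choice to hold each column in a plain Python list until `build()` is called, are assumptions for illustration; this is neither the linked gist's implementation nor an existing pandas API.

```python
from typing import Any, List, Optional, Sequence

import pandas as pd


class DataFrameBuilder:
    """Hypothetical sketch: collect rows one at a time, then build a DataFrame
    with an explicit dtype per column. Assumes column names are unique."""

    def __init__(
        self, columns: Sequence[str], dtypes: Optional[Sequence[Any]] = None
    ) -> None:
        self.columns = list(columns)
        if dtypes is None:
            # No dtypes given: let pandas infer each column's dtype at build time.
            dtypes = [None] * len(self.columns)
        if len(dtypes) != len(self.columns):
            raise ValueError("dtypes must have exactly one entry per column")
        self.dtypes = list(dtypes)
        # One Python list per column; no intermediate DataFrames are created.
        self._values: List[List[Any]] = [[] for _ in self.columns]

    def add_row(self, row: Sequence[Any]) -> None:
        """Append one row; values must be in the same order as `columns`."""
        if len(row) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} values, got {len(row)}")
        for column_values, value in zip(self._values, row):
            column_values.append(value)

    def build(self) -> pd.DataFrame:
        """Convert each accumulated column once, using its requested dtype."""
        data = {
            name: pd.Series(values, dtype=dtype)
            for name, values, dtype in zip(self.columns, self._values, self.dtypes)
        }
        return pd.DataFrame(data)


# Example usage with illustrative columns and dtypes.
builder = DataFrameBuilder(["id", "name", "score"], dtypes=["int32", "string", "float64"])
builder.add_row([1, "a", 0.5])
builder.add_row([2, "b", 1.5])
df = builder.build()
print(df.dtypes)
```

Keeping rows as per-column lists means each column is converted to its final dtype exactly once. A more memory-frugal variant could append numeric columns into typed buffers (for example the standard-library `array` module) instead of lists of boxed Python objects; the list-based version keeps the sketch short.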
Alternative Solutions
The DataFrame constructor could accept per-column dtypes, but this would further complicate an already very complex constructor. Alternatively, users could construct a dataframe of dtype object from a list of lists and then cast each column to its intended dtype (this approach could also serve as the implementation behind DataFrameBuilder).
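The object-then-cast alternative can be expressed with existing pandas APIs: `pd.DataFrame(..., dtype=object)` followed by `DataFrame.astype` with a mapping from column name to dtype. The rows, column names, and dtypes here are made up for illustration.

```python
import pandas as pd

rows = [[1, "a", 0.5], [2, "b", 1.5]]
columns = ["id", "name", "score"]
# Target dtype for each column (illustrative values).
dtypes = {"id": "int32", "name": "string", "score": "float64"}

# Build an all-object dataframe so pandas does no per-column inference,
# then cast every column to its intended dtype in a single astype() call.
df = pd.DataFrame(rows, columns=columns, dtype=object).astype(dtypes)
print(df.dtypes)
```

One trade-off of this approach is that the intermediate object-dtype dataframe stores every value as a boxed Python object, so peak memory is higher than building each column directly in its final dtype.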
Additional Context
I wrote an implementation, which is small and maintainable, and it has become indispensable in practically every project where I use pandas: https://gist.github.com/clbarnes/9c53701e8b436603d236ac0ab26ff8a9