ENH: Row-wise dataframe builder #50582

Closed
@clbarnes

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I would like to build a dataframe row by row. This is a common use case: data is often ingested or calculated row-wise, which does not fit well with a columnar memory model. The current solution (building from a list of lists) does not let you specify a dtype for each column, only a single dtype for the whole dataframe.
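
To illustrate the limitation, here is a small sketch. The column names and values are made up for the example. It shows that the `dtype` argument of the `DataFrame` constructor takes a single dtype, which is applied to every column.

```python
import pandas as pd

# Made-up rows for illustration: an integer id and a float score.
rows = [[1, 0.5], [2, 1.5]]

# dtype takes a single dtype, which is applied to every column.
single = pd.DataFrame(rows, columns=["id", "score"], dtype="float32")

# Without dtype, pandas infers a dtype for each column from the values.
inferred = pd.DataFrame(rows, columns=["id", "score"])
```

In the first frame both `id` and `score` end up as `float32`; in the second, each dtype is whatever pandas infers, not something the caller chose.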

Feature Description

Add a new class, DataFrameBuilder (working title), whose constructor takes a list of columns and optionally a list of dtypes for those columns. Rows can then be added one by one in a memory-efficient way (i.e. not realising intermediate dataframes). Once all rows are added, a .build() method would realise the final dataframe with the correct column dtypes.
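
A minimal sketch of what such a class might look like, assuming rows are accumulated in one plain Python list per column. The `append` method name and the error handling are assumptions made for illustration; this is not part of pandas and is not necessarily how the linked gist works.

```python
from typing import Any, Optional, Sequence

import pandas as pd


class DataFrameBuilder:
    """Hypothetical row-wise builder (working title); not a pandas API."""

    def __init__(
        self,
        columns: Sequence[str],
        dtypes: Optional[Sequence[Any]] = None,
    ) -> None:
        if dtypes is not None and len(dtypes) != len(columns):
            raise ValueError("dtypes must have one entry per column")
        self.columns = list(columns)
        self.dtypes = None if dtypes is None else list(dtypes)
        # One plain list per column; no intermediate dataframe is created.
        self._data = [[] for _ in self.columns]

    def append(self, row: Sequence[Any]) -> None:
        """Add one row; the method name is an assumption."""
        if len(row) != len(self.columns):
            raise ValueError("row length must match the number of columns")
        for values, item in zip(self._data, row):
            values.append(item)

    def build(self) -> pd.DataFrame:
        """Realise the final dataframe, applying the per-column dtypes."""
        dtypes = self.dtypes or [None] * len(self.columns)
        series = {
            name: pd.Series(values, dtype=dtype)
            for name, values, dtype in zip(self.columns, self._data, dtypes)
        }
        return pd.DataFrame(series, columns=self.columns)


# Usage with made-up column names and dtypes.
builder = DataFrameBuilder(
    ["id", "label", "score"], ["uint8", "category", "float32"]
)
builder.append([1, "a", 0.5])
builder.append([2, "b", 1.5])
built = builder.build()
```

Storing values column-wise while accepting them row-wise means each column can be converted to its own dtype in a single step at `build()` time. This sketch assumes column names are unique.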

Alternative Solutions

The DataFrame constructor could take multiple dtypes, but this would further complicate an already extremely complicated class. Users could construct a dataframe of dtype object from a list of lists, and then manually cast each column's dtype afterwards (this could be an implementation of the DataFrameBuilder).
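
The second alternative could be sketched as follows, using made-up rows and dtypes. It builds an object-dtype dataframe from a list of lists and then casts each column with `DataFrame.astype`, which accepts a mapping from column name to dtype.

```python
import pandas as pd

# Made-up rows, column names and target dtypes for illustration.
rows = [[1, "a", 0.5], [2, "b", 1.5]]
columns = ["id", "label", "score"]
dtypes = {"id": "uint8", "label": "category", "score": "float32"}

# Build with dtype=object so no per-column inference happens,
# then cast each column to its intended dtype.
as_object = pd.DataFrame(rows, columns=columns, dtype=object)
cast = as_object.astype(dtypes)
```

This works with the existing API, at the cost of holding an intermediate object-dtype dataframe before the cast.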

Additional Context

I wrote this implementation, which is small and maintainable, and it's become indispensable in practically every project where I use pandas: https://gist.github.com/clbarnes/9c53701e8b436603d236ac0ab26ff8a9

Labels

Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
