QST: Appending to pandas DataFrames #36281

Closed
wumpus opened this issue Sep 11, 2020 · 1 comment

Comments

@wumpus

wumpus commented Sep 11, 2020

Like a lot of people, I wrote some code that called DataFrame.append() a few million times, and it was slow. So I read the pandas docs ("don't do that, use pd.concat()") and StackOverflow (blah blah blah Ginger), and decided that if I was going to write a function that built a dataframe in chunks and then concatenated them, the least I could do was make it general-purpose and package it so others could use it.
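For readers landing here later: the slowdown comes from copying. Every append or concat builds a brand-new frame containing all rows accumulated so far, so growing a frame one row at a time does roughly quadratic work. A minimal sketch of the contrast follows. Because DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, the slow version uses a per-row pd.concat, which has the same cost profile. The function names and the two-column schema are purely illustrative.

```python
import pandas as pd

def build_slow(n):
    # Anti-pattern: each pd.concat copies every row accumulated so far,
    # so total copying grows roughly quadratically with n.
    df = None
    for i in range(n):
        row = pd.DataFrame({"i": [i], "sq": [i * i]})
        df = row if df is None else pd.concat([df, row], ignore_index=True)
    return df

def build_fast(n):
    # Collect plain Python rows first, then construct the DataFrame once.
    rows = [{"i": i, "sq": i * i} for i in range(n)]
    return pd.DataFrame(rows)
```

Both produce the same frame. Only the second scales to millions of rows.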

Thus I created pandas-appender.

It has some features that I care about: scalability to hundreds of millions of rows, and features that help specify or identify categorical columns. But of course, it's not quite compatible with DataFrame.append(), and I'm new enough to pandas to be unsure whether I've followed pandas conventions as closely as possible.

So my question is: could someone better versed in pandas conventions than I am, ideally someone who would love a package that lets them easily build up a dataframe with .append(), take a look at this code and give me some advice?

Thanks in advance!

@wumpus added the Needs Triage and Usage Question labels Sep 11, 2020
@dsaxton removed the Needs Triage label Sep 11, 2020
@mroeschke
Member

Sorry that this issue hasn't gotten a response.

As you mentioned, pandas encourages the use of pd.concat over append (#35407). Generally, collecting all frames in a list and calling pd.concat once will be the most performant approach, even if it is written inside a wrapper function. Going to close this issue. For advertising the package or asking for further advice, we recommend using Gitter or StackOverflow.
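A wrapper that follows this advice can still offer an append-style interface. Below is a minimal sketch under stated assumptions. The class `ChunkedFrameBuilder` and its `chunk_size` and `categorical` parameters are hypothetical. They are not part of pandas and are not pandas-appender's actual API. The class buffers row dicts, converts each full buffer into a DataFrame chunk, keeps the chunks in a list, and calls pd.concat exactly once at the end. Categorical casting happens after that single concat, because concatenating categorical chunks whose categories differ falls back to object dtype.

```python
import pandas as pd

class ChunkedFrameBuilder:
    """Hypothetical helper (not pandas' or pandas-appender's API): buffers rows,
    turns each full buffer into a DataFrame chunk, and concatenates once."""

    def __init__(self, chunk_size=100_000, categorical=()):
        self.chunk_size = chunk_size
        self.categorical = list(categorical)  # columns to cast to "category" at the end
        self._rows = []    # row dicts not yet converted to a chunk
        self._chunks = []  # DataFrame chunks built so far

    def append(self, row):
        # row: a dict mapping column name -> scalar value
        self._rows.append(row)
        if len(self._rows) >= self.chunk_size:
            self._flush()

    def _flush(self):
        # Turn buffered rows into one DataFrame chunk and clear the buffer.
        if self._rows:
            self._chunks.append(pd.DataFrame(self._rows))
            self._rows = []

    def finalize(self):
        self._flush()
        if not self._chunks:
            return pd.DataFrame()
        # A single pd.concat over the list of chunks, as recommended above.
        df = pd.concat(self._chunks, ignore_index=True)
        for col in self.categorical:
            # Cast after concatenating so every chunk's values share one set of categories.
            df[col] = df[col].astype("category")
        return df
```

Usage: create `b = ChunkedFrameBuilder(chunk_size=3, categorical=["color"])`, call `b.append({...})` once per row, then call `b.finalize()` to get the DataFrame. Casting per chunk (and merging the categories afterwards) would save more memory on very large inputs, at the cost of extra complexity.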


3 participants