setting with enlargement fails for large DataFrames #10692
Comments
this is the same issue as in #10645. The cases for len > 1M have different handling and something is amiss. You know that you are copying the frame on enlargement, right? This is extremely inefficient.
@jreback What is the recommended way to do this? This exact way is mentioned in the docs and doesn't seem to be discouraged there: http://pandas.pydata.org/pandas-docs/stable/indexing.html#setting-with-enlargement
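For reference, the pattern described in the "setting with enlargement" section of the indexing docs looks roughly like this (a minimal sketch, not the OP's exact code): assigning to a `.loc` label that does not yet exist appends a new row.

```python
import pandas as pd

# Setting with enlargement: assigning to a previously non-existent
# .loc label grows the DataFrame by one row.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df.loc[len(df)] = [5, 6]  # label 2 doesn't exist yet, so the frame is enlarged
```
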
what are you trying to do exactly?
I'm not trying to do anything! Or maybe you were talking to the OP? I was actually wondering the same thing, as I would generally use… But FWIW, these questions do come up on Stack Overflow with some regularity, and if people found "setting with enlargement" in the documentation, this is suggested as the way to do it (or one of the ways, anyway). And in this case what the OP did was pretty much identical to the last example in the "setting with enlargement" docs.
@johne13 sorry, was on my phone. So enlargement is the equivalent of
Actually I didn't know that the df is copied with every enlargement anyway. About "what are you trying to do exactly?": I just have a huge DataFrame where I append some information when it's returned from functions etc. I probably have to do some redesigning. I guess the way to go is either to preallocate the rows in the main DataFrame somehow, or to collect the "stuff to be appended" in some smaller list/df first and then append it all at the end.
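The "collect first, append once" redesign mentioned above can be sketched like this (a hypothetical example, assuming the functions return dict-like rows): accumulating rows in a plain Python list and building the DataFrame once at the end avoids the per-row copy that repeated enlargement incurs.

```python
import pandas as pd

def build_frame(rows):
    """Build a single DataFrame from an iterable of dict rows.

    One construction at the end is O(n) total, whereas enlarging a
    large frame row by row copies its data on every append.
    """
    return pd.DataFrame(list(rows))

# Stand-in for rows returned from the worker functions
chunks = [{"A": i, "B": i * 2} for i in range(5)]
result = build_frame(chunks)
```
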
@jreback commented on Jul 28, 2015
Jeff, I guess you didn't mean it's a "copy" of the original object in the sense of creating a brand-new, unrelated object. If you just meant that a lot of data has to be copied under the hood, then I understand completely. Still, I'd guess it's quite different from append in that it manages to add a row in place. (I didn't even know that was possible...)
Honestly, given the performance impact, I'm truly at a loss as to why "Setting with Enlargement" was added to the DataFrame API.
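The in-place distinction being drawn above can be checked directly (a minimal sketch; `pd.concat` stands in for `DataFrame.append`, which was removed in pandas 2.0): enlargement mutates the same Python object, while concatenation returns a brand-new one, even though both may copy data internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [1]})
before = id(df)

df.loc[1] = [2]            # enlargement: same object, one new row
assert id(df) == before    # object identity is preserved

df2 = pd.concat([df, pd.DataFrame({"A": [3]})], ignore_index=True)
assert id(df2) != before   # concat produces a new object
```
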
Setting with enlargement seems to fail for DataFrames longer than `10**6 - 1`. `10**6` seems to be the exact threshold for me: that length and anything bigger fails, anything smaller works. Example:
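A minimal sketch of the kind of code that would exercise the reported threshold (a hypothetical reconstruction, since the original snippet isn't shown here; the failure was reported against the pandas version in the `pd.show_versions()` output below and is not expected on current releases):

```python
import numpy as np
import pandas as pd

n = 10**6  # the reported threshold: n - 1 rows worked, n rows failed
df = pd.DataFrame({"x": np.zeros(n)})

# Setting with enlargement on a frame at the threshold length;
# this was reported to fail for len(df) >= 10**6.
df.loc[n] = 1.0
```
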
`pd.show_versions()` returns: