BUG: pd.concat(..., copy=False) still causes copy on block consolidation. #34825


Open
DamianBarabonkovQC opened this issue Jun 16, 2020 · 1 comment
Labels
Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@DamianBarabonkovQC
Contributor

DamianBarabonkovQC commented Jun 16, 2020

Problem description

When concatenating columns of the same dtype, the columns are consolidated even with the copy=False option, which involves a copy and a costly vstack. Performance is actually worse with copy=False than with the default copy=True, which is misleading.

Some applications do not need consolidated data, so this performance penalty is unnecessary.
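To illustrate why consolidation implies a copy, here is a minimal NumPy-only sketch. It is not the pandas internals; the names `col_a`, `col_b`, and `block` are made up for illustration. It only shows that stacking separately allocated same-dtype arrays into one 2-D array allocates new memory.

```python
import numpy as np

# Two same-dtype "columns", each backed by its own buffer.
col_a = np.arange(5, dtype=np.int64)
col_b = np.arange(5, 10, dtype=np.int64)

# Stacking them into a single 2-D array (roughly what consolidating
# into one block requires) allocates a fresh array and copies the data.
block = np.vstack([col_a, col_b])

shares_a = np.shares_memory(block, col_a)
shares_b = np.shares_memory(block, col_b)
print(block.shape, shares_a, shares_b)  # (2, 5) False False
```

Because the stacked array never shares memory with its inputs, any code path that consolidates cannot honor a no-copy request.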

Sample Program

import time
import pandas as pd

template_series = pd.Series(list(range(10000)))

series_ls = []
for i in range(1000):
    series_ls.append(template_series.copy())

start_time = time.time()
df_no_copy = pd.concat(series_ls, copy=False)
print("No copy elapsed", time.time() - start_time)

start_time = time.time()
df_copy = pd.concat(series_ls, copy=True) # The default setting
print("Copy elapsed", time.time() - start_time)

Execution Time

No copy elapsed 0.07740044593811035
Copy elapsed 0.05434751510620117

Execution time is in seconds.
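Single-shot wall-clock timings like the ones above can be noisy. A hedged sketch of a slightly more robust comparison follows, using `timeit.repeat` and keeping the fastest run. The helper name `best_concat_time` and the smaller input sizes are assumptions made for illustration, not part of the original report.

```python
import timeit
import warnings

import pandas as pd


def best_concat_time(objs, copy, repeat=5):
    """Return the fastest of `repeat` single pd.concat runs, in seconds."""
    with warnings.catch_warnings():
        # In case the installed pandas version warns about the `copy` keyword.
        warnings.simplefilter("ignore")
        timings = timeit.repeat(
            lambda: pd.concat(objs, copy=copy), number=1, repeat=repeat
        )
    return min(timings)


template = pd.Series(range(1000))
series_ls = [template.copy() for _ in range(100)]

t_no_copy = best_concat_time(series_ls, copy=False)
t_copy = best_concat_time(series_ls, copy=True)
print("No copy, best of 5:", t_no_copy)
print("Copy, best of 5:", t_copy)
```

The absolute numbers depend on the pandas version and the machine, so no expected output is shown.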

Problem Trace

The consolidation occurs as a result of:

if not self.copy:
    new_data._consolidate_inplace()

located near "pandas/core/reshape/concat.py:499".
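One way to check whether a given pandas version consolidates in this path is to count the blocks of the result. This is only a sketch: `_mgr` (called `_data` in older releases) and its `nblocks` attribute are private internals that may change, and the frames used here are made up for illustration.

```python
import warnings

import pandas as pd

# Three single-column frames with the same dtype.
frames = [pd.DataFrame({f"c{i}": range(4)}) for i in range(3)]

with warnings.catch_warnings():
    # In case the installed pandas version warns about the `copy` keyword.
    warnings.simplefilter("ignore")
    result = pd.concat(frames, axis=1, copy=False)

# Private attribute, not a stable API: `_mgr` in newer releases, `_data` in older ones.
mgr = result._mgr if hasattr(result, "_mgr") else result._data
n_blocks = mgr.nblocks
print("columns:", result.shape[1], "blocks:", n_blocks)
```

A single block for three same-dtype columns would suggest that consolidation happened. One block per input would suggest it did not.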

DamianBarabonkovQC added the Bug and Needs Triage labels Jun 16, 2020
@TomAugspurger
Contributor

xref #34683 (and cc @jbrockmendel).

Changes like this can cause non-obvious behavior changes in downstream operations, so we need to be careful. But given that this only affects copy=False, I think changing it is relatively safe.

TomAugspurger added the Reshaping label and removed the Needs Triage label Jun 16, 2020
2 participants