BUG: df.__setitem__ can be 10x slower than pd.concat(..., axis=1) #37954
Hi, thanks for your report. We might have a different problem here: the setitem example raises on master for me. cc @jbrockmendel

Edit: as reported, it works on 1.1.4.
I'll take a closer look when I get a chance.

I took some time to figure out what was going on, but made a wrong turn. Right now, it looks like the majority of the time comes from `new = pd.DataFrame(index=df.index)`:

```
%timeit new = pd.DataFrame(index=df.index); new[x_col] = x
3.26 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

new = pd.DataFrame(index=df.index)
%timeit new = pd.DataFrame(index=df.index); new[df.columns] = df
636 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

However, it looks like in both cases the code falls back to looping over the keys and adding the columns individually. Surprisingly (to me), for the array case […] Additionally, it looks like […]
profile of `concat` (collapsed)

profile of `setitem` (collapsed)
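As a rough illustration of the per-column fallback described above (a simplification for exposition, not pandas' actual internal code), the listlike assignment ends up behaving like a loop over individual column inserts:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
new = pd.DataFrame(index=pd.RangeIndex(1_000))
x = rng.standard_normal((1_000, 4))
x_col = ["a", "b", "c", "d"]

# Roughly what the fallback does: insert each column one at a time,
# paying block-manager bookkeeping costs per column.
for i, col in enumerate(x_col):
    new[col] = x[:, i]
```

Each individual insert is cheap, but the per-column overhead adds up, which is consistent with the timings above.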
I found this surprising too a few weeks ago and changed it to avoid a double allocation, but it looks like that change is what caused the bug @phofl is fixing for us. Even with that fix in, I think it will still iterate over columns, so performance will leave something to be desired.
For the […]
Expanded:

```
%%prun -s cumtime
for _ in range(1000):
    setitem(x, x_col, df)
```

```
%%prun -s cumtime
for _ in range(1000):
    concat(x, x_col, df)
```
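The `setitem` and `concat` helpers being profiled are not shown in this capture; presumably they were small wrappers along these lines (definitions inferred from the call sites, so treat the exact signatures as an assumption):

```python
import numpy as np
import pandas as pd

def setitem(x, x_col, df):
    # Build an empty frame on df's index, then assign the whole block of columns.
    new = pd.DataFrame(index=df.index)
    new[x_col] = x
    return new

def concat(x, x_col, df):
    # Same result via pd.concat along the column axis.
    return pd.concat([pd.DataFrame(x, index=df.index, columns=x_col)], axis=1)

# Small illustrative inputs for the wrappers.
df = pd.DataFrame(index=pd.RangeIndex(100))
x = np.random.default_rng(1).standard_normal((100, 3))
x_col = ["a", "b", "c"]
```

Both produce an identical frame; the point of the issue is that the `setitem` path can be an order of magnitude slower.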
@phofl I'm thinking we should try to address this by avoiding the for-loop in `_ensure_listlike_indexer`. What do you think?
@jbrockmendel Three ideas come to mind: […]
You need to reindex to the new key set.
@jreback Wouldn't reindex create a copy instead of modifying in place?
Oh, that's a good question. We might be able to avoid it by passing `only_slice=True` to `_slice_take_blocks_ax0`, but we would need to add that kwarg to `reindex_indexer` and `reindex_axis`.
That does not seem to avoid a copy, but I think performance there is good enough now that this does not matter much. Thanks for explaining.
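On the copy question above: that `reindex` copies rather than modifying in place can be checked directly. A quick sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5.0)})
reindexed = df.reindex(columns=["a", "b"])  # adds an all-NaN column "b"

# Mutating the reindexed frame leaves the original untouched,
# i.e. the reindex produced a copy of the data.
reindexed.iloc[0, 0] = 99.0
```

After the mutation, `df` still holds its original values, confirming the copy.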
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
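The original copy-pastable snippet is not preserved in this capture; a minimal reproduction consistent with the title and the comments (the names `df`, `x`, and `x_col` come from the discussion; the shapes are illustrative assumptions) might look like:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((10_000, 10)),
                  columns=[f"c{i}" for i in range(10)])
x = rng.standard_normal((10_000, 5))
x_col = [f"x{i}" for i in range(5)]

# Idiom 1: build an empty frame, then assign a block of new columns
# (goes through DataFrame.__setitem__).
new = pd.DataFrame(index=df.index)
new[x_col] = x

# Idiom 2: the same result via pd.concat along the column axis.
alt = pd.concat([pd.DataFrame(x, index=df.index, columns=x_col)], axis=1)
```

Timing the two idioms (e.g. with `%timeit`) is what shows the reported ~10x gap.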
Problem description
This seems unintuitive from a performance perspective. I would assume these would be close to equivalent; the setitem implementation might even be expected to perform better due to less allocation. While this is a stripped-down example, the use case is building a DataFrame to return from a function, and making a DataFrame and then adding columns seemed like the natural idiom here.
Output of `pd.show_versions()` (collapsed)