Skip to content

PERF: always slice when indexing on columns #33597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 63 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
63 commits
Select commit Hold shift + click to select a range
1697252
PERF: block-wise arithmetic for frame-with-frame
jbrockmendel Mar 17, 2020
a7764d6
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 17, 2020
30a836d
lint fixup
jbrockmendel Mar 17, 2020
3559698
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 18, 2020
4334353
troubleshoot npdev build
jbrockmendel Mar 18, 2020
cb40b0c
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 24, 2020
713a776
comment
jbrockmendel Mar 25, 2020
95ef3ad
checkpoint passing
jbrockmendel Mar 25, 2020
61e5cd6
checkpoint passing
jbrockmendel Mar 25, 2020
89c3d7b
refactor
jbrockmendel Mar 25, 2020
e348e46
blackify
jbrockmendel Mar 25, 2020
519c757
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 31, 2020
2b1ba18
disable assertions for perf
jbrockmendel Mar 31, 2020
53e93fc
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 1, 2020
91c86a3
asv
jbrockmendel Apr 1, 2020
2034084
whatsnew
jbrockmendel Apr 1, 2020
8aedf35
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 3, 2020
0c12d35
revert warning suppression
jbrockmendel Apr 3, 2020
9727562
Fixupm indentation
jbrockmendel Apr 3, 2020
6661dd3
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 4, 2020
42bbbf3
suppress warning
jbrockmendel Apr 4, 2020
65ab023
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 6, 2020
0d958a3
update asv
jbrockmendel Apr 6, 2020
7f91e74
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 6, 2020
56eef51
_data->_mgr
jbrockmendel Apr 6, 2020
4baea6f
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 7, 2020
41a4e7a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 7, 2020
b23144e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 9, 2020
7f24d57
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 9, 2020
ae744b7
update to use faspath constructor
jbrockmendel Apr 9, 2020
b14a98c
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 10, 2020
f42c403
update import
jbrockmendel Apr 10, 2020
8a2807e
remove unused import
jbrockmendel Apr 10, 2020
fa046f0
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 10, 2020
fd10fb6
rebase compat
jbrockmendel Apr 10, 2020
7ea5d3a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 11, 2020
7150e87
slice instead of take
jbrockmendel Apr 12, 2020
bddfbb0
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 13, 2020
25f83d6
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 13, 2020
2142d29
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 14, 2020
0ca2125
Dummy commit to force CI
jbrockmendel Apr 14, 2020
1ea0cc0
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 15, 2020
2bfc308
update call bound
jbrockmendel Apr 15, 2020
d5ad2a0
update max_len
jbrockmendel Apr 15, 2020
92007e8
revert
jbrockmendel Apr 16, 2020
10f6349
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 16, 2020
3ce6f51
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 16, 2020
df471d8
PERF: always slice when indexing on columns
jbrockmendel Apr 16, 2020
b323ea6
typo fixup
jbrockmendel Apr 16, 2020
e557ace
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 21, 2020
96b7d6a
avoid take in one more case
jbrockmendel Apr 21, 2020
2ab3917
Merge branch 'master' into perf-never-take
jbrockmendel Jul 16, 2021
c63464f
Merge branch 'master' into perf-never-take
jbrockmendel Jul 18, 2021
6e34e3e
Merge branch 'master' into perf-never-take
jbrockmendel Jul 21, 2021
314a0f8
Merge branch 'master' into perf-never-take
jbrockmendel Jul 21, 2021
587ffef
Merge branch 'master' into perf-never-take
jbrockmendel Jul 24, 2021
7fad752
update
jbrockmendel Jul 24, 2021
7e8b0bd
Merge branch 'master' into perf-never-take
jbrockmendel Jul 31, 2021
6d82359
Merge branch 'master' into perf-never-take
jbrockmendel Aug 8, 2021
201e57d
fix last 2 failing tests
jbrockmendel Aug 14, 2021
69f08cb
mypy fixup
jbrockmendel Aug 15, 2021
f65cbf1
Merge branch 'master' into perf-never-take
jbrockmendel Aug 24, 2021
f337dfe
Merge branch 'master' into perf-never-take
jbrockmendel Aug 26, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions pandas/_libs/internals.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ class BlockPlacement:
def delete(self, loc) -> BlockPlacement: ...
def append(self, others: list[BlockPlacement]) -> BlockPlacement: ...
def tile_for_unstack(self, factor: int) -> np.ndarray: ...
def to_slices(self) -> list[BlockPlacement]: ...

class SharedBlock:
_mgr_locs: BlockPlacement
Expand Down
60 changes: 59 additions & 1 deletion pandas/_libs/internals.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,20 @@ cdef class BlockPlacement:
new_placement = np.concatenate(mapped)
return new_placement

def to_slices(self):
"""
Decompose a BlockPlacement into a list of BlockPlacements, each of
which is slice-like.

Returns
-------
List[BlockPlacement]
"""
if self.is_slice_like:
return [self]
slices = indexer_as_slices(self.indexer)
return [BlockPlacement(x) for x in slices]


cdef slice slice_canonize(slice s):
"""
Expand Down Expand Up @@ -303,7 +317,7 @@ cdef slice slice_canonize(slice s):
stop = start

if start < 0 or (stop < 0 and s.stop is not None and step > 0):
raise ValueError("unbounded slice")
raise ValueError("unbounded slice", start, stop, step, s)

if stop < 0:
return slice(start, None, step)
Expand Down Expand Up @@ -386,6 +400,50 @@ cdef slice_getitem(slice slc, ind):
return cnp.PyArray_Arange(s_start, s_stop, s_step, NPY_INTP)[ind]


@cython.boundscheck(False)
@cython.wraparound(False)
cdef list indexer_as_slices(int64_t[:] vals):
"""
Convert an indexer to a list of slices.
"""
# TODO: there may be more efficient ways to decompose an indexer into slices
cdef:
Py_ssize_t i, n, start, stop
int64_t d

if vals is None:
raise TypeError("vals must be ndarray")

n = vals.shape[0]

if n == 0:
return []

if n == 1:
return [slice(vals[0], vals[0] + 1, 1)]

# n >= 2
d = vals[1] - vals[0]

if d == 0:
# i guess we have duplicate values (TODO: how?)
return [slice(vals[0], vals[0] + 1, 1)] + indexer_as_slices(vals[1:])

for i in range(2, n):
if vals[i] - vals[i - 1] != d:
# we need to start a new slice
start = vals[0]
stop = vals[i - 1] + d
return [slice(start, stop, d)] + indexer_as_slices(vals[i:])

start = vals[0]
stop = start + n * d
if stop < 0 and d < 0:
return [slice(start, None, d)]
else:
return [slice(start, stop, d)]


@cython.boundscheck(False)
@cython.wraparound(False)
cdef slice indexer_as_slice(intp_t[:] vals):
Expand Down
3 changes: 3 additions & 0 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -4589,6 +4589,9 @@ def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
columns, method, copy, level, fill_value, limit, tolerance
)

# If we have made a copy, no need to make another one
copy = False

index = axes["index"]
if index is not None:
frame = frame._reindex_index(
Expand Down
35 changes: 22 additions & 13 deletions pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -661,6 +661,8 @@ def reindex_indexer(
new_blocks = self._slice_take_blocks_ax0(
indexer, fill_value=fill_value, only_slice=only_slice
)
if copy:
new_blocks = [x.copy() for x in new_blocks]
else:
new_blocks = [
blk.take_nd(
Expand Down Expand Up @@ -696,6 +698,7 @@ def _slice_take_blocks_ax0(
only_slice : bool, default False
If True, we always return views on existing arrays, never copies.
This is used when called from ops.blockwise.operate_blockwise.
Ignored; TODO: remove argument.

Returns
-------
Expand All @@ -721,7 +724,7 @@ def _slice_take_blocks_ax0(
if allow_fill and fill_value is None:
fill_value = blk.fill_value

if not allow_fill and only_slice:
if not allow_fill:
# GH#33597 slice instead of take, so we get
# views instead of copies
blocks = [
Expand Down Expand Up @@ -758,8 +761,7 @@ def _slice_take_blocks_ax0(
# When filling blknos, make sure blknos is updated before appending to
# blocks list, that way new blkno is exactly len(blocks).
blocks = []
group = not only_slice
for blkno, mgr_locs in libinternals.get_blkno_placements(blknos, group=group):
for blkno, mgr_locs in libinternals.get_blkno_placements(blknos, group=False):
if blkno == -1:
# If we've got here, fill_value was not lib.no_default

Expand All @@ -785,24 +787,31 @@ def _slice_take_blocks_ax0(
# we may try to only slice
taker = blklocs[mgr_locs.indexer]
max_len = max(len(mgr_locs), taker.max() + 1)
if only_slice:
taker = lib.maybe_indices_to_slice(taker, max_len)
taker = lib.maybe_indices_to_slice(taker, max_len)

if isinstance(taker, slice):
bp = libinternals.BlockPlacement(taker)
bps = bp.to_slices()
assert len(bps) == 1 # just checking
nb = blk.getitem_block_columns(taker, new_mgr_locs=mgr_locs)
blocks.append(nb)
elif only_slice:
else:
# GH#33597 slice instead of take, so we get
# views instead of copies
for i, ml in zip(taker, mgr_locs):
slc = slice(i, i + 1)
bp = BlockPlacement(ml)
nb = blk.getitem_block_columns(slc, new_mgr_locs=bp)
bp = libinternals.BlockPlacement(taker)
bps = bp.to_slices()
left, right = 0, 0

for sub in bps:
right += len(sub)
new_locs = mgr_locs[left:right]
# we have isinstance(sub.indexer, slice)
nb = blk.getitem_block_columns(
sub.indexer, new_mgr_locs=new_locs
)
# We have np.shares_memory(nb.values, blk.values)
blocks.append(nb)
else:
nb = blk.take_nd(taker, axis=0, new_mgr_locs=mgr_locs)
blocks.append(nb)
left += len(sub)

return blocks

Expand Down
5 changes: 2 additions & 3 deletions pandas/tests/frame/indexing/test_xs.py
Original file line number Diff line number Diff line change
Expand Up @@ -374,12 +374,11 @@ def test_xs_droplevel_false_view(self, using_array_manager):
expected = DataFrame({"a": [1]})
tm.assert_frame_equal(result, expected)

# with mixed dataframe, modifying the parent doesn't modify result
# TODO the "split" path behaves differently here as with single block
# with mixed dataframe, we still get a view on the parent
df = DataFrame([[1, 2.5, "a"]], columns=Index(["a", "b", "c"]))
result = df.xs("a", axis=1, drop_level=False)
df.iloc[0, 0] = 2
expected = DataFrame({"a": [1]})
expected = DataFrame({"a": [2]})
tm.assert_frame_equal(result, expected)

def test_xs_list_indexer_droplevel_false(self):
Expand Down
3 changes: 2 additions & 1 deletion pandas/tests/internals/test_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -717,7 +717,8 @@ def test_reindex_items(self):
mgr = create_mgr("a: f8; b: i8; c: f8; d: i8; e: f8; f: bool; g: f8-2")

reindexed = mgr.reindex_axis(["g", "c", "a", "d"], axis=0)
assert reindexed.nblocks == 2
assert reindexed.nblocks == 3
assert all(b.mgr_locs.is_slice_like for b in reindexed.blocks)
tm.assert_index_equal(reindexed.items, Index(["g", "c", "a", "d"]))
tm.assert_almost_equal(
mgr.iget(6).internal_values(), reindexed.iget(0).internal_values()
Expand Down