Skip to content

PERF: Improve efficiency of BlockValuesRefs #59598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 28, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions pandas/_libs/internals.pyx
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
from collections import defaultdict
import weakref

cimport cython
from cpython.pyport cimport PY_SSIZE_T_MAX
from cpython.slice cimport PySlice_GetIndicesEx
from cpython.weakref cimport PyWeakref_NewRef
from cython cimport Py_ssize_t

import numpy as np
Expand Down Expand Up @@ -746,7 +746,7 @@ cdef class BlockManager:
# -------------------------------------------------------------------
# Block Placement

def _rebuild_blknos_and_blklocs(self) -> None:
cpdef _rebuild_blknos_and_blklocs(self):
"""
Update mgr._blknos / mgr._blklocs.
"""
Expand Down Expand Up @@ -890,12 +890,12 @@ cdef class BlockValuesRefs:

def __cinit__(self, blk: Block | None = None) -> None:
if blk is not None:
self.referenced_blocks = [weakref.ref(blk)]
self.referenced_blocks = [PyWeakref_NewRef(blk, None)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually change anything? I am under the impression that Cython would be smart enough to handle this the same in both cases.

Instead of looking at the benchmarks (which can be difficult to run and flaky), you can also inspect the Cython annotations before/after the change. Are they actually different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Glad you brought that up. I should've been more explicit in the description, as the motivation for this PR is that the code-gen changes quite dramatically. An example from one of the methods below. Note the code generated for the method does not fit on one screen before the change

Before

image

After

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for posting. That's...quite the difference

else:
self.referenced_blocks = []
self.clear_counter = 500 # set reasonably high

def _clear_dead_references(self, force=False) -> None:
cdef _clear_dead_references(self, bint force=False):
# Use exponential backoff to decide when we want to clear references
# if force=False. Clearing for every insertion causes slowdowns if
# all these objects stay alive, e.g. df.items() for wide DataFrames
Expand All @@ -910,7 +910,7 @@ cdef class BlockValuesRefs:
elif nr_of_refs > self.clear_counter:
self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

def add_reference(self, blk: Block) -> None:
cpdef add_reference(self, Block blk):
"""Adds a new reference to our reference collection.

Parameters
Expand All @@ -919,7 +919,7 @@ cdef class BlockValuesRefs:
The block that the new references should point to.
"""
self._clear_dead_references()
self.referenced_blocks.append(weakref.ref(blk))
self.referenced_blocks.append(PyWeakref_NewRef(blk, None))

def add_index_reference(self, index: object) -> None:
"""Adds a new reference to our reference collection when creating an index.
Expand All @@ -930,7 +930,7 @@ cdef class BlockValuesRefs:
The index that the new reference should point to.
"""
self._clear_dead_references()
self.referenced_blocks.append(weakref.ref(index))
self.referenced_blocks.append(PyWeakref_NewRef(index, None))

def has_reference(self) -> bool:
"""Checks if block has foreign references.
Expand Down
Loading