
PERF: MultiIndex.memory_usage shouldn't trigger the index engine #58385

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -332,6 +332,7 @@ Performance improvements
 - Performance improvement in :meth:`Index.take` when ``indices`` is a full range indexer from zero to length of index (:issue:`56806`)
 - Performance improvement in :meth:`Index.to_frame` returning a :class:`RangeIndex` columns of a :class:`Index` when possible. (:issue:`58018`)
 - Performance improvement in :meth:`MultiIndex.equals` for equal length indexes (:issue:`56990`)
+- Performance improvement in :meth:`MultiIndex.memory_usage` to ignore the index engine when it isn't already cached. (:issue:`58385`)
 - Performance improvement in :meth:`RangeIndex.__getitem__` with a boolean mask or integers returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57588`)
 - Performance improvement in :meth:`RangeIndex.append` when appending the same index (:issue:`57252`)
 - Performance improvement in :meth:`RangeIndex.argmin` and :meth:`RangeIndex.argmax` (:issue:`57823`)
5 changes: 3 additions & 2 deletions pandas/core/indexes/base.py
@@ -4863,8 +4863,9 @@ def _from_join_target(self, result: np.ndarray) -> ArrayLike:
     def memory_usage(self, deep: bool = False) -> int:
         result = self._memory_usage(deep=deep)

-        # include our engine hashtable
-        result += self._engine.sizeof(deep=deep)
+        # include our engine hashtable, only if it's already cached
+        if "_engine" in self._cache:
+            result += self._engine.sizeof(deep=deep)
         return result

     @final
5 changes: 3 additions & 2 deletions pandas/core/indexes/multi.py
@@ -1391,8 +1391,9 @@ def _nbytes(self, deep: bool = False) -> int:
         names_nbytes = sum(getsizeof(i, objsize) for i in self.names)
         result = level_nbytes + label_nbytes + names_nbytes

-        # include our engine hashtable
-        result += self._engine.sizeof(deep=deep)
+        # include our engine hashtable, only if it's already cached
+        if "_engine" in self._cache:
+            result += self._engine.sizeof(deep=deep)
         return result

     # --------------------------------------------------------------------
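
The two hunks in `base.py` and `multi.py` apply the same guard: the engine's size is added only when `"_engine"` is already present in `self._cache`, so asking for memory usage no longer forces the engine to be built. The sketch below illustrates that pattern in plain Python. `ToyIndex`, `ToyEngine`, and `engine_builds` are hypothetical names invented for this illustration and are not part of pandas; the real engine and cache machinery are more involved.

```python
import sys


class ToyEngine:
    """Hypothetical stand-in for an index engine (not a pandas class)."""

    def __init__(self, values):
        # Building the lookup table is the expensive step the guard avoids.
        self.mapping = {value: position for position, value in enumerate(values)}

    def sizeof(self):
        # Shallow size of the lookup table only.
        return sys.getsizeof(self.mapping)


class ToyIndex:
    """Hypothetical index whose engine is built lazily and kept in ``_cache``."""

    def __init__(self, values):
        self._values = list(values)
        self._cache = {}
        self.engine_builds = 0  # counts how often the engine was constructed

    @property
    def _engine(self):
        # Build on first access, then reuse the cached instance.
        if "_engine" not in self._cache:
            self.engine_builds += 1
            self._cache["_engine"] = ToyEngine(self._values)
        return self._cache["_engine"]

    def memory_usage(self):
        result = sys.getsizeof(self._values)
        # Include the engine only if something else has already built it.
        if "_engine" in self._cache:
            result += self._engine.sizeof()
        return result


index = ToyIndex(range(1000))
before = index.memory_usage()
assert index.engine_builds == 0  # reporting memory did not build the engine

_ = index._engine  # explicitly build and cache the engine
after = index.memory_usage()
assert index.engine_builds == 1  # built exactly once, by the explicit access
assert after > before  # the toy engine's table is now counted
```

Unlike this toy, the pandas test added in this PR expects the reported value to be the same with and without a cached engine, because `engine.sizeof()` there does not account for the contents of `engine.values`.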
24 changes: 24 additions & 0 deletions pandas/tests/indexes/test_old_base.py
@@ -326,6 +326,30 @@ def test_memory_usage(self, index):
         if index.inferred_type == "object":
             assert result3 > result2

+    def test_memory_usage_doesnt_trigger_engine(self, index):
+        index._cache.clear()
+        assert "_engine" not in index._cache
+
+        res_without_engine = index.memory_usage()
+        assert "_engine" not in index._cache
+
+        # explicitly load and cache the engine
+        _ = index._engine
+        assert "_engine" in index._cache
+
+        res_with_engine = index.memory_usage()
+
+        # the empty engine doesn't affect the result even when initialized with values,
+        # because engine.sizeof() doesn't consider the content of engine.values
+        assert res_with_engine == res_without_engine
+
+        if len(index) == 0:
+            assert res_without_engine == 0
+            assert res_with_engine == 0
+        else:
+            assert res_without_engine > 0
+            assert res_with_engine > 0
+
     def test_argsort(self, index):
         if isinstance(index, CategoricalIndex):
             pytest.skip(f"{type(self).__name__} separately tested")