PERF: Use fastpath for accessing option value in internals #49771

Merged
7 changes: 5 additions & 2 deletions pandas/core/internals/managers.py
@@ -15,7 +15,7 @@

import numpy as np

-from pandas._config import get_option
+from pandas._config.config import _global_config
Member

Is this much faster than using pandas._config.config.options?

Member Author

It is faster in any case, since _global_config is just a plain Python dict (which is quite fast), while the options object is a custom class that mimics attribute access:

In [32]: %timeit _global_config["mode"]["copy_on_write"]
52.6 ns ± 1.85 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [36]: %timeit pd._config.config.options.mode.copy_on_write
2.78 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Strangely, attribute access through the options object seems to be even slower than pd.get_option("mode.copy_on_write"):

In [37]: %timeit pd.get_option("mode.copy_on_write")
1.38 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Member Author

@jorisvandenbossche Nov 18, 2022

If we really want to avoid a private import, this is also an option:

In [39]: %timeit pd.options.d["mode"]["copy_on_write"]
85.9 ns ± 2.39 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The options DictWrapper class seems to have a "public" d attribute that exposes the underlying dict (_global_config).

Member

Not worth tying ourselves in knots over; thanks for checking.


from pandas._libs import (
    algos as libalgos,
@@ -2377,5 +2377,8 @@ def _preprocess_slice_or_indexer(
return "fancy", indexer, len(indexer)


+_mode_options = _global_config["mode"]
+
+
 def _using_copy_on_write():
-    return get_option("mode.copy_on_write")
+    return _mode_options["copy_on_write"]
30 changes: 30 additions & 0 deletions pandas/tests/copy_view/test_internals.py
@@ -3,6 +3,7 @@

import pandas.util._test_decorators as td

+import pandas as pd
from pandas import DataFrame
from pandas.tests.copy_view.util import get_array

@@ -62,3 +63,32 @@ def test_clear_parent(using_copy_on_write):
     # when losing the last reference, also the parent should be reset
     subset["b"] = 0
     assert subset._mgr.parent is None
+
+
+@td.skip_array_manager_invalid_test
+def test_switch_options():
+    # ensure we can switch the value of the option within one session
+    # (assuming data is constructed after switching)
+
+    # using the option_context to ensure we set back to global option value
+    # after running the test
+    with pd.option_context("mode.copy_on_write", False):
+        df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+        subset = df[:]
+        subset.iloc[0, 0] = 0
+        # df updated with CoW disabled
+        assert df.iloc[0, 0] == 0
+
+        pd.options.mode.copy_on_write = True
+        df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+        subset = df[:]
+        subset.iloc[0, 0] = 0
+        # df not updated with CoW enabled
+        assert df.iloc[0, 0] == 1
+
+        pd.options.mode.copy_on_write = False
+        df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+        subset = df[:]
+        subset.iloc[0, 0] = 0
+        # df updated with CoW disabled
+        assert df.iloc[0, 0] == 0