-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/PERF: Use EA in Index.get_value #24204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Avoid converting the ExtensionArray to an ndarray. Before ``` pandas/tests/arrays/categorical/test_indexing.py F [100%] ========================================================================= FAILURES ========================================================================= ______________________________________________________________________ test_series_at ______________________________________________________________________ def test_series_at(): arr = NonCoercaibleCategorical(['a', 'b', 'c']) ser = Series(arr) > result = ser.at[0] pandas/tests/arrays/categorical/test_indexing.py:158: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pandas/core/indexing.py:2266: in __getitem__ return self.obj._get_value(*key, takeable=self._takeable) pandas/core/series.py:1078: in _get_value return self.index.get_value(self._values, label) pandas/core/indexes/base.py:4303: in get_value s = com.values_from_object(series) pandas/_libs/lib.pyx:82: in pandas._libs.lib.values_from_object obj = func() pandas/core/arrays/categorical.py:1509: in get_values return np.array(self) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <[ValueError("I cannot be converted.") raised in repr()] NonCoercaibleCategorical object at 0x113d8b6a0>, dtype = None def __array__(self, dtype=None): > raise ValueError("I cannot be converted.") E ValueError: I cannot be converted. ``` Perf: ``` In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000))) In [4]: %timeit a.at[0] 143 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000))) In [4]: %timeit a.at[0] 11.1 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ```
Hello @TomAugspurger! Thanks for submitting the PR.
|
@@ -145,3 +145,15 @@ def test_mask_with_boolean_raises(index): | |||
|
|||
with pytest.raises(ValueError, match='NA / NaN'): | |||
s[idx] | |||
|
|||
|
|||
class NonCoercaibleCategorical(Categorical): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine this pattern would be generally useful to detect places where we accidentally convert an EA to an ndarray. What would be the best way to share this among tests? Would making a fixture that returned this be too weird? Monkeypatching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could do this with a pytest monkeypatch (maybe just create an issue to follow up)
Codecov Report
@@ Coverage Diff @@
## master #24204 +/- ##
=======================================
Coverage 92.21% 92.21%
=======================================
Files 162 162
Lines 51723 51723
=======================================
Hits 47694 47694
Misses 4029 4029
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24204 +/- ##
==========================================
+ Coverage 92.21% 92.21% +<.01%
==========================================
Files 162 162
Lines 51723 51761 +38
==========================================
+ Hits 47694 47731 +37
- Misses 4029 4030 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you'd just document what you are doing (and fix spelling), looks good to me.
@@ -145,3 +145,15 @@ def test_mask_with_boolean_raises(index): | |||
|
|||
with pytest.raises(ValueError, match='NA / NaN'): | |||
s[idx] | |||
|
|||
|
|||
class NonCoercaibleCategorical(Categorical): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could do this with a pytest monkeypatch (maybe just create an issue to follow up)
All green. |
Opened #24207 for followup. |
thanks! |
Avoid converting the ExtensionArray to an ndarray.
Before
Perf:
Split from #24024