Change _can_hold_na to a class attribute and document that it shouldn't be changed #20819

Merged
merged 9 commits into from
Apr 26, 2018
26 changes: 11 additions & 15 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
@@ -38,10 +38,9 @@ class ExtensionArray(object):
* copy
* _concat_same_type

Some additional methods are available to satisfy pandas' internal, private
block API:
An additional method is available to satisfy pandas' internal,
private block API.

* _can_hold_na
* _formatting_values

Some methods require casting the ExtensionArray to an ndarray of Python
@@ -399,7 +398,8 @@ def _values_for_factorize(self):
Returns
-------
values : ndarray
An array suitable for factoraization. This should maintain order

An array suitable for factorization. This should maintain order
and be a supported dtype (Float64, Int64, UInt64, String, Object).
By default, the extension array is cast to object dtype.
na_value : object
@@ -422,7 +422,7 @@ def factorize(self, na_sentinel=-1):
Returns
-------
labels : ndarray
An interger NumPy array that's an indexer into the original
An integer NumPy array that's an indexer into the original
ExtensionArray.
uniques : ExtensionArray
An ExtensionArray containing the unique values of `self`.
@@ -566,16 +566,12 @@ def _concat_same_type(cls, to_concat):
"""
raise AbstractMethodError(cls)

@property
def _can_hold_na(self):
# type: () -> bool
"""Whether your array can hold missing values. True by default.

Notes
-----
Setting this to false will optimize some operations like fillna.
"""
return True
# The _can_hold_na attribute is set to True so that pandas internals
# will use the ExtensionDtype.na_value as the NA value in operations
# such as take(), reindex(), shift(), etc. In addition, those results
# will then be of the ExtensionArray subclass rather than an array
# of objects
_can_hold_na = True

@property
def _ndarray_values(self):
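As an illustration of the `base.py` change above, here is a minimal sketch of how a class attribute differs from a property. The class names `WithProperty` and `WithClassAttribute` are hypothetical stand-ins and are not part of pandas; the sketch only shows plain Python behavior, not how pandas internals consume `_can_hold_na`.

```python
class WithProperty(object):
    """Hypothetical stand-in for the old, property-based definition."""

    @property
    def _can_hold_na(self):
        return True


class WithClassAttribute(object):
    """Hypothetical stand-in for the new, class-attribute definition."""

    _can_hold_na = True


# A class attribute can be read from the class itself, with no instance.
print(WithClassAttribute._can_hold_na)

# A property read from the class yields the property object, not the value.
print(isinstance(WithProperty._can_hold_na, property))

# Both forms return True when read from an instance.
print(WithProperty()._can_hold_na, WithClassAttribute()._can_hold_na)
```

One consequence of the class-attribute form is that the value is available without constructing an array.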
2 changes: 1 addition & 1 deletion pandas/tests/extension/base/interface.py
@@ -21,7 +21,7 @@ def test_ndim(self, data):
assert data.ndim == 1

def test_can_hold_na_valid(self, data):
assert data._can_hold_na in {True, False}
assert data._can_hold_na # Must be True

You can use `assert data._can_hold_na is True`; otherwise this does not strictly check that the value is the actual boolean `True`.
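A minimal sketch of the difference between the two assertions, using hypothetical helper names (`loose_check`, `strict_check`) that do not appear in the pandas test suite:

```python
def loose_check(value):
    # Mirrors `assert value`: passes for any truthy object.
    return bool(value)


def strict_check(value):
    # Mirrors `assert value is True`: passes only for the bool singleton True.
    return value is True


for candidate in (True, 1, "yes", [0]):
    print(repr(candidate), loose_check(candidate), strict_check(candidate))
```

Only the first candidate passes the strict check; the others are merely truthy.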


def test_memory_usage(self, data):
s = pd.Series(data)
5 changes: 1 addition & 4 deletions pandas/tests/extension/base/missing.py
@@ -9,10 +9,7 @@

class BaseMissingTests(BaseExtensionTests):
def test_isna(self, data_missing):
if data_missing._can_hold_na:
expected = np.array([True, False])
else:
expected = np.array([False, False])
expected = np.array([True, False])

result = pd.isna(data_missing)
tm.assert_numpy_array_equal(result, expected)
2 changes: 1 addition & 1 deletion pandas/tests/extension/conftest.py
@@ -57,7 +57,7 @@ def na_cmp():
Should return a function of two arguments that returns
True if both arguments are (scalar) NA for your type.

By default, uses ``operator.or``
By default, uses ``operator.is_``
"""
return operator.is_
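For context on the docstring fix above, here is a small standard-library sketch of why an identity comparison such as `operator.is_` is a plausible default for comparing scalar NA values, and why `operator.or` could not have been correct. It assumes a float NaN as the NA value, which is only one possibility; extension types may define their own.

```python
import operator

nan = float("nan")  # assumed NA value, for illustration only

# Equality cannot detect NaN, because NaN never equals itself.
print(operator.eq(nan, nan))

# Identity works when both arguments are the same NA object.
print(operator.is_(nan, nan))
print(operator.is_(None, None))

# The operator module has no attribute named "or"; the bitwise-or
# function is spelled `or_`.
print(hasattr(operator, "or"), hasattr(operator, "or_"))
```

Identity only works when the type uses a single shared NA object; a type whose NA values are distinct objects would need to override the fixture.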
