REGR: Assigning label with registered EA dtype raises #38427

Merged
merged 5 commits into from
Dec 14, 2020
Changes from 4 commits
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
@@ -856,7 +856,7 @@ Other
- Bug in :meth:`Index.drop` raising ``InvalidIndexError`` when index has duplicates (:issue:`38051`)
- Bug in :meth:`RangeIndex.difference` returning :class:`Int64Index` in some cases where it should return :class:`RangeIndex` (:issue:`38028`)
- Fixed bug in :func:`assert_series_equal` when comparing a datetime-like array with an equivalent non extension dtype array (:issue:`37609`)

- Bug in :func:`.is_bool_dtype` would raise when passed a valid string such as ``"boolean"`` (:issue:`38386`).
Member:
Typo: extra period before is_bool_dtype.

Member Author:
The period before is_bool_dtype lets Sphinx link to the proper page; without it, no link is made. However, the period after the issue number is a typo.



.. ---------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion pandas/core/dtypes/common.py
@@ -1397,7 +1397,7 @@ def is_bool_dtype(arr_or_dtype) -> bool:
         # guess this
         return arr_or_dtype.is_object and arr_or_dtype.inferred_type == "boolean"
     elif is_extension_array_dtype(arr_or_dtype):
-        return getattr(arr_or_dtype, "dtype", arr_or_dtype)._is_boolean
+        return getattr(dtype, "_is_boolean", False)
Member:
Could you add a comment that this is de facto an isinstance(dtype, BooleanDtype) check?

Member Author (@rhshadrach, Dec 13, 2020):
I think this might also return True for certain sparse and categorical dtypes.


     return issubclass(dtype.type, np.bool_)
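
The one-line change above appears to matter when `is_bool_dtype` receives a dtype name rather than a dtype object: a string such as "boolean" has neither a `dtype` nor an `_is_boolean` attribute, so the removed expression would fail with an `AttributeError`. Below is a minimal, pandas-free sketch of that failure mode. `FakeDtype`, `REGISTRY`, `resolve`, `is_bool_old`, and `is_bool_new` are hypothetical stand-ins written for illustration, not pandas APIs.

```python
class FakeDtype:
    """Hypothetical stand-in for a registered extension dtype."""

    def __init__(self, name, is_boolean):
        self.name = name
        self._is_boolean = is_boolean


# Hypothetical registry mapping dtype names to dtype objects.
REGISTRY = {
    "boolean": FakeDtype("boolean", True),
    "Int64": FakeDtype("Int64", False),
}


def resolve(arr_or_dtype):
    """Turn a dtype name, dtype object, or array-like into a dtype object."""
    if isinstance(arr_or_dtype, str):
        return REGISTRY[arr_or_dtype]
    return getattr(arr_or_dtype, "dtype", arr_or_dtype)


def is_bool_old(arr_or_dtype):
    # Shape of the removed line: a plain string passes through getattr
    # unchanged, and str has no _is_boolean attribute.
    return getattr(arr_or_dtype, "dtype", arr_or_dtype)._is_boolean


def is_bool_new(arr_or_dtype):
    # Shape of the added line: inspect the resolved dtype and fall back
    # to False when the attribute is missing.
    dtype = resolve(arr_or_dtype)
    return getattr(dtype, "_is_boolean", False)


try:
    is_bool_old("boolean")
    old_result = "no error"
except AttributeError:
    old_result = "AttributeError"
```

In this sketch the old shape fails on the string "boolean", while the new shape answers for both "boolean" and "Int64", which matches the two assertions added to test_is_bool_dtype in this PR.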

11 changes: 3 additions & 8 deletions pandas/io/parsers.py
@@ -1689,9 +1689,8 @@ def _convert_to_ndarrays(
                     values, set(col_na_values) | col_na_fvalues, try_num_bool=False
                 )
             else:
-                is_str_or_ea_dtype = is_string_dtype(
-                    cast_type
-                ) or is_extension_array_dtype(cast_type)
+                is_ea = is_extension_array_dtype(cast_type)
+                is_str_or_ea_dtype = is_string_dtype(cast_type) or is_ea
Member:
put is_ea first to short-circuit the other check

Member Author:
Done
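
The suggestion relies on Python's `or` evaluating its right operand only when the left one is falsy. The sketch below records which predicates actually run for each operand order. `is_ea_like` and `is_string_like` are hypothetical stand-ins, not the pandas functions.

```python
calls = []


def is_ea_like(value):
    # Hypothetical stand-in for the extension-dtype check.
    calls.append("is_ea_like")
    return value == "Int64"


def is_string_like(value):
    # Hypothetical stand-in for the string-dtype check.
    calls.append("is_string_like")
    return value == "str"


# EA check first: it is truthy, so the string check never runs.
calls.clear()
ea_first = is_ea_like("Int64") or is_string_like("Int64")
calls_ea_first = list(calls)

# String check first: it is falsy here, so both checks run.
calls.clear()
string_first = is_string_like("Int64") or is_ea_like("Int64")
calls_string_first = list(calls)
```

Both orders give the same boolean result; only the number of checks evaluated differs.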

                 # skip inference if specified dtype is object
                 # or casting to an EA
                 try_num_bool = not (cast_type and is_str_or_ea_dtype)
@@ -1707,11 +1706,7 @@ def _convert_to_ndarrays(
                     or is_extension_array_dtype(cast_type)
                 ):
                     try:
-                        if (
Member:
can the if go outside the try instead of the other way around?

Member Author:
Done

-                            is_bool_dtype(cast_type)
-                            and not is_categorical_dtype(cast_type)
-                            and na_count > 0
-                        ):
+                        if not is_ea and na_count > 0 and is_bool_dtype(cast_type):
                             raise ValueError(f"Bool column has NA values in column {c}")
                     except (AttributeError, TypeError):
                         # invalid input to is_bool_dtype
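
In the hunk above, the `raise ValueError` sits inside a `try` whose `except` clause names only `AttributeError` and `TypeError`, so the `ValueError` is not swallowed; only failures of the dtype check itself are. A minimal sketch of that control flow follows, with a hypothetical `looks_bool` predicate in place of `is_bool_dtype` and a hypothetical `check_bool_column` wrapper around the guard.

```python
def looks_bool(cast_type):
    """Hypothetical stand-in for a dtype check that may reject bad input."""
    if cast_type is None:
        raise TypeError("invalid input")
    return cast_type == "bool"


def check_bool_column(cast_type, is_ea, na_count, column):
    try:
        # The plain flags come first, so the dtype check runs only when needed.
        if not is_ea and na_count > 0 and looks_bool(cast_type):
            raise ValueError(f"Bool column has NA values in column {column}")
    except (AttributeError, TypeError):
        # Invalid input to the dtype check is ignored; ValueError is not caught.
        pass
    return "ok"


try:
    check_bool_column("bool", is_ea=False, na_count=1, column="a")
    outcome = "no error"
except ValueError as err:
    outcome = str(err)
```

With `is_ea=True` or `na_count=0` the sketch returns without raising, which mirrors the intent of skipping the error for extension dtypes.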
2 changes: 2 additions & 0 deletions pandas/tests/dtypes/test_common.py
@@ -545,6 +545,7 @@ def test_is_bool_dtype():
     assert not com.is_bool_dtype(pd.Series([1, 2]))
     assert not com.is_bool_dtype(np.array(["a", "b"]))
     assert not com.is_bool_dtype(pd.Index(["a", "b"]))
+    assert not com.is_bool_dtype("Int64")

     assert com.is_bool_dtype(bool)
     assert com.is_bool_dtype(np.bool_)
@@ -553,6 +554,7 @@ def test_is_bool_dtype():

     assert com.is_bool_dtype(pd.BooleanDtype())
     assert com.is_bool_dtype(pd.array([True, False, None], dtype="boolean"))
+    assert com.is_bool_dtype("boolean")


 @pytest.mark.filterwarnings("ignore:'is_extension_type' is deprecated:FutureWarning")
22 changes: 22 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
@@ -1,6 +1,7 @@
 import numpy as np
 import pytest

+from pandas.core.dtypes.base import registry as ea_registry
 from pandas.core.dtypes.dtypes import DatetimeTZDtype, IntervalDtype, PeriodDtype

 from pandas import (
@@ -197,6 +198,27 @@ def test_setitem_extension_types(self, obj, dtype):

         tm.assert_frame_equal(df, expected)

+    @pytest.mark.parametrize(
+        "ea_name",
+        # mypy doesn't allow adding lists of different types
+        # https://github.com/python/mypy/issues/5492
+        [
+            *[
+                dtype.name
+                for dtype in ea_registry.dtypes
+                # property would require instantiation
+                if not isinstance(dtype.name, property)
+            ],
+            *["datetime64[ns, UTC]", "period[D]"],
Member:
Sorry, but I find this ridiculous. And not you, but mypy ;) (since when are lists supposed to be of homogeneous content?)
If mypy can't deal with it, I would just add an ignore comment instead of unpacking into a list.

Member:
+1

Member Author:
Done.

+        ],
+    )
+    def test_setitem_with_ea_name(self, ea_name):
+        # GH 38386
+        result = DataFrame([0])
+        result[ea_name] = [1]
+        expected = DataFrame({0: [0], ea_name: [1]})
+        tm.assert_frame_equal(result, expected)
+
     def test_setitem_dt64_ndarray_with_NaT_and_diff_time_units(self):
         # GH#7492
         data_ns = np.array([1, "nat"], dtype="datetime64[ns]")
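
Two details of the parametrization above can be illustrated without pandas. Accessing a `property` on a class (rather than on an instance) returns the `property` object itself, which is what the `isinstance(dtype.name, property)` filter detects. Unpacking with `*` also builds the same list that `+` would. `FixedName`, `ParamName`, and `registry` below are hypothetical stand-ins for the registered dtype classes.

```python
class FixedName:
    # The name is a plain class attribute, usable without an instance.
    name = "fixed"


class ParamName:
    def __init__(self, unit):
        self.unit = unit

    @property
    def name(self):
        # The name depends on instance state, so the class alone cannot supply it.
        return f"param[{self.unit}]"


registry = [FixedName, ParamName]

# Keep only names that are available at class level.
class_level_names = [
    cls.name for cls in registry if not isinstance(cls.name, property)
]
extra_names = ["param[D]"]

via_unpacking = [*class_level_names, *extra_names]
via_concatenation = class_level_names + extra_names
```

Parametrized names such as "period[D]" therefore have to be listed by hand, as the test does; whether to join the two lists by unpacking or by `+` with a type-ignore comment is a style choice with the same runtime result.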