BUG: True cannot be cast to bool in read_excel #58994

Merged · 17 commits · Jun 25, 2024

1 change: 1 addition & 0 deletions · doc/source/whatsnew/v3.0.0.rst

```diff
@@ -547,6 +547,7 @@ I/O
 - Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
+- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`)
```
Review comment (Member), suggested change:

```diff
-- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`)
+- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
```

```diff
 - Bug in :meth:`read_stata` raising ``KeyError`` when input file is stored in big-endian format and contains strL data. (:issue:`58638`)
 
 Period
```
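The suggested change is cosmetic: `pd.BooleanDtype.name` is the string alias `"boolean"`, so both spellings select the same nullable boolean dtype. A quick check, assuming a recent pandas release:

```python
import pandas as pd

# BooleanDtype.name is a class-level attribute holding the dtype's string alias.
alias = pd.BooleanDtype.name

# Resolving the alias gives the same extension dtype as constructing it directly.
resolved = pd.api.types.pandas_dtype("boolean")
```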
8 changes: 7 additions & 1 deletion · pandas/io/parsers/base_parser.py

```diff
@@ -742,7 +742,9 @@ def _cast_types(self, values: ArrayLike, cast_type: DtypeObj, column) -> ArrayLi
         elif isinstance(cast_type, ExtensionDtype):
             array_type = cast_type.construct_array_type()
             try:
-                if isinstance(cast_type, BooleanDtype):
+                if isinstance(cast_type, BooleanDtype) and all(
+                    isinstance(value, str) for value in values
+                ):
                     # error: Unexpected keyword argument "true_values" for
                     # "_from_sequence_of_strings" of "ExtensionArray"
                     return array_type._from_sequence_of_strings(  # type: ignore[call-arg]
@@ -751,6 +753,10 @@ def _cast_types(self, values: ArrayLike, cast_type: DtypeObj, column) -> ArrayLi
                         true_values=self.true_values,
                         false_values=self.false_values,
                     )
+                elif isinstance(cast_type, BooleanDtype) and all(
+                    isinstance(value, bool) for value in values
+                ):
+                    return values
                 else:
                     return array_type._from_sequence_of_strings(values, dtype=cast_type)
             except NotImplementedError as err:
```
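The issue title suggests that values read from Excel cells reached the cast step as native Python bools, which a parser that only recognizes text tokens cannot match. The sketch below mimics that failure and the branch order the diff adds. The token sets and the helpers `parse_bool_string` and `cast_to_boolean` are hypothetical stand-ins, not pandas internals. Unlike the diff, whose bool branch returns `values` unchanged, the sketch wraps both branches in the nullable `"boolean"` dtype so the result type is explicit.

```python
import pandas as pd

# Hypothetical token sets for a strings-only boolean parser (assumption).
TRUE_STRINGS = {"True", "TRUE", "true", "1", "1.0"}
FALSE_STRINGS = {"False", "FALSE", "false", "0", "0.0"}


def parse_bool_string(token):
    """Map one text token to a bool; anything else is rejected."""
    # A Python bool never compares equal to a str, so True fails both lookups.
    if token in TRUE_STRINGS:
        return True
    if token in FALSE_STRINGS:
        return False
    raise ValueError(f"{token} cannot be cast to bool")


def cast_to_boolean(values):
    """Hypothetical helper following the branch order added in the diff."""
    if all(isinstance(v, str) for v in values):
        # Text input, e.g. a CSV column: parse every token.
        return pd.array([parse_bool_string(v) for v in values], dtype="boolean")
    if all(isinstance(v, bool) for v in values):
        # Native bools, e.g. Excel cells: skip string parsing entirely.
        return pd.array(list(values), dtype="boolean")
    # Mixed input (including None) is out of scope for this sketch.
    raise TypeError("values are neither all str nor all bool")
```

With this sketch, `cast_to_boolean([True, False])` succeeds, while `parse_bool_string(True)` raises `ValueError: True cannot be cast to bool`, the same wording as the issue title.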
11 changes: 11 additions & 0 deletions · pandas/tests/io/excel/test_readers.py

```diff
@@ -164,6 +164,17 @@ def xfail_datetimes_with_pyxlsb(engine, request):
 
 
 class TestReaders:
+    def test_read_excel_type_check(self):
+        # GH 58159
+        df = DataFrame({"bool_column": [True]}, dtype=pd.BooleanDtype.name)
```
Review comment (@asishm, Contributor, Jun 13, 2024):

Since boolean is a nullable extension array (EA), can you also add `None` as a value in this column? The assert can also probably change to `tm.assert_frame_equal(df, df2)`.
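The `None` suggestion probes a gap in the new checks. A nullable boolean column stores `None` as `pd.NA`, and a list mixing `True` and `None` passes neither `all(...)` test in the diff, so it would fall through to the `else` branch. A minimal check, assuming (hypothetically) that the reader hands such a column to the cast step as plain Python objects:

```python
import pandas as pd

# A nullable boolean column: the missing entry is stored as pd.NA.
df = pd.DataFrame({"bool_column": [True, None]}, dtype="boolean")

# Assumed raw values for that column before casting (hypothetical input).
raw_values = [True, None]
all_str = all(isinstance(v, str) for v in raw_values)
all_bool = all(isinstance(v, bool) for v in raw_values)
```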

```diff
+        df.to_excel("test-type.xlsx", index=False)
```
Review comment (Member):

Can you use the `tmp_excel` fixture for this path?

```diff
+        df2 = pd.read_excel(
+            "test-type.xlsx",
+            dtype={"bool_column": pd.BooleanDtype.name},
+            engine="openpyxl",
+        )
+        assert all(isinstance(val, bool) for val in df2["bool_column"])
+
     @pytest.fixture(autouse=True)
     def cd_and_set_engine(self, engine, datapath, monkeypatch):
         """
```
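The second half of the review comment matters here: the element-wise `isinstance` assertion passes for any column of Python bools, including a plain `object` column, so it would not notice if the `"boolean"` dtype were lost. `assert_frame_equal` (public as `pandas.testing.assert_frame_equal`; the test suite reaches it through `tm`) also compares dtypes. A sketch with a hand-built stand-in for the read result, not actual `read_excel` output:

```python
import pandas as pd

expected = pd.DataFrame({"bool_column": [True, False]}, dtype="boolean")
# Hypothetical read result: same values, but left in an object column.
result = pd.DataFrame({"bool_column": [True, False]}, dtype=object)

# The element-wise check used in the test passes for the object column...
elementwise_ok = all(isinstance(val, bool) for val in result["bool_column"])

# ...while assert_frame_equal also compares dtypes and rejects it.
try:
    pd.testing.assert_frame_equal(result, expected)
    frames_match = True
except AssertionError:
    frames_match = False
```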