-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: True cannot be cast to bool in read_excel #58994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: True cannot be cast to bool in read_excel #58994
Conversation
@rhshadrach pinging on green |
@@ -164,6 +164,17 @@ def xfail_datetimes_with_pyxlsb(engine, request): | |||
|
|||
|
|||
class TestReaders: | |||
def test_read_excel_type_check(self): | |||
# GH 58159 | |||
df = DataFrame({"bool_column": [True]}, dtype=pd.BooleanDtype.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since boolean
is a nullable EA, can you also add None
as a value in this column? the assert can also probably change to tm.assert_frame_equal(df, df2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test failure is unrelated - #59018 (comment)
just one comment - may not even be relevant.
values_str, | ||
dtype=cast_type, | ||
true_values=self.true_values, | ||
false_values=self.false_values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none_values
is not passed anything here. Can the equivalent na_values
and keep_default_na
be passed in here, or is it irrelevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see what you're saying. You're suggesting to propagate na_values
and keep_default_na
from read_excel() to _cast_types()?
I think that's a good idea. I'll make that change. That should allow me to get rid of _NONE_VALUES
in boolean.py, and allow me to dynamically handle any values that equate to None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those params are already available in _cast_types since it's an instance method (so self.na_values and self.keep_default_na are available here, similar to self.true_values and self.false_values). I was thinking about progagating them to _from_sequence_of_strings()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep we're on the same page. Making those changes now
@asishm @rhshadrach PR is ready for review |
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -547,6 +547,7 @@ I/O | |||
- Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`) | |||
- Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`) | |||
- Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`) | |||
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`) | |
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`) |
pandas/core/arrays/boolean.py
Outdated
@@ -297,7 +297,13 @@ class BooleanArray(BaseMaskedArray): | |||
""" | |||
|
|||
_TRUE_VALUES = {"True", "TRUE", "true", "1", "1.0"} | |||
_FALSE_VALUES = {"False", "FALSE", "false", "0", "0.0"} | |||
_FALSE_VALUES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you undo the formatting here?
def test_read_excel_type_check(self, col): | ||
# GH 58159 | ||
df = DataFrame({"bool_column": col}, dtype=pd.BooleanDtype.name) | ||
df.to_excel("test-type.xlsx", index=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the tmp_excel
fixture for this path?
|
||
def test_pass_none_type(self): | ||
# GH 58159 | ||
with pd.ExcelFile("test7.xlsx") as excel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the datapath
fixture to access test7.xlsx
. Also could you give a test7.xlsx
a better name reflective of the contents?
@pytest.mark.parametrize("col", [[True, None, False], [True], [True, False]]) | ||
def test_read_excel_type_check(self, col): | ||
# GH 58159 | ||
df = DataFrame({"bool_column": col}, dtype=pd.BooleanDtype.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df = DataFrame({"bool_column": col}, dtype=pd.BooleanDtype.name) | |
df = DataFrame({"bool_column": col}, dtype="boolean") |
And could you change other usages here?
… into dev/bug/cast-to-boolean
@mroeschke PR ready for review |
f_path = os.path.join( | ||
datapath("io", "data", "excel"), "test_boolean_types.xlsx" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f_path = os.path.join( | |
datapath("io", "data", "excel"), "test_boolean_types.xlsx" | |
) | |
f_path = datapath("io", "data", "excel", "test_boolean_types.xlsx") |
|
||
def test_pass_none_type(self, datapath): | ||
# GH 58159 | ||
f_path = os.path.join(datapath("io", "data", "excel"), "test_none_type.xlsx") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f_path = os.path.join(datapath("io", "data", "excel"), "test_none_type.xlsx") | |
f_path = datapath("io", "data", "excel", "test_none_type.xlsx") |
Thanks @rmhowe425 |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.