Skip to content

BUG: is_dtype_equal(dtype, "string[pyarrow]") raises if pyarrow not installed #44327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Dec 15, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.4.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -655,7 +655,7 @@ Conversion

Strings
^^^^^^^
-
- Fixed bug in checking for ``string[pyarrow]`` dtype incorrectly raising an ImportError when pyarrow is not installed (:issue:`44327`)
-

Interval
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -613,7 +613,7 @@ def is_dtype_equal(source, target) -> bool:
src = get_dtype(source)
if isinstance(src, ExtensionDtype):
return src == target
except (TypeError, AttributeError):
except (TypeError, AttributeError, ImportError):
return False
elif isinstance(source, str):
return is_dtype_equal(target, source)
Expand All @@ -622,7 +622,7 @@ def is_dtype_equal(source, target) -> bool:
source = get_dtype(source)
target = get_dtype(target)
return source == target
except (TypeError, AttributeError):
except (TypeError, AttributeError, ImportError):

# invalid comparison
# object == category will hit this
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/dtypes/test_common.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ def test_period_dtype(self, dtype):
"float": np.dtype(np.float64),
"object": np.dtype(object),
"category": com.pandas_dtype("category"),
"string": pd.StringDtype(),
}


Expand All @@ -128,6 +129,12 @@ def test_dtype_equal(name1, dtype1, name2, dtype2):
assert not com.is_dtype_equal(dtype1, dtype2)


@pytest.mark.parametrize("name,dtype", list(dtypes.items()), ids=lambda x: str(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the above dtypes dict, can you add an entry for "string": pd.StringDtype() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a mistake because isinstance(pd.StringDtype(), str) is False. Adding such an entry in taking to the wrong location in the code segment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(pd.StringDtype(), str)

Where is this being checked?

My question is to ensure that is_dtype_equal(pd.StringDtype(), "string[pyarrow]") correctly works, since that was the original bug report.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if isinstance(target, str):
    if not isinstance(source, str):
        # GH#38516 ensure we get the same behavior from
        #  is_dtype_equal(CDT, "category") and CDT == "category"
        # GH-44276 ensures behaviour is consistent
        # is_dtype_equal (dtype, "string[pyarrow]")
        try:
            src = get_dtype(source)
            if isinstance(src, ExtensionDtype):
            if isinstance(src, (ExtensionDtype, np.dtype)):

In this code if not isinstance(source, str): returns True when source is pd.StringDtype()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line returns True for all cases in the current test. Also the other dtype values in the parametrization of this test are actual dtype objects, not strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but on adding this line into the parameters, this if not isinstance(source, str): returns True which is wrong. Shouldn't isinstance(source, str) be True if source is pd.StringDtype()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, isinstance(source, str) checks whether source is an actual python string object, while pd.StringDtype is a dtype object, which is a different class, and thus this is expected to return False (the fact that StringDtype is a dtype object for an array that stores strings doesn't matter in that case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the testcase and fixed the Import Bug as well

def test_pyarrow_string_import_error(name, dtype):
# GH-44276
assert not com.is_dtype_equal(dtype, "string[pyarrow]")


@pytest.mark.parametrize(
"dtype1,dtype2",
[
Expand Down