Skip to content

BUG: is_dtype_equal(dtype, "string[pyarrow]") raises if pyarrow not installed #44327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Dec 15, 2021
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.4.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -509,7 +509,7 @@ Conversion

Strings
^^^^^^^
-
- Fixed bug in :bug:`is_dtype_equal` raising if pyarrow not installed (:issue:`44327`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't render & is not public facing anyhow. remove the note

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand the part where "is not public facing anyhow.". I will anyhow remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I remove this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can keep it, but reword it a bit to not mention is_dtype_equal explicitly. Something like "Fixed a bug in checking if a dtype is the string[pyarrow] dtype incorrectly raising an error when pyarrow is not installed (:issue:...)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

-

Interval
Expand Down
5 changes: 4 additions & 1 deletion pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -599,9 +599,12 @@ def is_dtype_equal(source, target) -> bool:
if not isinstance(source, str):
# GH#38516 ensure we get the same behavior from
# is_dtype_equal(CDT, "category") and CDT == "category"
# GH-44276 ensures behaviour is consistent
# is_dtype_equal (dtype, "string[pyarrow]")
try:
src = get_dtype(source)
if isinstance(src, ExtensionDtype):
if isinstance(src, (ExtensionDtype, np.dtype)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed because dtypes such as int64 float64 and other dtypes actually are np.dtype, so if I am not adding this check, the test is failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed because dtypes such as int64 float64 and other dtypes actually are np.dtype, so if I am not adding this check, the test is failing.

It is not needed if we catch ImportError more generally, I think; I pushed some small commits to verify this.


return src == target
except (TypeError, AttributeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can probably catch the ImportError here (so it always returns False if the above raises an ImportError)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return False
Expand Down
5 changes: 5 additions & 0 deletions pandas/tests/dtypes/test_dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,11 @@ def test_construction_from_string(self, dtype):
with pytest.raises(TypeError, match=msg):
CategoricalDtype.construct_from_string("foo")

def test_pyarrow_string_import_error(self, dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note: this test is added to TestCategoricalDtype, so categorical specific tests. I think we should move it to a more generic location. For example in tests/dtypes/test_common.py there is a test_dtype_equal that tests the basic functionality of is_dtype_equal. Maybe add the new test there (or in the arrow string dtype specific tests, @jbrockmendel any preference?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request you to inform me where do I exactly put the test, in which file and which function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put it in tests/dtypes/test_common.py, just after the existing test_dtype_equal test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# GH-44276
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you flesh this out a tiny bit e.g. 'dont raise if pyarrow is not installed'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

assert not is_dtype_equal(dtype, "string[pyarrow]")
pytest.importorskip("pyarrow")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the decorator on the function itself to skip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


def test_constructor_invalid(self):
msg = "Parameter 'categories' must be list-like"
with pytest.raises(TypeError, match=msg):
Expand Down