-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: is_dtype_equal(dtype, "string[pyarrow]") raises if pyarrow not installed #44327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a whatsnew note
pandas/tests/base/test_misc.py
Outdated
@@ -150,6 +151,8 @@ def test_access_by_position(index): | |||
assert index[-1] == index[size - 1] | |||
|
|||
msg = f"index {size} is out of bounds for axis 0 with size {size}" | |||
|
|||
assert not is_dtype_equal(index.dtype, "string[pyarrow]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to the tests for is_dtype_equal,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/dtypes/common.py
Outdated
@@ -601,7 +601,8 @@ def is_dtype_equal(source, target) -> bool: | |||
# is_dtype_equal(CDT, "category") and CDT == "category" | |||
try: | |||
src = get_dtype(source) | |||
if isinstance(src, ExtensionDtype): | |||
if isinstance(src, ExtensionDtype) or isinstance(src, np.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the example to the doc-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single isinstance call isinstance(src, (ExtensionDtype, np.dtype))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
do we actually have a test which exercises this? e.g. not on a build where pyarrow is installed? |
I do not have pyarrow running on my system and got the initial import error. Actually I am able to run this on a system where pyarrow is not installed. It does not matter as pyarrow is not imported, so installation will not matter. |
Whatsnew note added |
pandas/tests/dtypes/test_dtypes.py
Outdated
@@ -115,6 +115,10 @@ def test_construction_from_string(self, dtype): | |||
with pytest.raises(TypeError, match=msg): | |||
CategoricalDtype.construct_from_string("foo") | |||
|
|||
def test_pyarrow_string_import_error(self, dtype): | |||
# GH-44276 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you flesh this out a tiny bit e.g. 'dont raise if pyarrow is not installed'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/dtypes/test_dtypes.py
Outdated
def test_pyarrow_string_import_error(self, dtype): | ||
# GH-44276 | ||
assert not is_dtype_equal(dtype, "string[pyarrow]") | ||
with pytest.raises(ImportError, match="pyarrow"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm we don't do this, rather use the decorator to skip the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
doc/source/whatsnew/v1.3.5.rst
Outdated
@@ -24,7 +24,7 @@ Fixed regressions | |||
|
|||
Bug fixes | |||
~~~~~~~~~ | |||
- | |||
- Fixed bug in :bug:`is_dtype_equal` raising if pyarrow not installed (:issue:`44327`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this goes in 1.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pls merge master as well |
pandas/tests/dtypes/test_dtypes.py
Outdated
@@ -115,6 +115,12 @@ def test_construction_from_string(self, dtype): | |||
with pytest.raises(TypeError, match=msg): | |||
CategoricalDtype.construct_from_string("foo") | |||
|
|||
def test_pyarrow_string_import_error(self, dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note: this test is added to TestCategoricalDtype
, so categorical specific tests. I think we should move it to a more generic location. For example in tests/dtypes/test_common.py
there is a test_dtype_equal
that tests the basic functionality of is_dtype_equal
. Maybe add the new test there (or in the arrow string dtype specific tests, @jbrockmendel any preference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request you to inform me where do I exactly put the test, in which file and which function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can put it in tests/dtypes/test_common.py
, just after the existing test_dtype_equal
test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/dtypes/test_dtypes.py
Outdated
def test_pyarrow_string_import_error(self, dtype): | ||
# GH-44276 | ||
assert not is_dtype_equal(dtype, "string[pyarrow]") | ||
pytest.importorskip("pyarrow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the decorator on the function itself to skip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/dtypes/test_dtypes.py
Outdated
@@ -115,6 +115,11 @@ def test_construction_from_string(self, dtype): | |||
with pytest.raises(TypeError, match=msg): | |||
CategoricalDtype.construct_from_string("foo") | |||
|
|||
@pytest.importorskip("pyarrow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test should be skipped if pyarrow is not installed? Because I understood the issue that checking the dtype equality currently raises if pyarrow is not installed, while it should return False instead.
So both if pyarrow is installed or not, this test should pass and is_dtype_equal
should return False.
I think you can simply remove the decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/dtypes/test_dtypes.py
Outdated
@@ -115,6 +115,12 @@ def test_construction_from_string(self, dtype): | |||
with pytest.raises(TypeError, match=msg): | |||
CategoricalDtype.construct_from_string("foo") | |||
|
|||
def test_pyarrow_string_import_error(self, dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can put it in tests/dtypes/test_common.py
, just after the existing test_dtype_equal
test.
@@ -128,6 +128,12 @@ def test_dtype_equal(name1, dtype1, name2, dtype2): | |||
assert not com.is_dtype_equal(dtype1, dtype2) | |||
|
|||
|
|||
@pytest.mark.parametrize("name,dtype", list(dtypes.items()), ids=lambda x: str(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the above dtypes
dict, can you add an entry for "string": pd.StringDtype()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a mistake because isinstance(pd.StringDtype(), str) is False. Adding such an entry in taking to the wrong location in the code segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance(pd.StringDtype(), str)
Where is this being checked?
My question is to ensure that is_dtype_equal(pd.StringDtype(), "string[pyarrow]")
correctly works, since that was the original bug report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(target, str):
if not isinstance(source, str):
# GH#38516 ensure we get the same behavior from
# is_dtype_equal(CDT, "category") and CDT == "category"
# GH-44276 ensures behaviour is consistent
# is_dtype_equal (dtype, "string[pyarrow]")
try:
src = get_dtype(source)
if isinstance(src, ExtensionDtype):
if isinstance(src, (ExtensionDtype, np.dtype)):
In this code if not isinstance(source, str):
returns True when source is pd.StringDtype()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line returns True for all cases in the current test. Also the other dtype
values in the parametrization of this test are actual dtype objects, not strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but on adding this line into the parameters, this if not isinstance(source, str):
returns True which is wrong. Shouldn't isinstance(source, str)
be True if source is pd.StringDtype()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, isinstance(source, str)
checks whether source
is an actual python string object, while pd.StringDtype
is a dtype object, which is a different class, and thus this is expected to return False (the fact that StringDtype is a dtype object for an array that stores strings doesn't matter in that case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the testcase and fixed the Import Bug as well
@jorisvandenbossche @jreback any updates for me on this? |
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -509,7 +509,7 @@ Conversion | |||
|
|||
Strings | |||
^^^^^^^ | |||
- | |||
- Fixed bug in :bug:`is_dtype_equal` raising if pyarrow not installed (:issue:`44327`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't render & is not public facing anyhow. remove the note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not understand the part where "is not public facing anyhow.". I will anyhow remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can keep it, but reword it a bit to not mention is_dtype_equal
explicitly. Something like "Fixed a bug in checking if a dtype is the string[pyarrow]
dtype incorrectly raising an error when pyarrow is not installed (:issue:...)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/dtypes/common.py
Outdated
@@ -596,12 +596,17 @@ def is_dtype_equal(source, target) -> bool: | |||
False | |||
""" | |||
if isinstance(target, str): | |||
if not isinstance(source, str): | |||
if get_dtype(source) == "string": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is directly False, there is no need for further checking for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason i ask is this might be a performance issue here (this code is hit a lot so we care about the code path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the code to remove this code segment, now there should not be a problem.
pandas/core/dtypes/common.py
Outdated
return src == target | ||
if isinstance(src, (ExtensionDtype, np.dtype)): | ||
|
||
return type(src) == type(target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't change this to only checking the class of the dtype, we want to check the equality of the instances. See the failing tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted Back to the Original Code, saw the mistake.
pandas/core/dtypes/common.py
Outdated
return src == target | ||
if isinstance(src, (ExtensionDtype, np.dtype)): | ||
|
||
return type(src) == type(target) | ||
except (TypeError, AttributeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can probably catch the ImportError here (so it always returns False if the above raises an ImportError)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/dtypes/common.py
Outdated
try: | ||
src = get_dtype(source) | ||
if isinstance(src, ExtensionDtype): | ||
if isinstance(src, (ExtensionDtype, np.dtype)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed because dtypes such as int64
float64
and other dtypes actually are np.dtype
, so if I am not adding this check, the test is failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed because dtypes such as int64 float64 and other dtypes actually are np.dtype, so if I am not adding this check, the test is failing.
It is not needed if we catch ImportError more generally, I think; I pushed some small commits to verify this.
@jorisvandenbossche @jreback any updates for me on this? |
@shubham11941140 you need to merge master |
Done. |
Thanks @shubham11941140 |
Added instance to check string preventing import error