
BUG: [ArrowStringArray] Recognize ArrowStringArray in infer_dtype #40725


Merged
merged 11 commits into from
Apr 9, 2021
2 changes: 1 addition & 1 deletion pandas/_libs/lib.pyx
@@ -1110,7 +1110,7 @@ _TYPE_MAP = {
     "complex64": "complex",
     "complex128": "complex",
     "c": "complex",
-    "string": "string",
+    str: "string",
Member


Suggested change
-    str: "string",
+    "string": "string",
+    str: "string",

We could keep both, and that should address the small slowdown when inferring an array with dtype="string" (since the lookup first checks the dtype's name, and only then dtype.type).
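A minimal sketch of the lookup order described above, in plain Python. It is not pandas' implementation: `TYPE_MAP` is a trimmed, hypothetical stand-in for `_TYPE_MAP`, `FakeDtype` and `lookup` are illustrative helpers, and `"arrow_string"` is an assumed name for a string dtype whose name is not `"string"`. Keeping the `"string"` key lets the object-backed dtype match on its name in the first step, while the `str` key catches any dtype whose `.type` is `str` in the second step.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, trimmed-down stand-in for _TYPE_MAP (not the real table).
TYPE_MAP = {
    "complex128": "complex",
    "string": "string",  # matched by dtype.name (first step)
    str: "string",       # matched by dtype.type (fallback step)
    "bool": "boolean",
}


@dataclass(frozen=True)
class FakeDtype:
    """Stand-in dtype exposing only the two attributes the lookup reads."""

    name: str
    type: type


def lookup(dtype, type_map=TYPE_MAP) -> Optional[str]:
    # Check the dtype's name first, and only then its scalar type.
    for attr in ("name", "type"):
        key = getattr(dtype, attr, None)
        if key in type_map:
            return type_map[key]
    return None  # no match in this trimmed map


object_backed = FakeDtype("string", str)
arrow_backed = FakeDtype("arrow_string", str)  # assumed name, illustration only

print(lookup(object_backed))  # string (hit on name)
print(lookup(arrow_backed))   # string (hit on type)
print(lookup(arrow_backed, {"string": "string"}))  # None (no str key)
```

With only the `"string"` key, the second dtype would go unrecognized; with only the `str` key, the object-backed dtype would still match, but one step later.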

(But again, infer_dtype is mostly used to infer actual lists or object-dtype arrays; inferring an array that already has a proper dtype is fast anyway, so I don't think this small difference matters much.)
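To illustrate why the performance point holds, here is a hypothetical element-wise scan for a plain list, which has no dtype to look up and so must visit its values one by one. `infer_list` and its `"string"`/`"mixed"`/`"empty"` results are assumptions for this sketch, and only `None` is treated as missing; it does not reproduce pandas' own NA handling.

```python
from typing import Iterable


def infer_list(values: Iterable[object], skipna: bool = True) -> str:
    """Hypothetical O(n) scan: the slow path for lists or object arrays."""
    seen_string = False
    for v in values:
        if v is None:
            if skipna:
                continue  # ignore missing values entirely
            return "mixed"  # with skipna=False a missing value is not a string
        if not isinstance(v, str):
            return "mixed"  # any non-string element ends the scan early
        seen_string = True
    return "string" if seen_string else "empty"


print(infer_list(["a", "b", "c"]))                 # string
print(infer_list(["a", "b", None]))                # string
print(infer_list(["a", "b", None], skipna=False))  # mixed
```

By contrast, an array that already carries a string dtype can be classified from its dtype alone, without visiting its elements, so an extra key check on that path costs little.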

Member Author


Changed to keep both for now, so as not to affect the performance of the object-backed StringArray.

     "S": "bytes",
     "U": "string",
     "bool": "boolean",
4 changes: 2 additions & 2 deletions pandas/tests/dtypes/test_inference.py
@@ -1267,9 +1267,9 @@ def test_interval(self):
     @pytest.mark.parametrize("klass", [pd.array, Series])
     @pytest.mark.parametrize("skipna", [True, False])
     @pytest.mark.parametrize("data", [["a", "b", "c"], ["a", "b", pd.NA]])
-    def test_string_dtype(self, data, skipna, klass):
+    def test_string_dtype(self, data, skipna, klass, nullable_string_dtype):
         # StringArray
-        val = klass(data, dtype="string")
+        val = klass(data, dtype=nullable_string_dtype)
         inferred = lib.infer_dtype(val, skipna=skipna)
         assert inferred == "string"
