-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: [ArrowStringArray] Recognize ArrowStringArray in infer_dtype #40725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit c095cd4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged the pre-cursor, so this can be updated now.
For the rest looks good to me (using str
(the dtype.type
) in the _TYPE_MAP
is a similar solution as what we did for PeriodDtype (which also has a parametrized name), so that's seems an appropriate solution to me)
Thanks @jorisvandenbossche will get the open PRs updated. and a couple of PRs for astype failures (both ways) in the pipeline. one uses the fixture #39908 (comment) and some more parameterization of existing tests where |
@simonjayhawkins this can be undrafted? |
the benchmarks are running and will have a answer later today. |
(I don't think this will have any impact on performance, though. And we already did the same change for Period) |
|
hmm, not good.
will look in detail tomorrow. |
@simonjayhawkins I think that is mostly within noise-bounds. This We can also check the function itself specifically:
So for a StringArray input, there is some change. But even then, the question is whether this is change is significant for actual usage. We typically use |
@@ -1110,7 +1110,7 @@ _TYPE_MAP = { | |||
"complex64": "complex", | |||
"complex128": "complex", | |||
"c": "complex", | |||
"string": "string", | |||
str: "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str: "string", | |
"string": "string", | |
str: "string", |
We could keep both, and that should address the small slowdown for inferring an array with dtype="string"
(as it will first check the name, and only then the dtype.type
).
(but again, the infer_dtype is mostly used to infer actual lists or object dtype arrays, the inferring of an array with already a proper dtype is fast anyway, so I don't think this small difference matters much)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to keep both for now, so as not to affect performance of object backed StringArray.
thanks @simonjayhawkins |
PR is draft. follow-up to #40679, #40679 (comment) which is not yet merged and need to make window to run benchmarks #40679 (review)
on master