-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: [ArrowStringArray] more parameterised testing - part 1 #40679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: [ArrowStringArray] more parameterised testing - part 1 #40679
Conversation
pandas/_libs/lib.pyx
Outdated
@@ -1109,7 +1109,7 @@ _TYPE_MAP = { | |||
"complex64": "complex", | |||
"complex128": "complex", | |||
"c": "complex", | |||
"string": "string", | |||
str: "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could add this instead of changing.
both StringDtype and ArrowStringDtype have type
attrribute as str
at the moment the inference matches on name
so this would need to change for 'string[python]' if we wanted to match on name
edit: kind
-> type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this doesn't break anything? cause any perf issues?
Parametrized fixture for string dtypes. | ||
|
||
* 'string' | ||
* 'arrow_string' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will change in #39908, but getting the fixture in play as a pre-cursor should still be an advantage.
@@ -193,7 +194,7 @@ def astype(self, dtype, copy=True): | |||
if copy: | |||
return self.copy() | |||
return self | |||
elif isinstance(dtype, StringDtype): | |||
elif isinstance(dtype, (StringDtype, ArrowStringDtype)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrowStringDtype is removed in #39908 and therefore this change will be reverted.
pandas/_libs/lib.pyx
Outdated
@@ -1109,7 +1109,7 @@ _TYPE_MAP = { | |||
"complex64": "complex", | |||
"complex128": "complex", | |||
"c": "complex", | |||
"string": "string", | |||
str: "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this doesn't break anything? cause any perf issues?
only the tests that test inference seem to break. so we currently have ArrowStringArray not infered as string in master. maybe not used but I'll look in detail and test perf tomorrow. This will slow down this PR and I want to get the fixture in play so that I can parameterise many other tests that test for only |
on master
|
OK have reverted the inference change... so can open a PR on the str accessor and parameterise existing str accessor tests for that, but nbd if this is not ready as can just dup the fixture and resolve conflicts if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Failures are the matplotlib ones |
aim to reduce diff to address #39908 (comment), but good to do as follow-up to #35259 anyway.
This also includes a change to inference, which could be broken off if not wanted as part of a TST PR.
more tests to change once fixture in play and also some parameterisation to add for the str accessor testing (which could be a precursor with xfails for ArrowStringArray or combined with PR that adds the accessor.. will open a draft shortly for discussion)