Skip to content

TST: [ArrowStringArray] more parameterised testing - part 1 #40679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

simonjayhawkins
Copy link
Member

aim to reduce diff to address #39908 (comment), but good to do as follow-up to #35259 anyway.

This also includes a change to inference, which could be broken off if not wanted as part of a TST PR.

more tests to change once fixture in play and also some parameterisation to add for the str accessor testing (which could be a precursor with xfails for ArrowStringArray or combined with PR that adds the accessor.. will open a draft shortly for discussion)

@simonjayhawkins simonjayhawkins added Testing pandas testing functions or related to the test suite Strings String extension data type and string data labels Mar 29, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Mar 29, 2021
@@ -1109,7 +1109,7 @@ _TYPE_MAP = {
"complex64": "complex",
"complex128": "complex",
"c": "complex",
"string": "string",
str: "string",
Copy link
Member Author

@simonjayhawkins simonjayhawkins Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could add this instead of changing.

both StringDtype and ArrowStringDtype have type attrribute as str

at the moment the inference matches on name so this would need to change for 'string[python]' if we wanted to match on name

edit: kind -> type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this doesn't break anything? cause any perf issues?

Parametrized fixture for string dtypes.

* 'string'
* 'arrow_string'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will change in #39908, but getting the fixture in play as a pre-cursor should still be an advantage.

@@ -193,7 +194,7 @@ def astype(self, dtype, copy=True):
if copy:
return self.copy()
return self
elif isinstance(dtype, StringDtype):
elif isinstance(dtype, (StringDtype, ArrowStringDtype)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrowStringDtype is removed in #39908 and therefore this change will be reverted.

@@ -1109,7 +1109,7 @@ _TYPE_MAP = {
"complex64": "complex",
"complex128": "complex",
"c": "complex",
"string": "string",
str: "string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this doesn't break anything? cause any perf issues?

@simonjayhawkins
Copy link
Member Author

only the tests that test inference seem to break. so we currently have ArrowStringArray not infered as string in master. maybe not used but I'll look in detail and test perf tomorrow.

This will slow down this PR and I want to get the fixture in play so that I can parameterise many other tests that test for only dtype="string" so I will remove the inference change and pararmeterisation of the inference test in this PR and do that as a separate PR. (after this PR with the fixture in conftest is merged)

@simonjayhawkins
Copy link
Member Author

on master

>>> pd.__version__
'1.3.0.dev0+1180.gdd697e1cca'
>>> 
>>> from pandas.core.arrays.string_arrow import ArrowStringDtype
>>> 
>>> arr = ArrowStringDtype.construct_array_type()._from_sequence(["B", pd.NA, "A"])
>>> 
>>> arr
<ArrowStringArray>
['B', <NA>, 'A']
Length: 3, dtype: arrow_string
>>> 
>>> pd._libs.lib.infer_dtype(arr)
'unknown-array'
>>> ```

@simonjayhawkins
Copy link
Member Author

OK have reverted the inference change... so can open a PR on the str accessor and parameterise existing str accessor tests for that, but nbd if this is not ready as can just dup the fixture and resolve conflicts if necessary.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jorisvandenbossche
Copy link
Member

Failures are the matplotlib ones

@jorisvandenbossche jorisvandenbossche merged commit 5c64037 into pandas-dev:master Apr 1, 2021
@simonjayhawkins simonjayhawkins deleted the nullable_string_dtype branch April 1, 2021 14:45
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Strings String extension data type and string data Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants