Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
API: allow nan-likes in StringArray constructor #41412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: allow nan-likes in StringArray constructor #41412
Changes from 3 commits
3e1784d
96ff1da
418e1d2
25a6c4d
47d68f7
8257dbd
922436a
2f28086
3ee2198
9426a52
3ee55f2
fe4981a
a66948a
e852719
91b73bb
42ec3e4
41f49d2
62cc5be
29909f3
153b6b4
b27a839
ed5b953
caa5705
2d75031
52a00d1
8dc0b66
1bacaed
66be087
7b058cd
1be1bdf
3c57094
03738a9
12351de
889829a
358000f
c649b1d
5e5aa9c
eb7d8f2
2426319
20817a7
33d8f9a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this going through ensure_string_array instead of e.g. is_string_array? For the latter, the ravel would be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_string_array will not convert the other nans(None, np.nan, etc.) to the correct na_value of pd.NA. FWIW, switching to ensure_string_array will also align us with _from_sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does any of this become simpler if done in conjunction with the edits in #45057
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to split this one up.
I'm thinking of maybe sticking to is_string_array and then doing another pass over the data to convert the not pd.NA nans, as a quick short term fix to unblock you. This will probably give a perf regression, but since is_string_array got sped up in #44495 on master(not 1.3.x), all I have to do is make the perf regression less than the perf improvement there, so that the regression is not user visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize over list as well (assume same result).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC from #41412 (comment), we now call ensure_string_array in the constructor whereas previously is_string_array was called. so why is lib.is_string_array changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib.is_string_array is used for inferring dtypes elsewhere I think, and we want ndarrays with strings and np.nan/None to be inferred as string in addition to ndarrays with only strings/pd.NA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. so is
is_string_array
called before passing the values to theStringArray
constructor? The reason I ask is that a reasonably contained behavior change to StringArray could be backported to 1.3.x during the rc periodwill hopefully get a chance to look in more detail soon, but if you could summarize why it's changing that'll be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it isn't called anymore in this PR, but it should be changed(allow np.nan/None) to be consistent with what we are accepting in the constructor now.
The actual change is just a one-liner here, so should be pretty harmless. https://github.com/pandas-dev/pandas/pull/41412/files#diff-c61205b9d6ae5756840d1ed6915157fe9d99aa33b29a762f821e9fe38ab0277cR1900