API/ENH: Accept nan-likes in StringArray constructor #40839
Labels
Enhancement
NA - MaskedArrays
Related to pd.NA and nullable extension arrays
Strings
String extension data type and string data
Is your feature request related to a problem?
Currently, StringArray can only be instantiated directly from an ndarray containing strings or missing values represented by pd.NA. The only way to create a StringArray with other missing-value indicators (like np.nan and None) is to use pandas.array, which has the side effect of casting non-string elements to strings instead of raising an error.

The proposed solution would allow StringArray to be instantiated from a NumPy array containing np.nan/None without casting non-strings. This is useful if you want the StringArray constructor to validate that the inputs are strings while also accepting missing values other than pd.NA. At the very least, it should support np.nan, since StringArray is created from a NumPy array and np.nan is NumPy's missing-value indicator.
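The casting side effect described above can be illustrated with a small sketch. The sample values are made up for illustration, and the exact behaviour may differ between pandas versions:

```python
import numpy as np
import pandas as pd

# Object array mixing a string, a non-string, and two nan-likes.
values = np.array(["a", 1, np.nan, None], dtype=object)

# pandas.array accepts the nan-likes, but it also converts the
# integer 1 to a string rather than raising an error.
arr = pd.array(values, dtype="string")
print(arr)
```

Here the non-string element is kept as a string instead of being rejected, which is the behaviour this request wants to avoid.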
Describe the solution you'd like
Either accept nan-likes in the constructor directly (a breaking change) or add a parameter to the constructor that allows other NA values, maybe something like the na_values parameter of read_csv.
API breaking implications
Either a breaking change or a new parameter.
Describe alternatives you've considered
You'd have to do the validation yourself, but validating the input yourself and then having StringArray validate it again is bad for performance.
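A minimal sketch of that workaround, assuming a hypothetical helper named strict_string_array (it is not part of pandas, and the choice of ValueError is an assumption):

```python
import numpy as np
import pandas as pd
from pandas.arrays import StringArray


def strict_string_array(values):
    """Hypothetical helper, not a pandas API.

    Accepts strings, pd.NA, None and float NaN; rejects everything else
    instead of casting it to a string.
    """
    arr = np.array(values, dtype=object)  # copy so the input is not mutated
    for i in range(len(arr)):
        v = arr[i]
        if isinstance(v, str) or v is pd.NA:
            continue
        if v is None or (isinstance(v, float) and np.isnan(v)):
            arr[i] = pd.NA  # normalise nan-likes to pd.NA
            continue
        raise ValueError(f"element {i} is not a string or missing value: {v!r}")
    # The constructor validates the array again at this point.
    return StringArray(arr)


result = strict_string_array(["a", None, float("nan")])
print(result)
```

Every element is inspected once by the helper and once more by the StringArray constructor, which is the duplicated work this alternative suffers from.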
cc @jorisvandenbossche