I'm working on making lib.infer_dtype copy-free and finding things would be easier/more consistent if we tweaked the meaning of the skipna keyword.
In particular, instead of doing values = values[~isnaobj(values)] followed by e.g. is_string_array(values), we could call is_string_array(values, skipna=skipna) directly. This would change the results in cases where the array contains NA values that is_string_array does not consider valid_na, e.g. in the status quo:
import pandas as pd
import numpy as np
from pandas._libs import lib
arr = np.array(["foo", pd.NaT, "bar"], dtype=object)
In [2]: lib.infer_dtype(arr, skipna=True)
Out[2]: 'string'
In [3]: lib.is_string_array(arr, skipna=True)
Out[3]: False
So the suggestion here is to change [2] to give 'mixed'. I'm finding that to make this work without breaking the world, we also need to change StringValidator.is_valid_null to accept np.nan and None (it currently accepts only pd.NA).
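To make the difference concrete, here is a rough pure-Python sketch of the two approaches. It is not the pandas implementation: `NA` and `NaT` are stand-in sentinels for pd.NA and pd.NaT, the helper names (`is_missing`, `is_valid_string_null`, `infer_filter_first`, `infer_copy_free`) are hypothetical, and only the 'string' vs 'mixed' outcomes are modelled.

```python
import math


class _Sentinel:
    """Hypothetical stand-in for a pandas missing-value singleton."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


NA = _Sentinel("NA")    # stand-in for pd.NA
NaT = _Sentinel("NaT")  # stand-in for pd.NaT


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def is_missing(value):
    """Any NA-like value, i.e. roughly what the isnaobj mask would drop."""
    return value is None or value is NA or value is NaT or _is_nan(value)


def is_valid_string_null(value):
    """Nulls the string check accepts under the proposal: NA, nan, None.

    NaT is deliberately excluded, since it is not a valid null for strings.
    """
    return value is None or value is NA or _is_nan(value)


def infer_filter_first(values, skipna=True):
    """Status-quo style: copy out the non-missing values, then check them."""
    if skipna:
        values = [v for v in values if not is_missing(v)]
    return "string" if all(isinstance(v, str) for v in values) else "mixed"


def infer_copy_free(values, skipna=True):
    """Proposed style: one pass, no copy; skip only string-valid nulls."""
    for v in values:
        if isinstance(v, str):
            continue
        if skipna and is_valid_string_null(v):
            continue
        return "mixed"
    return "string"


arr = ["foo", NaT, "bar"]
print(infer_filter_first(arr))  # NaT is filtered out before the check
print(infer_copy_free(arr))     # NaT is not a valid string null
```

In this sketch the two functions only disagree when the array holds a missing value that is not a valid null for strings (NaT here); arrays containing nan or None are treated the same way by both.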
xref #40839: the inconsistency between StringArray and pd.array on np.array(['a', np.nan, 'b'], dtype=object) is one of the last sticking points on implementing this.