Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
In [1]: import pandas as pd
In [2]: A = pd.array(["a1", "a2"], dtype="string")
In [3]: A
Out[3]:
<StringArray>
['a1', 'a2']
Length: 2, dtype: string
In [4]: isinstance(A, pd.arrays.StringArray)
Out[4]: True
In [5]: isinstance(A, pd.arrays.PandasArray)
Out[5]: True
Issue Description
PandasArray is documented as a thin wrapper around NumPy arrays that exists for internal compatibility. However, StringArray is a subclass of PandasArray, for reasons that are not clear, while other ExtensionArrays are not. Because of this inconsistency, an isinstance check against PandasArray also matches StringArray, which causes problems when unboxing values for Bodo JIT (and, I assume, for other systems that consume pandas data).
https://pandas.pydata.org/docs/reference/api/pandas.Series.array.html
https://pandas.pydata.org/docs/reference/api/pandas.arrays.PandasArray.html
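To illustrate why the subclass relationship matters to a consumer, here is a minimal sketch of type-based dispatch that has to test for StringArray before the NumPy wrapper class to avoid misclassifying string data. The names classify_for_unboxing and NumpyBackedArray are hypothetical and are not taken from Bodo or pandas. The sketch assumes the wrapper class is reachable either as pd.arrays.PandasArray or, in newer pandas releases, as pd.arrays.NumpyExtensionArray, and it requests Python string storage explicitly so the result is a StringArray regardless of the configured default.

```python
import pandas as pd

# Assumption: newer pandas releases expose the NumPy wrapper as
# NumpyExtensionArray; older ones (as in this report) call it PandasArray.
NumpyBackedArray = getattr(pd.arrays, "NumpyExtensionArray", None) or pd.arrays.PandasArray


def classify_for_unboxing(arr):
    """Hypothetical dispatch helper: return a label for how to unbox `arr`."""
    # StringArray must be tested first. Because it subclasses the NumPy
    # wrapper, the generic check below would otherwise match it too.
    if isinstance(arr, pd.arrays.StringArray):
        return "string"
    if isinstance(arr, NumpyBackedArray):
        return "numpy-wrapper"
    return "other-extension"


# Python-backed string storage is requested explicitly (an assumption made
# for reproducibility; the report itself uses dtype="string").
strings = pd.array(["a1", "a2"], dtype=pd.StringDtype(storage="python"))
plain = pd.Series([1, 2]).array            # NumPy-backed wrapper
masked = pd.array([1, 2], dtype="Int64")   # masked integer extension array

print(classify_for_unboxing(strings))
print(classify_for_unboxing(plain))
print(classify_for_unboxing(masked))
```

If the two isinstance checks were swapped, the string array would fall into the "numpy-wrapper" branch, which is the kind of misclassification this report is about.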
Expected Behavior
The output of isinstance(pd.array(["a1", "a2"], dtype="string"), pd.arrays.PandasArray) should be False.
Installed Versions
pandas : 1.6.0.dev0+145.gbf856a85a3
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
setuptools : 49.6.0.post20210108
pip : 21.2.2
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.26.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None