BUG: StringArray is a subclass of PandasArray #48638

Open
@ehsantn

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [1]: import pandas as pd

In [2]: A = pd.array(["a1", "a2"], dtype="string")

In [3]: A
Out[3]:
<StringArray>
['a1', 'a2']
Length: 2, dtype: string

In [4]: isinstance(A, pd.arrays.StringArray)
Out[4]: True

In [5]: isinstance(A, pd.arrays.PandasArray)
Out[5]: True

Issue Description

PandasArray is documented as a thin wrapper around NumPy values, intended for internal compatibility. However, StringArray is a subclass of PandasArray, while other ExtensionArrays are not. This inconsistency causes problems when unboxing values in Bodo JIT (and, I assume, in other systems that consume pandas data).

https://pandas.pydata.org/docs/reference/api/pandas.Series.array.html
https://pandas.pydata.org/docs/reference/api/pandas.arrays.PandasArray.html
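To illustrate why the subclass relationship matters to a consumer, here is a minimal sketch using stand-in classes. NumpyWrapperArray, StringLikeArray, unbox_naive and unbox_exact are hypothetical names invented for this example; they are not pandas or Bodo APIs, and the sketch only assumes that a consumer dispatches on array type with isinstance.

```python
class NumpyWrapperArray:
    """Hypothetical stand-in for a thin wrapper around NumPy values."""

    def __init__(self, values):
        self._values = list(values)


class StringLikeArray(NumpyWrapperArray):
    """Hypothetical stand-in for a string array that subclasses the wrapper."""


def unbox_naive(arr):
    # The base-class check comes first, so a StringLikeArray is
    # routed to the wrapper path and its own branch is never reached.
    if isinstance(arr, NumpyWrapperArray):
        return "numpy-wrapper"
    if isinstance(arr, StringLikeArray):
        return "string"
    raise TypeError(f"unsupported array type: {type(arr).__name__}")


def unbox_exact(arr):
    # Dispatching on the exact type sidesteps the inheritance relationship.
    handlers = {
        NumpyWrapperArray: "numpy-wrapper",
        StringLikeArray: "string",
    }
    try:
        return handlers[type(arr)]
    except KeyError:
        raise TypeError(f"unsupported array type: {type(arr).__name__}") from None


strings = StringLikeArray(["a1", "a2"])
naive_result = unbox_naive(strings)
exact_result = unbox_exact(strings)
```

With these stand-ins, the naive dispatcher treats the string array as a plain wrapper, while the exact-type dispatcher keeps the two apart. A consumer can work around the issue this way, or by checking the more specific class first, but either approach requires knowing about the inheritance in advance.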

Expected Behavior

Output of isinstance(pd.array(["a1", "a2"], dtype="string"), pd.arrays.PandasArray) should be False.

Installed Versions

INSTALLED VERSIONS
------------------
commit : bf856a8
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 21.2.0
Version : Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.6.0.dev0+145.gbf856a85a3
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
setuptools : 49.6.0.post20210108
pip : 21.2.2
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.26.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Labels

  • Closing Candidate: May be closeable, needs more eyeballs
  • ExtensionArray: Extending pandas with custom dtypes or arrays.
  • Internals: Related to non-user accessible pandas implementation
  • Strings: String extension data type and string data
