DOC/TST: Document numpy 2.0 support and add tests for string array #58202

Merged (1 commit) on Apr 10, 2024

15 changes: 15 additions & 0 deletions doc/source/whatsnew/v2.2.2.rst
@@ -9,6 +9,21 @@ including other versions of pandas.
{{ header }}

.. ---------------------------------------------------------------------------

.. _whatsnew_220.np2_compat:

Pandas 2.2.2 is now compatible with numpy 2.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pandas 2.2.2 is the first version of pandas that is generally compatible with the upcoming
numpy 2.0 release, and wheels for pandas 2.2.2 will work with both numpy 1.x and 2.x.

One major caveat is that arrays created with numpy 2.0's new ``StringDType`` will be converted
to ``object`` dtype arrays upon :class:`Series`/:class:`DataFrame` creation.
Full support for numpy 2.0's ``StringDType`` is expected to land in pandas 3.0.

As usual, please report any bugs discovered to our `issue tracker <https://github.com/pandas-dev/pandas/issues/new/choose>`_.

.. _whatsnew_222.regressions:

Fixed regressions
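
To make the caveat in the note concrete, here is a minimal sketch (not part of this PR) that wraps a numpy 2 StringDType array in a Series. It assumes pandas 2.2.x is installed; series_from_numpy_strings is a hypothetical helper that returns None on numpy 1.x, where numpy.dtypes.StringDType does not exist.

```python
import numpy as np
import pandas as pd


def series_from_numpy_strings(values):
    """Hypothetical helper: wrap a numpy StringDType array in a pandas Series.

    Returns None on numpy 1.x, where numpy.dtypes.StringDType does not exist.
    """
    numpy_major = int(np.__version__.split(".")[0])
    if numpy_major < 2:
        return None
    # Variable-width string dtype introduced in numpy 2.0
    from numpy.dtypes import StringDType

    arr = np.array(values, dtype=StringDType())
    # Per the whatsnew note, pandas 2.2.x does not keep StringDType and
    # builds the Series with object dtype instead.
    return pd.Series(arr)


s = series_from_numpy_strings(["a", "b", "c"])
if s is not None:
    print(s.dtype)  # object under pandas 2.2.x, per the whatsnew note
    print(s.tolist())
```

The values survive the round trip unchanged; only the dtype differs from the source array.
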
19 changes: 19 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -24,6 +24,7 @@
from pandas._config import using_pyarrow_string_dtype

from pandas._libs import lib
from pandas.compat.numpy import np_version_gt2
from pandas.errors import IntCastingNaNError

from pandas.core.dtypes.common import is_integer_dtype
@@ -3052,6 +3053,24 @@ def test_from_dict_with_columns_na_scalar(self):
        expected = DataFrame({"a": Series([pd.NaT, pd.NaT])})
        tm.assert_frame_equal(result, expected)

    # TODO: make this not cast to object in pandas 3.0
    @pytest.mark.skipif(
        not np_version_gt2, reason="StringDType only available in numpy 2 and above"
    )
    @pytest.mark.parametrize(
        "data",
        [
            {"a": ["a", "b", "c"], "b": [1.0, 2.0, 3.0], "c": ["d", "e", "f"]},
        ],
    )
    def test_np_string_array_object_cast(self, data):
        from numpy.dtypes import StringDType

        data["a"] = np.array(data["a"], dtype=StringDType())

Review comment (Member):

Curious what happens when dtype=StringDType() is passed to the pandas Series or DataFrame constructor? Fine if it doesn't work or converts to object, but it would be good to note if something unexpected happens.

Reply (Member Author):

I think it silently casts to object.

FWIW, I think the numpy fixed-length strings also do this.

        res = DataFrame(data)
        assert res["a"].dtype == np.object_
        assert (res["a"] == data["a"]).all()


def get1(obj):  # TODO: make a helper in tm?
    if isinstance(obj, Series):
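
The reply above suggests numpy's fixed-length strings are cast the same way. A small sketch (not from this PR) checks that for the fixed-width unicode dtype numpy infers for Python strings. It assumes pandas 2.x defaults (no string-inference option enabled), under which the column comes back as object; the only behavior asserted is that pandas does not keep numpy's "U" dtype.

```python
import numpy as np
import pandas as pd

# Without an explicit dtype, numpy stores Python str values as fixed-width
# unicode (dtype kind "U", e.g. <U1 here).
fixed = np.array(["a", "b", "c"])

df = pd.DataFrame({"a": fixed, "b": [1.0, 2.0, 3.0]})

# pandas does not keep numpy's "U" dtype for column "a"; with pandas 2.x
# defaults the column is stored as object, mirroring the StringDType case.
print(fixed.dtype)
print(df.dtypes)
```
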
19 changes: 19 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -2176,6 +2176,25 @@ def test_series_constructor_infer_multiindex(self, container, data):
        multi = Series(data, index=indexes)
        assert isinstance(multi.index, MultiIndex)

    # TODO: make this not cast to object in pandas 3.0
    @pytest.mark.skipif(
        not np_version_gt2, reason="StringDType only available in numpy 2 and above"
    )
    @pytest.mark.parametrize(
        "data",
        [
            ["a", "b", "c"],
            ["a", "b", np.nan],
        ],
    )
    def test_np_string_array_object_cast(self, data):
        from numpy.dtypes import StringDType

        arr = np.array(data, dtype=StringDType())
        res = Series(arr)
        assert res.dtype == np.object_
        assert (res == data).all()


class TestSeriesConstructorInternals:
    def test_constructor_no_pandas_array(self):