Commit 365841f

BUG: astype to pyarrow does not copy np array (#50984)
* BUG: astype to pyarrow does not copy np array
* Add gh ref
* Use deepcopy
1 parent ed7ce76 commit 365841f

3 files changed, 17 additions and 0 deletions
doc/source/whatsnew/v2.0.0.rst

+1
@@ -1019,6 +1019,7 @@ Conversion
 - Bug in :class:`.arrays.ArrowExtensionArray` that would raise ``NotImplementedError`` when passed a sequence of strings or binary (:issue:`49172`)
 - Bug in :meth:`Series.astype` raising ``pyarrow.ArrowInvalid`` when converting from a non-pyarrow string dtype to a pyarrow numeric type (:issue:`50430`)
 - Bug in :meth:`Series.to_numpy` converting to NumPy array before applying ``na_value`` (:issue:`48951`)
+- Bug in :meth:`DataFrame.astype` not copying data when converting to pyarrow dtype (:issue:`50984`)
 - Bug in :func:`to_datetime` was not respecting ``exact`` argument when ``format`` was an ISO8601 format (:issue:`12649`)
 - Bug in :meth:`TimedeltaArray.astype` raising ``TypeError`` when converting to a pyarrow duration type (:issue:`49795`)
 -

pandas/core/arrays/arrow/array.py

+4
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+from copy import deepcopy
 from typing import (
     TYPE_CHECKING,
     Any,
@@ -220,6 +221,9 @@ def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = Fal
         if isinstance(scalars, cls):
             scalars = scalars._data
         elif not isinstance(scalars, (pa.Array, pa.ChunkedArray)):
+            if copy and is_array_like(scalars):
+                # pa array should not get updated when numpy array is updated
+                scalars = deepcopy(scalars)
             try:
                 scalars = pa.array(scalars, type=pa_dtype, from_pandas=True)
             except pa.ArrowInvalid:
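
The guard in the hunk above exists because a conversion that wraps a NumPy buffer without copying keeps observing later writes to the source array. The following is a minimal NumPy-only sketch of that aliasing, not the pandas implementation: `source.view()` is used here purely as a stand-in for a buffer-sharing conversion (an assumption for illustration), while `deepcopy` mirrors the call the patch makes when `copy` is true.

```python
from copy import deepcopy

import numpy as np

source = np.array([1, 2, 3], dtype="int64")

# Stand-in for a zero-copy conversion: shares the buffer with `source`.
# (Illustrative assumption; this is not how pa.array works internally.)
aliased = source.view()

# What the patch does before converting when copy=True: detach first.
detached = deepcopy(source)

# Mutate the original array after both "conversions".
source[0] = 100

print(np.shares_memory(source, aliased))   # buffer is shared
print(np.shares_memory(source, detached))  # buffer is independent
print(aliased.tolist(), detached.tolist())
```

Only the detached array keeps the original values, which is the property the `copy` flag is meant to guarantee.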

pandas/tests/frame/methods/test_astype.py

+12
@@ -3,6 +3,7 @@
 import numpy as np
 import pytest
 
+from pandas.compat import pa_version_under6p0
 import pandas.util._test_decorators as td
 
 import pandas as pd
@@ -867,3 +868,14 @@ def test_frame_astype_no_copy():
 
     assert result.a.dtype == pd.Int16Dtype()
     assert np.shares_memory(df.b.values, result.b.values)
+
+
+@pytest.mark.skipif(pa_version_under6p0, reason="pyarrow is required for this test")
+@pytest.mark.parametrize("dtype", ["int64", "Int64"])
+def test_astype_copies(dtype):
+    # GH#50984
+    df = DataFrame({"a": [1, 2, 3]}, dtype=dtype)
+    result = df.astype("int64[pyarrow]", copy=True)
+    df.iloc[0, 0] = 100
+    expected = DataFrame({"a": [1, 2, 3]}, dtype="int64[pyarrow]")
+    tm.assert_frame_equal(result, expected)
