Commit 97bba51

Dr-Irv authored and jreback committed
CLN: Deprecate pandas.SparseArray for pandas.arrays.SparseArray (#30656)
1 parent 6f96331 commit 97bba51

31 files changed, +156 -142 lines changed

doc/source/development/contributing_docstring.rst

+1 -1
@@ -399,7 +399,7 @@ DataFrame:
 * DataFrame
 * pandas.Index
 * pandas.Categorical
-* pandas.SparseArray
+* pandas.arrays.SparseArray

 If the exact type is not relevant, but must be compatible with a numpy
 array, array-like can be specified. If Any type that can be iterated is

doc/source/getting_started/basics.rst

+1 -1
@@ -1951,7 +1951,7 @@ documentation sections for more on each type.
 | period | :class:`PeriodDtype` | :class:`Period` | :class:`arrays.PeriodArray` | ``'period[<freq>]'``, | :ref:`timeseries.periods` |
 | (time spans) | | | | ``'Period[<freq>]'`` | |
 +-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
-| sparse | :class:`SparseDtype` | (none) | :class:`SparseArray` | ``'Sparse'``, ``'Sparse[int]'``, | :ref:`sparse` |
+| sparse | :class:`SparseDtype` | (none) | :class:`arrays.SparseArray` | ``'Sparse'``, ``'Sparse[int]'``, | :ref:`sparse` |
 | | | | | ``'Sparse[float]'`` | |
 +-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
 | intervals | :class:`IntervalDtype` | :class:`Interval` | :class:`arrays.IntervalArray` | ``'interval'``, ``'Interval'``, | :ref:`advanced.intervalindex` |

doc/source/getting_started/dsintro.rst

+1 -1
@@ -741,7 +741,7 @@ implementation takes precedence and a Series is returned.
    np.maximum(ser, idx)

 NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
-for example :class:`SparseArray` (see :ref:`sparse.calculation`). If possible,
+for example :class:`arrays.SparseArray` (see :ref:`sparse.calculation`). If possible,
 the ufunc is applied without converting the underlying data to an ndarray.

 Console display

doc/source/reference/arrays.rst

+2 -2
@@ -444,13 +444,13 @@ Sparse data
 -----------

 Data where a single value is repeated many times (e.g. ``0`` or ``NaN``) may
-be stored efficiently as a :class:`SparseArray`.
+be stored efficiently as a :class:`arrays.SparseArray`.

 .. autosummary::
    :toctree: api/
    :template: autosummary/class_without_autosummary.rst

-   SparseArray
+   arrays.SparseArray

 .. autosummary::
    :toctree: api/

doc/source/user_guide/sparse.rst

+8 -8
@@ -15,7 +15,7 @@ can be chosen, including 0) is omitted. The compressed values are not actually s

    arr = np.random.randn(10)
    arr[2:-2] = np.nan
-   ts = pd.Series(pd.SparseArray(arr))
+   ts = pd.Series(pd.arrays.SparseArray(arr))
    ts

 Notice the dtype, ``Sparse[float64, nan]``. The ``nan`` means that elements in the
@@ -51,7 +51,7 @@ identical to their dense counterparts.
 SparseArray
 -----------

-:class:`SparseArray` is a :class:`~pandas.api.extensions.ExtensionArray`
+:class:`arrays.SparseArray` is a :class:`~pandas.api.extensions.ExtensionArray`
 for storing an array of sparse values (see :ref:`basics.dtypes` for more
 on extension arrays). It is a 1-dimensional ndarray-like object storing
 only values distinct from the ``fill_value``:
@@ -61,7 +61,7 @@ only values distinct from the ``fill_value``:
    arr = np.random.randn(10)
    arr[2:5] = np.nan
    arr[7:8] = np.nan
-   sparr = pd.SparseArray(arr)
+   sparr = pd.arrays.SparseArray(arr)
    sparr

 A sparse array can be converted to a regular (dense) ndarray with :meth:`numpy.asarray`
@@ -144,7 +144,7 @@ to ``SparseArray`` and get a ``SparseArray`` as a result.

 .. ipython:: python

-   arr = pd.SparseArray([1., np.nan, np.nan, -2., np.nan])
+   arr = pd.arrays.SparseArray([1., np.nan, np.nan, -2., np.nan])
    np.abs(arr)


@@ -153,7 +153,7 @@ the correct dense result.

 .. ipython:: python

-   arr = pd.SparseArray([1., -1, -1, -2., -1], fill_value=-1)
+   arr = pd.arrays.SparseArray([1., -1, -1, -2., -1], fill_value=-1)
    np.abs(arr)
    np.abs(arr).to_dense()

@@ -194,7 +194,7 @@ From an array-like, use the regular :class:`Series` or
 .. ipython:: python

    # New way
-   pd.DataFrame({"A": pd.SparseArray([0, 1])})
+   pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

 From a SciPy sparse matrix, use :meth:`DataFrame.sparse.from_spmatrix`,

@@ -256,10 +256,10 @@ Instead, you'll need to ensure that the values being assigned are sparse

 .. ipython:: python

-   df = pd.DataFrame({"A": pd.SparseArray([0, 1])})
+   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
    df['B'] = [0, 0] # remains dense
    df['B'].dtype
-   df['B'] = pd.SparseArray([0, 0])
+   df['B'] = pd.arrays.SparseArray([0, 0])
    df['B'].dtype

 The ``SparseDataFrame.default_kind`` and ``SparseDataFrame.default_fill_value`` attributes
The ``SparseDataFrame.default_kind`` and ``SparseDataFrame.default_fill_value`` attributes

doc/source/whatsnew/v0.19.0.rst

+1
@@ -1225,6 +1225,7 @@ Previously, sparse data were ``float64`` dtype by default, even if all inputs we
 As of v0.19.0, sparse data keeps the input dtype, and uses more appropriate ``fill_value`` defaults (``0`` for ``int64`` dtype, ``False`` for ``bool`` dtype).

 .. ipython:: python
+   :okwarning:

    pd.SparseArray([1, 2, 0, 0], dtype=np.int64)
    pd.SparseArray([True, False, False, False])

doc/source/whatsnew/v0.25.0.rst

+2
@@ -354,6 +354,7 @@ When passed DataFrames whose values are sparse, :func:`concat` will now return a
 :class:`Series` or :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`25702`).

 .. ipython:: python
+   :okwarning:

    df = pd.DataFrame({"A": pd.SparseArray([0, 1])})

@@ -910,6 +911,7 @@ by a ``Series`` or ``DataFrame`` with sparse values.
 **New way**

 .. ipython:: python
+   :okwarning:

    df = pd.DataFrame({"A": pd.SparseArray([0, 0, 1, 2])})
    df.dtypes

doc/source/whatsnew/v1.0.0.rst

+1
@@ -578,6 +578,7 @@ Deprecations
 - :meth:`DataFrame.to_stata`, :meth:`DataFrame.to_feather`, and :meth:`DataFrame.to_parquet` argument "fname" is deprecated, use "path" instead (:issue:`23574`)
 - The deprecated internal attributes ``_start``, ``_stop`` and ``_step`` of :class:`RangeIndex` now raise a ``FutureWarning`` instead of a ``DeprecationWarning`` (:issue:`26581`)
 - The ``pandas.util.testing`` module has been deprecated. Use the public API in ``pandas.testing`` documented at :ref:`api.general.testing` (:issue:`16232`).
+- ``pandas.SparseArray`` has been deprecated. Use ``pandas.arrays.SparseArray`` (:class:`arrays.SparseArray`) instead. (:issue:`30642`)

 **Selecting Columns from a Grouped DataFrame**

pandas/__init__.py

+17 -1
@@ -115,7 +115,7 @@
     DataFrame,
 )

-from pandas.core.arrays.sparse import SparseArray, SparseDtype
+from pandas.core.arrays.sparse import SparseDtype

 from pandas.tseries.api import infer_freq
 from pandas.tseries import offsets
@@ -246,6 +246,19 @@ class Panel:

             return type(name, (), {})

+        elif name == "SparseArray":
+
+            warnings.warn(
+                "The pandas.SparseArray class is deprecated "
+                "and will be removed from pandas in a future version. "
+                "Use pandas.arrays.SparseArray instead.",
+                FutureWarning,
+                stacklevel=2,
+            )
+            from pandas.core.arrays.sparse import SparseArray as _SparseArray
+
+            return _SparseArray
+
         raise AttributeError(f"module 'pandas' has no attribute '{name}'")


@@ -308,6 +321,9 @@ def __getattr__(self, item):

     datetime = __Datetime().datetime

+    class SparseArray:
+        pass
+

 # module level doc-string
 __doc__ = """

pandas/_testing.py

+1 -1
@@ -1492,7 +1492,7 @@ def assert_sp_array_equal(
         block indices.
     """

-    _check_isinstance(left, right, pd.SparseArray)
+    _check_isinstance(left, right, pd.arrays.SparseArray)

     assert_numpy_array_equal(left.sp_values, right.sp_values, check_dtype=check_dtype)

pandas/core/arrays/sparse/accessor.py

+3 -3
@@ -163,7 +163,7 @@ def to_dense(self):

         Examples
         --------
-        >>> series = pd.Series(pd.SparseArray([0, 1, 0]))
+        >>> series = pd.Series(pd.arrays.SparseArray([0, 1, 0]))
         >>> series
         0 0
         1 1
@@ -216,7 +216,7 @@ def from_spmatrix(cls, data, index=None, columns=None):
         -------
         DataFrame
             Each column of the DataFrame is stored as a
-            :class:`SparseArray`.
+            :class:`arrays.SparseArray`.

         Examples
         --------
@@ -251,7 +251,7 @@ def to_dense(self):

         Examples
         --------
-        >>> df = pd.DataFrame({"A": pd.SparseArray([0, 1, 0])})
+        >>> df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1, 0])})
         >>> df.sparse.to_dense()
         A
         0 0

pandas/core/arrays/sparse/array.py

+2 -2
@@ -403,7 +403,7 @@ def from_spmatrix(cls, data):
         --------
         >>> import scipy.sparse
         >>> mat = scipy.sparse.coo_matrix((4, 1))
-        >>> pd.SparseArray.from_spmatrix(mat)
+        >>> pd.arrays.SparseArray.from_spmatrix(mat)
         [0.0, 0.0, 0.0, 0.0]
         Fill: 0.0
         IntIndex
@@ -1079,7 +1079,7 @@ def map(self, mapper):

         Examples
         --------
-        >>> arr = pd.SparseArray([0, 1, 2])
+        >>> arr = pd.arrays.SparseArray([0, 1, 2])
         >>> arr.apply(lambda x: x + 10)
         [10, 11, 12]
         Fill: 10

pandas/core/dtypes/common.py

+5 -5
@@ -269,9 +269,9 @@ def is_sparse(arr) -> bool:
     --------
     Returns `True` if the parameter is a 1-D pandas sparse array.

-    >>> is_sparse(pd.SparseArray([0, 0, 1, 0]))
+    >>> is_sparse(pd.arrays.SparseArray([0, 0, 1, 0]))
     True
-    >>> is_sparse(pd.Series(pd.SparseArray([0, 0, 1, 0])))
+    >>> is_sparse(pd.Series(pd.arrays.SparseArray([0, 0, 1, 0])))
     True

     Returns `False` if the parameter is not sparse.
@@ -318,7 +318,7 @@ def is_scipy_sparse(arr) -> bool:
     >>> from scipy.sparse import bsr_matrix
     >>> is_scipy_sparse(bsr_matrix([1, 2, 3]))
     True
-    >>> is_scipy_sparse(pd.SparseArray([1, 2, 3]))
+    >>> is_scipy_sparse(pd.arrays.SparseArray([1, 2, 3]))
     False
     """

@@ -1467,7 +1467,7 @@ def is_bool_dtype(arr_or_dtype) -> bool:
     True
     >>> is_bool_dtype(pd.Categorical([True, False]))
     True
-    >>> is_bool_dtype(pd.SparseArray([True, False]))
+    >>> is_bool_dtype(pd.arrays.SparseArray([True, False]))
     True
     """
     if arr_or_dtype is None:
@@ -1529,7 +1529,7 @@ def is_extension_type(arr) -> bool:
     True
     >>> is_extension_type(pd.Series(cat))
     True
-    >>> is_extension_type(pd.SparseArray([1, 2, 3]))
+    >>> is_extension_type(pd.arrays.SparseArray([1, 2, 3]))
     True
     >>> from scipy.sparse import bsr_matrix
     >>> is_extension_type(bsr_matrix([1, 2, 3]))

pandas/tests/api/test_api.py

+1 -2
@@ -67,7 +67,6 @@ class TestPDApi(Base):
         "RangeIndex",
         "UInt64Index",
         "Series",
-        "SparseArray",
         "SparseDtype",
         "StringDtype",
         "Timedelta",
@@ -91,7 +90,7 @@ class TestPDApi(Base):
         "NamedAgg",
     ]
     if not compat.PY37:
-        classes.extend(["Panel", "SparseSeries", "SparseDataFrame"])
+        classes.extend(["Panel", "SparseSeries", "SparseDataFrame", "SparseArray"])
         deprecated_modules.extend(["np", "datetime"])

     # these are already deprecated; awaiting removal

pandas/tests/arrays/sparse/test_accessor.py

+9 -10
@@ -7,6 +7,7 @@

 import pandas as pd
 import pandas._testing as tm
+from pandas.core.arrays.sparse import SparseArray, SparseDtype


 class TestSeriesAccessor:
@@ -31,7 +32,7 @@ def test_accessor_raises(self):
     def test_from_spmatrix(self, format, labels, dtype):
         import scipy.sparse

-        sp_dtype = pd.SparseDtype(dtype, np.array(0, dtype=dtype).item())
+        sp_dtype = SparseDtype(dtype, np.array(0, dtype=dtype).item())

         mat = scipy.sparse.eye(10, format=format, dtype=dtype)
         result = pd.DataFrame.sparse.from_spmatrix(mat, index=labels, columns=labels)
@@ -48,7 +49,7 @@ def test_from_spmatrix(self, format, labels, dtype):
     def test_from_spmatrix_columns(self, columns):
         import scipy.sparse

-        dtype = pd.SparseDtype("float64", 0.0)
+        dtype = SparseDtype("float64", 0.0)

         mat = scipy.sparse.random(10, 2, density=0.5)
         result = pd.DataFrame.sparse.from_spmatrix(mat, columns=columns)
@@ -67,9 +68,9 @@ def test_to_coo(self):
     def test_to_dense(self):
         df = pd.DataFrame(
             {
-                "A": pd.SparseArray([1, 0], dtype=pd.SparseDtype("int64", 0)),
-                "B": pd.SparseArray([1, 0], dtype=pd.SparseDtype("int64", 1)),
-                "C": pd.SparseArray([1.0, 0.0], dtype=pd.SparseDtype("float64", 0.0)),
+                "A": SparseArray([1, 0], dtype=SparseDtype("int64", 0)),
+                "B": SparseArray([1, 0], dtype=SparseDtype("int64", 1)),
+                "C": SparseArray([1.0, 0.0], dtype=SparseDtype("float64", 0.0)),
             },
             index=["b", "a"],
         )
@@ -82,8 +83,8 @@ def test_to_dense(self):
     def test_density(self):
         df = pd.DataFrame(
             {
-                "A": pd.SparseArray([1, 0, 2, 1], fill_value=0),
-                "B": pd.SparseArray([0, 1, 1, 1], fill_value=0),
+                "A": SparseArray([1, 0, 2, 1], fill_value=0),
+                "B": SparseArray([0, 1, 1, 1], fill_value=0),
             }
         )
         res = df.sparse.density
@@ -99,9 +100,7 @@ def test_series_from_coo(self, dtype, dense_index):
         A = scipy.sparse.eye(3, format="coo", dtype=dtype)
         result = pd.Series.sparse.from_coo(A, dense_index=dense_index)
         index = pd.MultiIndex.from_tuples([(0, 0), (1, 1), (2, 2)])
-        expected = pd.Series(
-            pd.SparseArray(np.array([1, 1, 1], dtype=dtype)), index=index
-        )
+        expected = pd.Series(SparseArray(np.array([1, 1, 1], dtype=dtype)), index=index)
         if dense_index:
             expected = expected.reindex(pd.MultiIndex.from_product(index.levels))

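Note on the `pandas/__init__.py` hunk: the deprecation works by removing `SparseArray` from the module's eager imports and answering lookups of that name from a module-level `__getattr__` (PEP 562, Python 3.7+), which warns and then returns the real class. The following is a minimal sketch of that pattern on a throwaway module, not pandas code. The module name `demo`, the stand-in class `_SparseArray`, and the helper `_module_getattr` are hypothetical names chosen for illustration.

```python
import types
import warnings


class _SparseArray:
    """Hypothetical stand-in for the class that lives at the new location."""


def _module_getattr(name):
    # PEP 562: invoked only when normal attribute lookup on the module fails.
    if name == "SparseArray":
        warnings.warn(
            "demo.SparseArray is deprecated; use demo.arrays.SparseArray instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return _SparseArray
    raise AttributeError(f"module 'demo' has no attribute '{name}'")


# Build a throwaway module: the new location is a real attribute,
# the old top-level name exists only through __getattr__.
demo = types.ModuleType("demo")
demo.arrays = types.SimpleNamespace(SparseArray=_SparseArray)
demo.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = demo.SparseArray         # deprecated path: emits FutureWarning
    new = demo.arrays.SparseArray  # new path: silent

print(old is new, [w.category.__name__ for w in caught])
```

Both paths return the same class object, so `isinstance` checks keep working during the deprecation period; only access through the old name warns.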