Deprecate SparseDataFrame and SparseSeries #26137

Merged 29 commits on May 29, 2019
Commits
d518404
Squashed commit of the following:
TomAugspurger Mar 15, 2019
c32e5ff
DEPR: Deprecate SparseSeries and SparseDataFrame
TomAugspurger Mar 12, 2019
836d19b
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 14, 2019
c0d6cf2
fixup
TomAugspurger May 14, 2019
8f06d88
fixup
TomAugspurger May 14, 2019
380c7c0
fixup
TomAugspurger May 14, 2019
21569e2
fixup
TomAugspurger May 14, 2019
6a81837
docs
TomAugspurger May 14, 2019
12a8329
remove change
TomAugspurger May 14, 2019
01c7710
fixed merge conflict
TomAugspurger May 14, 2019
e9b9b29
pickle
TomAugspurger May 14, 2019
b295ce1
fixups
TomAugspurger May 15, 2019
ccf71db
fixups
TomAugspurger May 15, 2019
7e6fbd6
doc lint
TomAugspurger May 15, 2019
865f1aa
fix pytables
TomAugspurger May 15, 2019
9915c48
temp set error
TomAugspurger May 15, 2019
30f3670
skip doctests
TomAugspurger May 15, 2019
b043243
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 15, 2019
b2aef95
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 16, 2019
706c5dc
fixups
TomAugspurger May 16, 2019
13d30d2
fixup
TomAugspurger May 16, 2019
c5fa3fb
updates
TomAugspurger May 16, 2019
101c425
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 20, 2019
b76745f
fixups
TomAugspurger May 20, 2019
f153400
return
TomAugspurger May 20, 2019
0c49ddc
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 21, 2019
1903f67
fixups
TomAugspurger May 28, 2019
0b03ac2
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 28, 2019
12d8d83
Merge remote-tracking branch 'upstream/master' into depr-sparse-depr
TomAugspurger May 28, 2019
23 changes: 23 additions & 0 deletions doc/source/reference/frame.rst
@@ -312,6 +312,29 @@ specific plotting methods of the form ``DataFrame.plot.<kind>``.
   DataFrame.boxplot
   DataFrame.hist


.. _api.frame.sparse:

Sparse Accessor
~~~~~~~~~~~~~~~

Sparse-dtype specific methods and attributes are provided under the
``DataFrame.sparse`` accessor.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_attribute.rst

   DataFrame.sparse.density

.. autosummary::
   :toctree: api/

   DataFrame.sparse.from_spmatrix
   DataFrame.sparse.to_coo
   DataFrame.sparse.to_dense


Serialization / IO / Conversion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
131 changes: 131 additions & 0 deletions doc/source/user_guide/sparse.rst
@@ -6,6 +6,12 @@
Sparse data structures
**********************

.. note::

   ``SparseSeries`` and ``SparseDataFrame`` have been deprecated. Their purpose
   is served equally well by a :class:`Series` or :class:`DataFrame` with
   sparse values. See :ref:`sparse.migration` for tips on migrating.

We have implemented "sparse" versions of ``Series`` and ``DataFrame``. These are not sparse
Review comment from the PR author:

I don't think the diff here is all that informative. I'd recommend just viewing the new file. The basic flow is

  • short intro
  • SparseArray / SparseDtype
  • Sparse Accessors
  • SparseIndex / computation
  • Migration Guide
  • SparseSeries / SparseDataFrame.

in the typical "mostly 0" sense. Rather, you can view these objects as being "compressed"
where any data matching a specific value (``NaN`` / missing value, though any value
@@ -162,6 +168,80 @@ It raises if any value cannot be coerced to specified dtype.
Out[2]:
ValueError: unable to coerce current fill_value nan to int64 dtype



We have implemented "sparse" versions of ``Series`` and ``DataFrame``. These are not sparse
in the typical "mostly 0" sense. Rather, you can view these objects as being "compressed"
where any data matching a specific value (``NaN`` / missing value, though any value
can be chosen) is omitted. A special ``SparseIndex`` object tracks where data has been
"sparsified". This will make much more sense with an example. All of the standard pandas
data structures have a ``to_sparse`` method:

.. ipython:: python

   ts = pd.Series(np.random.randn(10))
   ts[2:-2] = np.nan
   sts = ts.to_sparse()
   sts

The ``to_sparse`` method takes a ``kind`` argument (for the sparse index, see
below) and a ``fill_value``. So if we had a mostly zero ``Series``, we could
convert it to sparse with ``fill_value=0``:

.. ipython:: python

   ts.fillna(0).to_sparse(fill_value=0)
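
The ``to_sparse`` call above is deprecated by this PR. A minimal sketch of a
comparable conversion, assuming a pandas version that provides
``pd.SparseDtype`` and the ``Series.sparse`` accessor. The fixed input values
and the names ``filled`` and ``sparse_ts`` are chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Ten non-zero values; blank out the middle six, then fill them with 0.
ts = pd.Series(np.arange(1.0, 11.0))
ts.iloc[2:-2] = np.nan
filled = ts.fillna(0)

# Store only the values that differ from the chosen fill value (0).
sparse_ts = filled.astype(pd.SparseDtype("float", 0))

print(sparse_ts.dtype)
print(sparse_ts.sparse.npoints)  # number of explicitly stored values
print(sparse_ts.sparse.density)  # stored values divided by total length
```

The fill value lives on the dtype here, rather than being passed to a
conversion method.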

The sparse objects exist for memory efficiency reasons. Suppose you had a
large, mostly NA ``DataFrame``:

.. ipython:: python

   df = pd.DataFrame(np.random.randn(10000, 4))
   df.iloc[:9998] = np.nan
   sdf = df.to_sparse()
   sdf
   sdf.density

As you can see, the density (% of values that have not been "compressed") is
extremely low. This sparse object takes up much less memory on disk (pickled)
and in the Python interpreter. Functionally, sparse objects should behave
almost identically to their dense counterparts.
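
The density and memory claims can also be checked without the deprecated
``to_sparse``. The following is a sketch under the assumption that
``DataFrame.astype`` accepts a ``pd.SparseDtype`` and that the
``DataFrame.sparse`` accessor added in this PR is available. The variable
names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10000, 4)))
df.iloc[:9998] = np.nan  # leave only the last two rows populated

# Convert every column to a sparse dtype whose fill value is NaN.
sdf = df.astype(pd.SparseDtype("float", np.nan))

density = sdf.sparse.density
dense_bytes = int(df.memory_usage(index=False).sum())
sparse_bytes = int(sdf.memory_usage(index=False).sum())

print(density)
print(dense_bytes, sparse_bytes)
```

Only 8 of the 40,000 cells are stored explicitly, so the sparse frame should
report a far smaller footprint than the dense one.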

Any sparse object can be converted back to the standard dense form by calling
``to_dense``:

.. ipython:: python

   sts.to_dense()

.. _sparse.accessor:

Sparse Accessor
---------------

.. versionadded:: 0.24.0

Pandas provides a ``.sparse`` accessor, similar to ``.str`` for string data, ``.cat``
for categorical data, and ``.dt`` for datetime-like data. This namespace provides
attributes and methods that are specific to sparse data.

.. ipython:: python

   s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")
   s.sparse.density
   s.sparse.fill_value

This accessor is available only on data with ``SparseDtype``, and on the :class:`Series`
class itself for creating a Series with sparse data from a scipy COO matrix.
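
A short sketch of the "only on data with ``SparseDtype``" rule. It assumes
that the accessor raises ``AttributeError`` on non-sparse data, which is why
``hasattr`` is used as the check here. The names ``sparse_s`` and ``dense_s``
are illustrative:

```python
import pandas as pd

sparse_s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")
dense_s = pd.Series([0, 0, 1, 2])

# Two of the four values differ from the fill value (0).
print(sparse_s.sparse.density)
print(sparse_s.sparse.fill_value)

# The accessor is not usable on a Series with a regular (dense) dtype.
print(hasattr(dense_s, "sparse"))
```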


.. versionadded:: 0.25.0

A ``.sparse`` accessor has been added for :class:`DataFrame` as well.
See :ref:`api.frame.sparse` for more.


.. _sparse.calculation:

Sparse Calculation
@@ -291,3 +371,54 @@ row and columns coordinates of the matrix. Note that this will consume a signifi

   ss_dense = pd.SparseSeries.from_coo(A, dense_index=True)
   ss_dense


.. _sparse.migration:

Migrating from SparseSeries and SparseDataFrame
-----------------------------------------------

:class:`SparseArray` is the building block for all of ``Series``, ``SparseSeries``,
``DataFrame``, and ``SparseDataFrame``. To simplify the pandas API and lower maintenance burden,
we've deprecated the ``SparseSeries`` and ``SparseDataFrame`` classes.

**There's no performance or memory penalty to using a Series or DataFrame with sparse values,
rather than a SparseSeries or SparseDataFrame**.

**Construction**

Use the regular :class:`Series` or :class:`DataFrame` constructors with :class:`SparseArray` values:

.. ipython:: python

   pd.DataFrame({"A": pd.SparseArray([0, 1])})

Or use :meth:`DataFrame.sparse.from_spmatrix`:

.. ipython:: python

   from scipy import sparse
   mat = sparse.eye(3)
   df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
   df
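
The two construction routes can be sketched without scipy as follows. This
assumes a pandas version that exposes ``SparseArray`` as
``pd.arrays.SparseArray`` (the diff above uses the top-level
``pd.SparseArray`` spelling). The ``astype`` route and the names
``from_array`` and ``converted`` are illustrative additions, not part of this
PR:

```python
import pandas as pd

# Route 1: pass SparseArray values to the regular DataFrame constructor.
from_array = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

# Route 2 (assumption): convert an existing dense frame column by column.
dense = pd.DataFrame({"A": [0, 1], "B": [0, 0]})
converted = dense.astype(pd.SparseDtype("int", 0))

print(from_array.dtypes)
print(converted.dtypes)
```

In both cases the result is a plain ``DataFrame`` whose columns carry a
``SparseDtype``.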

**Conversion**

Use the ``.sparse`` accessors:

.. ipython:: python

   df.sparse.to_dense()
   df.sparse.to_coo()
   df['A']
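
As a rough check of the dense conversion, the sketch below round-trips a small
frame through a sparse dtype and back. It avoids ``to_coo``, which needs
scipy, and the names ``dense``, ``sdf`` and ``restored`` are illustrative:

```python
import pandas as pd

dense = pd.DataFrame({"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]})

# Sparse representation with 0.0 as the fill value.
sdf = dense.astype(pd.SparseDtype("float", 0.0))

# Back to regular NumPy-backed columns via the accessor.
restored = sdf.sparse.to_dense()

print(restored.dtypes)
print(restored.equals(dense))
```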

**Sparse Properties**

Sparse-specific properties, like ``density``, are available on the ``.sparse`` accessor.

.. ipython:: python

   df.sparse.density

The ``SparseDataFrame.default_kind`` and ``SparseDataFrame.default_fill_value`` attributes
have no replacement.
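
Although there is no frame-level replacement, the fill value is still
available per column, because each column's ``SparseDtype`` carries its own
``fill_value``. A sketch, assuming ``pd.arrays.SparseArray`` is available; the
dictionary comprehension is just one illustrative way to collect the values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.arrays.SparseArray([0, 0, 1], fill_value=0),
    "B": pd.arrays.SparseArray([np.nan, 2.0, np.nan]),  # float default: NaN
})

# Each column may use a different fill value.
fill_values = {name: dtype.fill_value for name, dtype in df.dtypes.items()}
print(fill_values)
```
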
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -33,6 +33,7 @@ Other Enhancements
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :meth:`RangeIndex.union` now supports the ``sort`` argument. If ``sort=False`` an unsorted ``Int64Index`` is always returned. ``sort=None`` is the default and returns a monotonically increasing ``RangeIndex`` if possible or a sorted ``Int64Index`` if not (:issue:`24471`)
- :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
- Added :ref:`api.frame.sparse` for working with a ``DataFrame`` whose values are sparse (:issue:`25681`)
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
- :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
- :meth:`DataFrame.query` and :meth:`DataFrame.eval` now support quoting column names with backticks to refer to names with spaces (:issue:`6508`)