DOC: Update missing_data.rst #20424

Merged
merged 3 commits into from
Mar 29, 2018
60 changes: 26 additions & 34 deletions doc/source/missing_data.rst
@@ -75,7 +75,7 @@ arise and we wish to also consider that "missing" or "not available" or "NA".
To make detecting missing values easier (and across different array dtypes),
pandas provides the :func:`isna` and
:func:`notna` functions, which are also methods on
``Series`` and ``DataFrame`` objects:
Series and DataFrame objects:

.. ipython:: python

@@ -170,16 +170,16 @@ The descriptive statistics and computational methods discussed in the
account for missing data. For example:

* When summing data, NA (missing) values will be treated as zero.
* If the data are all NA, the result will be NA.
* Methods like **cumsum** and **cumprod** ignore NA values, but preserve them
in the resulting arrays.
* If the data are all NA, the result will be 0.
* Cumulative methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod` ignore NA values by default, but preserve them in the resulting arrays. To override this behaviour and include NA values, use ``skipna=False``.

.. ipython:: python

df
df['one'].sum()
df.mean(1)
df.cumsum()
df.cumsum(skipna=False)
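
The NA-handling rules listed in this hunk can be illustrated with a minimal sketch. It assumes pandas 0.22 or later; the frame and its column names are hypothetical and are not the ``df`` used in the documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only; not the df from the docs.
df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [np.nan, np.nan, np.nan],
})

# Summing skips NA values, so this is 1.0 + 3.0.
total_one = df["one"].sum()

# An all-NA column sums to 0 (the behaviour described for v0.22.0+).
total_two = df["two"].sum()

# cumsum skips NA when accumulating but keeps the NA in its position.
cum_default = df["one"].cumsum()

# With skipna=False the NA propagates to every later position.
cum_strict = df["one"].cumsum(skipna=False)
```

With the default, the running total resumes after the missing entry; with ``skipna=False`` everything from the first NA onward is NA.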


.. _missing_data.numeric_sum:
@@ -189,33 +189,24 @@ Sum/Prod of Empties/Nans

.. warning::

This behavior is now standard as of v0.21.0; previously sum/prod would give different
results if the ``bottleneck`` package was installed.
See the :ref:`v0.21.0 whatsnew <whatsnew_0210.api_breaking.bottleneck>`.
This behavior is now standard as of v0.22.0 and is consistent with the default in ``numpy``; previously sum/prod of all-NA or empty Series/DataFrames would return NaN.
See :ref:`v0.22.0 whatsnew <whatsnew_0220>` for more.

With ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be all-``NaN``.

.. ipython:: python

s = pd.Series([np.nan])

s.sum()

Summing over an empty ``Series`` will return ``NaN``:
The sum of an empty or all-NA Series or column of a DataFrame is 0.

.. ipython:: python

pd.Series([np.nan]).sum()

pd.Series([]).sum()

.. warning::
The product of an empty or all-NA Series or column of a DataFrame is 1.

These behaviors differ from the default in ``numpy`` where an empty sum returns zero.

.. ipython:: python

np.nansum(np.array([np.nan]))
np.nansum(np.array([]))
.. ipython:: python

pd.Series([np.nan]).prod()

pd.Series([]).prod()
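
As a rough sketch of the new defaults, the snippet below checks the sum and product of empty and all-NA Series. It also uses the ``min_count`` keyword, which this diff does not mention; it is assumed here to be available (pandas 0.22 or later) as a way to get NaN back when too few valid values are present. The explicit ``dtype`` on the empty Series is an assumption made so that the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan])
# Explicit dtype: an untyped empty Series may default to a different dtype
# depending on the pandas version.
empty = pd.Series([], dtype="float64")

# Default behaviour: sum is 0 and product is 1.
sum_default = (all_na.sum(), empty.sum())
prod_default = (all_na.prod(), empty.prod())

# min_count=1 requires at least one valid value, otherwise the result is NaN.
sum_strict = all_na.sum(min_count=1)
prod_strict = empty.prod(min_count=1)
```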


NA values in GroupBy
@@ -242,7 +233,7 @@ with missing data.
Filling missing values: fillna
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **fillna** function can "fill in" NA values with non-NA data in a couple
:meth:`~DataFrame.fillna` can "fill in" NA values with non-NA data in a couple
of ways, which we illustrate:

**Replace NA with a scalar value**
@@ -292,8 +283,8 @@ To remind you, these are the available filling methods:
With time series data, using pad/ffill is extremely common so that the "last
known value" is available at every time point.

The ``ffill()`` function is equivalent to ``fillna(method='ffill')``
and ``bfill()`` is equivalent to ``fillna(method='bfill')``
:meth:`~DataFrame.ffill` is equivalent to ``fillna(method='ffill')``
and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``
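
A small sketch of the two shortcuts on a hypothetical Series follows. It calls ``ffill`` and ``bfill`` directly rather than ``fillna(method=...)``, because recent pandas releases deprecate the ``method`` keyword of ``fillna``.

```python
import numpy as np
import pandas as pd

# Hypothetical data with leading, interior, and trailing gaps.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# Forward fill: propagate the last valid value forward.
# The leading NaN stays because nothing precedes it.
forward = s.ffill()

# Backward fill: propagate the next valid value backward.
# The trailing NaN stays because nothing follows it.
backward = s.bfill()
```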

.. _missing_data.PandasObject:

@@ -329,7 +320,7 @@ Dropping axis labels with missing data: dropna
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You may wish to simply exclude labels from a data set which refer to missing
data. To do this, use the :meth:`~DataFrame.dropna` method:
data. To do this, use :meth:`~DataFrame.dropna`:

.. ipython:: python
:suppress:
@@ -344,7 +335,7 @@ data. To do this, use the :meth:`~DataFrame.dropna` method:
df.dropna(axis=1)
df['one'].dropna()

An equivalent :meth:`~Series.dropna` method is available for Series.
An equivalent :meth:`~Series.dropna` is available for Series.
DataFrame.dropna has considerably more options than Series.dropna, which can be
examined :ref:`in the API <api.dataframe.missing>`.
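
To give a sense of the extra ``DataFrame.dropna`` options, here is a minimal sketch using the ``how``, ``thresh``, and ``subset`` keywords. The frame is hypothetical and is not the ``df`` from the documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has one gap, row 1 is entirely NA, row 2 is complete.
df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [np.nan, np.nan, 6.0],
    "three": [7.0, np.nan, 9.0],
})

# Default (how="any"): drop every row containing at least one NA.
rows_any = df.dropna()

# how="all": drop only rows in which every value is NA.
rows_all = df.dropna(how="all")

# thresh=2: keep rows with at least two non-NA values.
rows_thresh = df.dropna(thresh=2)

# subset: consider only the listed columns when looking for NA.
rows_subset = df.dropna(subset=["one"])
```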

@@ -357,7 +348,7 @@ Interpolation

The ``limit_area`` keyword argument was added.

Both Series and DataFrame objects have an :meth:`~DataFrame.interpolate` method
Both Series and DataFrame objects have :meth:`~DataFrame.interpolate`
that, by default, performs linear interpolation at missing datapoints.

.. ipython:: python
@@ -486,7 +477,7 @@ at the new values.
Interpolation Limits
^^^^^^^^^^^^^^^^^^^^

Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
Like other pandas fill methods, :meth:`~DataFrame.interpolate` accepts a ``limit`` keyword
argument. Use this argument to limit the number of consecutive ``NaN`` values
filled since the last valid observation:
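
A minimal sketch of the ``limit`` keyword on a hypothetical Series, assuming the default linear method and forward fill direction:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with three consecutive gaps between two valid points.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Without a limit, every gap is filled by linear interpolation.
full = s.interpolate()

# limit=1 fills only the first NaN after the last valid observation.
limited = s.interpolate(limit=1)
```

Here ``full`` contains the evenly spaced values from 1.0 to 5.0, while ``limited`` fills only the second position and leaves the remaining two gaps as NaN.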

@@ -533,8 +524,9 @@ the ``limit_area`` parameter restricts filling to either inside or outside value

Replacing Generic Values
~~~~~~~~~~~~~~~~~~~~~~~~
Often times we want to replace arbitrary values with other values. The
``replace`` method in Series/DataFrame provides an efficient yet
Often times we want to replace arbitrary values with other values.

:meth:`~Series.replace` in Series and :meth:`~DataFrame.replace` in DataFrame provides an efficient yet
flexible way to perform such replacements.
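
The three common call forms of ``Series.replace`` can be sketched as follows; the Series is a hypothetical example.

```python
import pandas as pd

# Hypothetical Series for illustration.
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])

# Replace a single value with another value.
single = s.replace(0, 5)

# Replace a list of values with a list of the same length, element by element.
# All replacements are applied to the original values, not one after another.
paired = s.replace([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])

# Replace using a mapping of old value -> new value.
mapped = s.replace({0: 10, 1: 100})
```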

For a Series, you can replace a single value or a list of values by another
@@ -674,7 +666,7 @@ want to use a regular expression.
Numeric Replacement
~~~~~~~~~~~~~~~~~~~

The :meth:`~DataFrame.replace` method is similar to :meth:`~DataFrame.fillna`.
:meth:`~DataFrame.replace` is similar to :meth:`~DataFrame.fillna`.

.. ipython:: python

@@ -763,7 +755,7 @@ contains NAs, an exception will be generated:
reindexed = s.reindex(list(range(8))).fillna(0)
reindexed[crit]

However, these can be filled in using **fillna** and it will work fine:
However, these can be filled in using :meth:`~DataFrame.fillna` and it will work fine:

.. ipython:: python
