DOC: Update missing_data.rst #20424

Merged
merged 3 commits into from
Mar 29, 2018
48 changes: 17 additions & 31 deletions doc/source/missing_data.rst
@@ -170,52 +170,38 @@ The descriptive statistics and computational methods discussed in the
account for missing data. For example:

* When summing data, NA (missing) values will be treated as zero.
* If the data are all NA, the result will be NA.
* Methods like **cumsum** and **cumprod** ignore NA values, but preserve them
in the resulting arrays.
* If the data are all NA, the result will be 0.
* Cumulative methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod` ignore NA values by default, but preserve them in the resulting arrays. To override this behaviour and include NA values, use ``skipna=False``.

.. ipython:: python

df
df['one'].sum()
df.mean(1)
df.cumsum()
df.cumsum(skipna=False)
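
For reference, a minimal self-contained sketch of the ``skipna`` behaviour described in this hunk. The Series below is hypothetical illustration data, not the ``df`` used in the documentation:

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the PR or the docs.
s = pd.Series([1.0, np.nan, 2.0])

# Default (skipna=True): the NA is skipped in the running total,
# but it is preserved at its own position in the result.
default_cumsum = s.cumsum()

# skipna=False: once an NA is encountered, every later entry is NA too.
strict_cumsum = s.cumsum(skipna=False)

print(default_cumsum.tolist())  # [1.0, nan, 3.0]
print(strict_cumsum.tolist())   # [1.0, nan, nan]
```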


.. _missing_data.numeric_sum:

Sum/Prod of Empties/Nans
~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

This behavior is now standard as of v0.21.0; previously sum/prod would give different
results if the ``bottleneck`` package was installed.
See the :ref:`v0.21.0 whatsnew <whatsnew_0210.api_breaking.bottleneck>`.
Member

Do we want to keep some kind of warning that this behaviour recently changed (with a link to the relevant whatsnew docs)?

Contributor

Yes, that's probably best, given the change is so new.

@pulkitmaloo could you add a small note mentioning that, with a link to the 0.22.0 whatsnew?

Contributor Author

Sure, I've made the changes. Please review it.


With ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be all-``NaN``.

.. ipython:: python

s = pd.Series([np.nan])

s.sum()

Summing over an empty ``Series`` will return ``NaN``:
With ``sum`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 0.
Contributor

Rephrase as

The sum of an empty or all-NA Series or column of a DataFrame is 0.

then the example. Don't need to double backtick Series or DataFrame (we're changing our style for those).


.. ipython:: python

pd.Series([np.nan]).sum()

pd.Series([]).sum()

.. warning::

These behaviors differ from the default in ``numpy`` where an empty sum returns zero.
With ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 1.
Contributor

Similar

The product of an empty or all-NA Series or column of a DataFrame is 1.


.. ipython:: python

np.nansum(np.array([np.nan]))
np.nansum(np.array([]))
.. ipython:: python

pd.Series([np.nan]).prod()

pd.Series([]).prod()
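
The new behaviour in this hunk can be checked with a short sketch. It assumes pandas 0.22.0 or later; the explicit ``dtype`` on the empty Series and the use of ``min_count`` to recover an NA result are additions for illustration and are not part of this diff:

```python
import numpy as np
import pandas as pd

# An explicit dtype avoids relying on the default dtype of an empty Series,
# which has differed between pandas versions.
empty = pd.Series([], dtype="float64")
all_na = pd.Series([np.nan, np.nan])

# The sum of an empty or all-NA Series is 0; the product is 1.
print(empty.sum(), all_na.sum())    # 0.0 0.0
print(empty.prod(), all_na.prod())  # 1.0 1.0

# min_count=1 requires at least one non-NA value; otherwise the result is NA.
na_sum = all_na.sum(min_count=1)
print(na_sum)  # nan
```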


NA values in GroupBy
@@ -242,7 +228,7 @@ with missing data.
Filling missing values: fillna
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **fillna** function can "fill in" NA values with non-NA data in a couple
The :meth:`~DataFrame.fillna` function can "fill in" NA values with non-NA data in a couple
Contributor

Instead of "The :meth:`method` function", write it as just ":meth:`method`". From the context it's clear that it's a function / method.

of ways, which we illustrate:

**Replace NA with a scalar value**
@@ -292,8 +278,8 @@ To remind you, these are the available filling methods:
With time series data, using pad/ffill is extremely common so that the "last
known value" is available at every time point.

The ``ffill()`` function is equivalent to ``fillna(method='ffill')``
and ``bfill()`` is equivalent to ``fillna(method='bfill')``
The :meth:`~DataFrame.ffill` function is equivalent to ``fillna(method='ffill')``
Contributor

Same thing, remove "The" and "function"

and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``
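
A minimal sketch of the two shorthand methods on a hypothetical Series (the data is invented for illustration). Only ``ffill()`` and ``bfill()`` are called here, because the ``method=`` keyword of ``fillna`` has been deprecated in later pandas releases:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a gap of two missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill: propagate the last valid observation forward.
forward = s.ffill()

# Backward fill: use the next valid observation to fill the gap.
backward = s.bfill()

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]
```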

.. _missing_data.PandasObject:

@@ -486,7 +472,7 @@ at the new values.
Interpolation Limits
^^^^^^^^^^^^^^^^^^^^

Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
Like other pandas fill methods, :meth:`~DataFrame.interpolate` accepts a ``limit`` keyword
argument. Use this argument to limit the number of consecutive ``NaN`` values
filled since the last valid observation:
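
The documentation's own example is collapsed in this view. As a stand-in, here is a minimal sketch on hypothetical data, assuming the default linear method and forward direction:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three consecutive missing values between 1.0 and 5.0.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Without a limit, linear interpolation fills the whole gap.
full = s.interpolate()

# limit=1 fills at most one consecutive NaN after the last valid observation.
limited = s.interpolate(limit=1)

print(full.tolist())     # [1.0, 2.0, 3.0, 4.0, 5.0]
print(limited.tolist())  # [1.0, 2.0, nan, nan, 5.0]
```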

@@ -534,7 +520,7 @@ the ``limit_area`` parameter restricts filling to either inside or outside value
Replacing Generic Values
~~~~~~~~~~~~~~~~~~~~~~~~
Often times we want to replace arbitrary values with other values. The
``replace`` method in Series/DataFrame provides an efficient yet
:meth:`~DataFrame.replace` method in Series/DataFrame provides an efficient yet
Contributor

Remove "method". Or write out :meth:`Series.replace` and :meth:`DataFrame.replace`.

flexible way to perform such replacements.
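
As a rough illustration of such replacements, the sketch below uses a hypothetical float Series. The values are invented for the example and are not taken from the diff:

```python
import pandas as pd

# Hypothetical data for illustration.
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])

# Replace a single value with another value.
single = s.replace(0, 5)

# Replace a list of values with a list of other values, position by position.
reversed_values = s.replace([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])

print(single.tolist())           # [5.0, 1.0, 2.0, 3.0, 4.0]
print(reversed_values.tolist())  # [4.0, 3.0, 2.0, 1.0, 0.0]
```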

For a Series, you can replace a single value or a list of values by another
@@ -763,7 +749,7 @@ contains NAs, an exception will be generated:
reindexed = s.reindex(list(range(8))).fillna(0)
reindexed[crit]

However, these can be filled in using **fillna** and it will work fine:
However, these can be filled in using :meth:`~DataFrame.fillna` and it will work fine:

.. ipython:: python
