DOC: Update missing_data.rst #20424
Changes from 1 commit
@@ -170,52 +170,38 @@ The descriptive statistics and computational methods discussed in the
account for missing data. For example:

* When summing data, NA (missing) values will be treated as zero.
* If the data are all NA, the result will be NA.
* Methods like **cumsum** and **cumprod** ignore NA values, but preserve them
  in the resulting arrays.
* If the data are all NA, the result will be 0.
* Cumulative methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod` ignore NA values by default, but preserve them in the resulting arrays. To override this behaviour and include NA values, use ``skipna=False``.

.. ipython:: python

   df
   df['one'].sum()
   df.mean(1)
   df.cumsum()
   df.cumsum(skipna=False)


.. _missing_data.numeric_sum:

Sum/Prod of Empties/Nans
~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

   This behavior is now standard as of v0.21.0; previously sum/prod would give different
   results if the ``bottleneck`` package was installed.
   See the :ref:`v0.21.0 whatsnew <whatsnew_0210.api_breaking.bottleneck>`.

With ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be all-``NaN``.

.. ipython:: python

   s = pd.Series([np.nan])

   s.sum()

Summing over an empty ``Series`` will return ``NaN``:
With ``sum`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 0.
Review comment: Rephrase as
then the example. Don't need to double backtick Series or DataFrame (we're changing our style for those).

.. ipython:: python

   pd.Series([np.nan]).sum()

   pd.Series([]).sum()

.. warning::

   These behaviors differ from the default in ``numpy`` where an empty sum returns zero.
With ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 1.
Review comment: Similar

.. ipython:: python

   np.nansum(np.array([np.nan]))
   np.nansum(np.array([]))
.. ipython:: python

   pd.Series([np.nan]).prod()

   pd.Series([]).prod()


NA values in GroupBy
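The cumulative and empty/all-NA sum/prod behaviour described in this hunk can be checked with a small standalone sketch. It assumes pandas 0.22.0 or later, where an empty or all-NA sum is 0 and product is 1. The frame, its values, and the variable names are hypothetical and are not the ``df`` defined earlier on the docs page. ``min_count`` is a real ``sum``/``prod`` keyword from the same release, but it is not part of this diff.

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not the docs' ``df``): one float column with a gap.
df = pd.DataFrame({"one": [1.0, np.nan, 2.0]})

# Default skipna=True: the NaN is skipped but kept in place in the output.
skipped = df.cumsum()["one"].tolist()                  # [1.0, nan, 3.0]
# skipna=False: once a NaN is reached it propagates to every later row.
propagated = df.cumsum(skipna=False)["one"].tolist()   # [1.0, nan, nan]

# Sum/prod of all-NaN and empty Series (pandas >= 0.22.0 semantics).
all_nan = pd.Series([np.nan])
empty = pd.Series([], dtype="float64")
sums = (all_nan.sum(), empty.sum())      # (0.0, 0.0)
prods = (all_nan.prod(), empty.prod())   # (1.0, 1.0)

# min_count requires at least this many non-NA values, otherwise NaN.
strict_sum = all_nan.sum(min_count=1)    # nan
```

Passing an explicit ``dtype`` to the empty Series avoids relying on whatever default dtype a given pandas version picks for ``pd.Series([])``.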
@@ -242,7 +228,7 @@ with missing data.
Filling missing values: fillna
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **fillna** function can "fill in" NA values with non-NA data in a couple
The :meth:`~DataFrame.fillna` function can "fill in" NA values with non-NA data in a couple
Review comment: Instead of "The :meth:`method` function", write it as just ":meth:`method`". From the context it's clear that it's a function / method.
of ways, which we illustrate:

**Replace NA with a scalar value**
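Replacing NA with a scalar, as introduced in this hunk, can be sketched as follows. The frame and its values are hypothetical; the sketch only shows that ``fillna`` returns a new object and leaves the original untouched when ``inplace`` is not set.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three missing entries.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# fillna returns a new frame in which every NA becomes the scalar 0.
filled = df.fillna(0)

# The original frame still has its three missing values.
still_missing = int(df.isna().sum().sum())  # 3
```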
@@ -292,8 +278,8 @@ To remind you, these are the available filling methods:
With time series data, using pad/ffill is extremely common so that the "last
known value" is available at every time point.

The ``ffill()`` function is equivalent to ``fillna(method='ffill')``
and ``bfill()`` is equivalent to ``fillna(method='bfill')``
The :meth:`~DataFrame.ffill` function is equivalent to ``fillna(method='ffill')``
Review comment: Same thing, remove "The" and "function"
and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``

.. _missing_data.PandasObject:
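The "last known value" idea from this hunk can be sketched on a small daily series. The dates and values are hypothetical. The sketch calls ``ffill``/``bfill`` directly rather than ``fillna(method=...)``, because the ``method`` keyword of ``fillna`` is deprecated in recent pandas releases (2.1 and later).

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with gaps, including a leading and a trailing NaN.
idx = pd.date_range("2018-01-01", periods=6, freq="D")
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan], index=idx)

# ffill carries the last known value forward; the leading NaN has nothing to copy.
forward = s.ffill()    # [nan, 1.0, 1.0, 1.0, 4.0, 4.0]
# bfill pulls the next known value backward; the trailing NaN stays NaN.
backward = s.bfill()   # [1.0, 1.0, 4.0, 4.0, 4.0, nan]
```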
@@ -486,7 +472,7 @@ at the new values.
Interpolation Limits
^^^^^^^^^^^^^^^^^^^^

Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
Like other pandas fill methods, :meth:`~DataFrame.interpolate` accepts a ``limit`` keyword
argument. Use this argument to limit the number of consecutive ``NaN`` values
filled since the last valid observation:
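The effect of ``limit`` described in this hunk can be sketched with a hypothetical series that has a run of three consecutive ``NaN`` values. The default linear interpolation and the default forward ``limit_direction`` are assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical series with three consecutive NaNs between two valid values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Default linear interpolation fills the whole gap.
full = s.interpolate()             # [1.0, 2.0, 3.0, 4.0, 5.0]
# limit=1 fills at most one NaN after the last valid observation.
limited = s.interpolate(limit=1)   # [1.0, 2.0, nan, nan, 5.0]
```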
@@ -534,7 +520,7 @@ the ``limit_area`` parameter restricts filling to either inside or outside value
Replacing Generic Values
~~~~~~~~~~~~~~~~~~~~~~~~
Often times we want to replace arbitrary values with other values. The
``replace`` method in Series/DataFrame provides an efficient yet
:meth:`~DataFrame.replace` method in Series/DataFrame provides an efficient yet
Review comment: Remove "method". Or write out :meth:`Series.replace` and :meth:`DataFrame.replace`.
flexible way to perform such replacements.

For a Series, you can replace a single value or a list of values by another
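Replacing a single value or a list of values on a Series, as this hunk describes, can be sketched as below. The values are hypothetical, and the ``-999`` sentinel is an assumed example of a placeholder that stands for missing data.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])

# Replace a single value.
single = s.replace(0, 5)                 # [5.0, 1.0, 2.0, 3.0, 4.0]
# Replace a list of values with a same-length list of replacements.
pairwise = s.replace([0, 1], [10, 11])   # [10.0, 11.0, 2.0, 3.0, 4.0]

# A common missing-data use: turn a hypothetical sentinel (-999) into NaN.
raw = pd.Series([1.0, -999.0, 3.0])
cleaned = raw.replace(-999, np.nan)
```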
@@ -763,7 +749,7 @@ contains NAs, an exception will be generated:
   reindexed = s.reindex(list(range(8))).fillna(0)
   reindexed[crit]

However, these can be filled in using **fillna** and it will work fine:
However, these can be filled in using :meth:`~DataFrame.fillna` and it will work fine:

.. ipython:: python
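The failure and the ``fillna`` fix in this hunk can be reproduced deterministically. This is a sketch with hypothetical values: it mirrors the docs' pattern of reindexing a boolean mask, which introduces ``NaN`` and an object dtype. Recent pandas versions may emit a ``FutureWarning`` about downcasting on the ``fillna`` call, so the mask is cast to ``bool`` explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical data: reindexing a boolean mask adds NaN, giving object dtype.
s = pd.Series([1.5, -0.5, 2.0], index=[0, 2, 4])
crit = (s > 0).reindex(list(range(5)))           # True, NaN, False, NaN, True
reindexed = s.reindex(list(range(5))).fillna(0)

# Indexing with a mask that still contains NaN raises ValueError.
try:
    reindexed[crit]
    raised = False
except ValueError:
    raised = True

# Filling the gaps first gives a usable mask; False drops the missing rows.
mask = crit.fillna(False).astype(bool)
selected = reindexed[mask]
```

Filling with ``True`` instead would keep the rows where the criterion is unknown. Which choice is right depends on how missing criteria should be treated.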
Do we want to keep some kind of warning that this behaviour recently changed (with a link to the relevant whatsnew docs)?
Yes, that's probably best, given the change is so new.
@pulkitmaloo could you add a small note mentioning that, with a link to the 0.22.0 whatsnew?
Sure, I've made the changes. Please review it.