Commit 14889f1

pulkitmaloo authored and TomAugspurger committed
DOC: Update missing_data.rst (#20424)
1 parent 1bf36b0 commit 14889f1

1 file changed

+26
-34
lines changed

doc/source/missing_data.rst

@@ -75,7 +75,7 @@ arise and we wish to also consider that "missing" or "not available" or "NA".
7575
To make detecting missing values easier (and across different array dtypes),
7676
pandas provides the :func:`isna` and
7777
:func:`notna` functions, which are also methods on
78-
``Series`` and ``DataFrame`` objects:
78+
Series and DataFrame objects:
7979

8080
.. ipython:: python
8181
@@ -170,16 +170,16 @@ The descriptive statistics and computational methods discussed in the
170170
account for missing data. For example:
171171

172172
* When summing data, NA (missing) values will be treated as zero.
173-
* If the data are all NA, the result will be NA.
174-
* Methods like **cumsum** and **cumprod** ignore NA values, but preserve them
175-
in the resulting arrays.
173+
* If the data are all NA, the result will be 0.
174+
* Cumulative methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod` ignore NA values by default, but preserve them in the resulting arrays. To override this behavior and include NA values, use ``skipna=False``.
176175

177176
.. ipython:: python
178177
179178
df
180179
df['one'].sum()
181180
df.mean(1)
182181
df.cumsum()
182+
df.cumsum(skipna=False)
183183
184184
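As a rough illustration of the bullet points in this hunk, the sketch below shows how NA values affect ``sum`` and ``cumsum``. The frame and its column names are made up for the example, and the results assume pandas 0.22 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one partly missing column, one entirely missing column.
df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [np.nan, np.nan, np.nan],
})

# NA values are treated as zero when summing.
total_one = df["one"].sum()

# An all-NA column sums to 0 rather than NaN.
total_two = df["two"].sum()

# By default cumsum skips NA when accumulating but keeps NA in place.
running = df["one"].cumsum()

# With skipna=False, every position from the first NA onward is NA.
running_strict = df["one"].cumsum(skipna=False)

print(total_one, total_two)
print(running.tolist())
print(running_strict.tolist())
```

The ``skipna=False`` form is useful when a missing value should invalidate all later running totals instead of being skipped silently.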
185185
.. _missing_data.numeric_sum:
@@ -189,33 +189,24 @@ Sum/Prod of Empties/Nans
189189

190190
.. warning::
191191

192-
This behavior is now standard as of v0.21.0; previously sum/prod would give different
193-
results if the ``bottleneck`` package was installed.
194-
See the :ref:`v0.21.0 whatsnew <whatsnew_0210.api_breaking.bottleneck>`.
192+
This behavior is now standard as of v0.22.0 and is consistent with the default in ``numpy``; previously sum/prod of all-NA or empty Series/DataFrames would return NaN.
193+
See :ref:`v0.22.0 whatsnew <whatsnew_0220>` for more.
195194

196-
With ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be all-``NaN``.
197-
198-
.. ipython:: python
199-
200-
s = pd.Series([np.nan])
201-
202-
s.sum()
203-
204-
Summing over an empty ``Series`` will return ``NaN``:
195+
The sum of an empty or all-NA Series or column of a DataFrame is 0.
205196

206197
.. ipython:: python
207198
199+
pd.Series([np.nan]).sum()
200+
208201
pd.Series([]).sum()
209202
210-
.. warning::
203+
The product of an empty or all-NA Series or column of a DataFrame is 1.
211204

212-
These behaviors differ from the default in ``numpy`` where an empty sum returns zero.
213-
214-
.. ipython:: python
215-
216-
np.nansum(np.array([np.nan]))
217-
np.nansum(np.array([]))
205+
.. ipython:: python
218206
207+
pd.Series([np.nan]).prod()
208+
209+
pd.Series([]).prod()
219210
220211
221212
NA values in GroupBy
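A minimal sketch of the sum/prod behavior described in this hunk, assuming pandas 0.22 or later. The explicit ``dtype`` on the empty Series is an addition for this example (newer pandas versions treat a dtype-less empty Series differently), and ``min_count`` is shown as one way to get NaN back when there are too few valid values.

```python
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan])
empty = pd.Series([], dtype="float64")  # dtype given explicitly for this sketch

# The sum of an empty or all-NA Series is 0.
sum_all_na = all_na.sum()
sum_empty = empty.sum()

# The product of an empty or all-NA Series is 1.
prod_all_na = all_na.prod()
prod_empty = empty.prod()

# Requiring at least one valid value returns NaN instead of the identity.
sum_min_count = all_na.sum(min_count=1)

print(sum_all_na, sum_empty, prod_all_na, prod_empty, sum_min_count)
```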
@@ -242,7 +233,7 @@ with missing data.
242233
Filling missing values: fillna
243234
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
244235

245-
The **fillna** function can "fill in" NA values with non-NA data in a couple
236+
:meth:`~DataFrame.fillna` can "fill in" NA values with non-NA data in a couple
246237
of ways, which we illustrate:
247238

248239
**Replace NA with a scalar value**
@@ -292,8 +283,8 @@ To remind you, these are the available filling methods:
292283
With time series data, using pad/ffill is extremely common so that the "last
293284
known value" is available at every time point.
294285

295-
The ``ffill()`` function is equivalent to ``fillna(method='ffill')``
296-
and ``bfill()`` is equivalent to ``fillna(method='bfill')``
286+
:meth:`~DataFrame.ffill` is equivalent to ``fillna(method='ffill')``
287+
and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``
297288

298289
.. _missing_data.PandasObject:
299290

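The following sketch illustrates the forward and backward fill methods mentioned above on a made-up Series. It calls ``ffill`` and ``bfill`` directly, because the ``method=`` argument of ``fillna`` has been deprecated in later pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of two missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Propagate the last valid observation forward.
forward = s.ffill()

# Use the next valid observation to fill backward.
backward = s.bfill()

# Fill at most one consecutive NA per gap.
forward_limited = s.ffill(limit=1)

print(forward.tolist())
print(backward.tolist())
print(forward_limited.tolist())
```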
@@ -329,7 +320,7 @@ Dropping axis labels with missing data: dropna
329320
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
330321

331322
You may wish to simply exclude labels from a data set which refer to missing
332-
data. To do this, use the :meth:`~DataFrame.dropna` method:
323+
data. To do this, use :meth:`~DataFrame.dropna`:
333324

334325
.. ipython:: python
335326
:suppress:
@@ -344,7 +335,7 @@ data. To do this, use the :meth:`~DataFrame.dropna` method:
344335
df.dropna(axis=1)
345336
df['one'].dropna()
346337
347-
An equivalent :meth:`~Series.dropna` method is available for Series.
338+
An equivalent :meth:`~Series.dropna` is available for Series.
348339
DataFrame.dropna has considerably more options than Series.dropna, which can be
349340
examined :ref:`in the API <api.dataframe.missing>`.
350341

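To make the extra ``DataFrame.dropna`` options concrete, here is a small sketch using a made-up frame. It shows the default row-wise drop, the column-wise drop, and the ``how="all"`` variant.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; both columns contain at least one NA.
df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [np.nan, np.nan, 6.0],
})

# Default: drop every row that contains any NA.
rows_complete = df.dropna()

# how="all": drop only rows in which every value is NA.
rows_any_value = df.dropna(how="all")

# axis=1: drop every column that contains any NA.
cols_complete = df.dropna(axis=1)

print(rows_complete)
print(rows_any_value)
print(cols_complete.shape)
```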
@@ -357,7 +348,7 @@ Interpolation
357348

358349
The ``limit_area`` keyword argument was added.
359350

360-
Both Series and DataFrame objects have an :meth:`~DataFrame.interpolate` method
351+
Both Series and DataFrame objects have :meth:`~DataFrame.interpolate`
361352
that, by default, performs linear interpolation at missing datapoints.
362353

363354
.. ipython:: python
@@ -486,7 +477,7 @@ at the new values.
486477
Interpolation Limits
487478
^^^^^^^^^^^^^^^^^^^^
488479

489-
Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
480+
Like other pandas fill methods, :meth:`~DataFrame.interpolate` accepts a ``limit`` keyword
490481
argument. Use this argument to limit the number of consecutive ``NaN`` values
491482
filled since the last valid observation:
492483

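A short sketch of the ``limit`` keyword on ``interpolate``, using a made-up Series and the default linear method. The result assumes the default forward fill direction.

```python
import numpy as np
import pandas as pd

# Hypothetical series with three consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Without a limit, all interior NA values are filled linearly.
filled = s.interpolate()

# With limit=1, only the first NA after a valid observation is filled.
filled_limited = s.interpolate(limit=1)

print(filled.tolist())
print(filled_limited.tolist())
```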
@@ -533,8 +524,9 @@ the ``limit_area`` parameter restricts filling to either inside or outside value
533524

534525
Replacing Generic Values
535526
~~~~~~~~~~~~~~~~~~~~~~~~
536-
Often times we want to replace arbitrary values with other values. The
537-
``replace`` method in Series/DataFrame provides an efficient yet
527+
Oftentimes we want to replace arbitrary values with other values.
528+
529+
:meth:`~Series.replace` in Series and :meth:`~DataFrame.replace` in DataFrame provide an efficient yet
538530
flexible way to perform such replacements.
539531

540532
For a Series, you can replace a single value or a list of values by another
@@ -674,7 +666,7 @@ want to use a regular expression.
674666
Numeric Replacement
675667
~~~~~~~~~~~~~~~~~~~
676668

677-
The :meth:`~DataFrame.replace` method is similar to :meth:`~DataFrame.fillna`.
669+
:meth:`~DataFrame.replace` is similar to :meth:`~DataFrame.fillna`.
678670

679671
.. ipython:: python
680672
@@ -763,7 +755,7 @@ contains NAs, an exception will be generated:
763755
reindexed = s.reindex(list(range(8))).fillna(0)
764756
reindexed[crit]
765757
766-
However, these can be filled in using **fillna** and it will work fine:
758+
However, these can be filled in using :meth:`~DataFrame.fillna` and it will work fine:
767759

768760
.. ipython:: python
769761
