From fd45b513909ece0a680f90c382c26a2eb27b566d Mon Sep 17 00:00:00 2001
From: Pulkit Maloo
Date: Tue, 20 Mar 2018 14:15:24 -0400
Subject: [PATCH 1/3] Update missing_data.rst

---
 doc/source/missing_data.rst | 48 +++++++++++++------------------------
 1 file changed, 17 insertions(+), 31 deletions(-)

diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
index ee0e2c7462f66..965f3e5107a8a 100644
--- a/doc/source/missing_data.rst
+++ b/doc/source/missing_data.rst
@@ -170,9 +170,8 @@ The descriptive statistics and computational methods discussed in the
 account for missing data. For example:
 
 * When summing data, NA (missing) values will be treated as zero.
-* If the data are all NA, the result will be NA.
-* Methods like **cumsum** and **cumprod** ignore NA values, but preserve them
-  in the resulting arrays.
+* If the data are all NA, the result will be 0.
+* Cumulative methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod` ignore NA values by default, but preserve them in the resulting arrays. To override this behaviour and include NA values, use ``skipna=False``.
 
 .. ipython:: python
 
@@ -180,6 +179,7 @@ account for missing data. For example:
    df['one'].sum()
    df.mean(1)
    df.cumsum()
+   df.cumsum(skipna=False)
 
 .. _missing_data.numeric_sum:
 
@@ -187,35 +187,21 @@ account for missing data. For example:
 Sum/Prod of Empties/Nans
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. warning::
-
-   This behavior is now standard as of v0.21.0; previously sum/prod would give different
-   results if the ``bottleneck`` package was installed.
-   See the :ref:`v0.21.0 whatsnew `.
-
-With ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be all-``NaN``.
-
-.. ipython:: python
-
-   s = pd.Series([np.nan])
-
-   s.sum()
-
-Summing over an empty ``Series`` will return ``NaN``:
+With ``sum`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 0.
 
 .. ipython:: python
 
+   pd.Series([np.nan]).sum()
+
    pd.Series([]).sum()
 
-.. warning::
-
-   These behaviors differ from the default in ``numpy`` where an empty sum returns zero.
+With ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 1.
 
-   .. ipython:: python
-
-      np.nansum(np.array([np.nan]))
-      np.nansum(np.array([]))
+.. ipython:: python
 
+   pd.Series([np.nan]).prod()
+
+   pd.Series([]).prod()
 
 
 NA values in GroupBy
@@ -242,7 +228,7 @@ with missing data.
 Filling missing values: fillna
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The **fillna** function can "fill in" NA values with non-NA data in a couple
+The :meth:`~DataFrame.fillna` function can "fill in" NA values with non-NA data in a couple
 of ways, which we illustrate:
 
 **Replace NA with a scalar value**
@@ -292,8 +278,8 @@ To remind you, these are the available filling methods:
 With time series data, using pad/ffill is extremely common so that the "last
 known value" is available at every time point.
 
-The ``ffill()`` function is equivalent to ``fillna(method='ffill')``
-and ``bfill()`` is equivalent to ``fillna(method='bfill')``
+The :meth:`~DataFrame.ffill` function is equivalent to ``fillna(method='ffill')``
+and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``
 
 .. _missing_data.PandasObject:
 
@@ -486,7 +472,7 @@ at the new values.
 Interpolation Limits
 ^^^^^^^^^^^^^^^^^^^^
 
-Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
+Like other pandas fill methods, :meth:`~DataFrame.interpolate` accepts a ``limit`` keyword
 argument. Use this argument to limit the number of consecutive ``NaN`` values
 filled since the last valid observation:
 
@@ -534,7 +520,7 @@ the ``limit_area`` parameter restricts filling to either inside or outside value
 Replacing Generic Values
 ~~~~~~~~~~~~~~~~~~~~~~~~
 Often times we want to replace arbitrary values with other values. The
-``replace`` method in Series/DataFrame provides an efficient yet
+:meth:`~DataFrame.replace` method in Series/DataFrame provides an efficient yet
 flexible way to perform such replacements.
 
 For a Series, you can replace a single value or a list of values by another
@@ -763,7 +749,7 @@ contains NAs, an exception will be generated:
    reindexed = s.reindex(list(range(8))).fillna(0)
    reindexed[crit]
 
-However, these can be filled in using **fillna** and it will work fine:
+However, these can be filled in using :meth:`~DataFrame.fillna` and it will work fine:
 
 .. ipython:: python
 

From 377df1053751362a82099fee2c45213e0e9e0115 Mon Sep 17 00:00:00 2001
From: Pulkit Maloo
Date: Tue, 20 Mar 2018 19:52:45 -0400
Subject: [PATCH 2/3] Update missing_data.rst

---
 doc/source/missing_data.rst | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
index 965f3e5107a8a..b489466da333e 100644
--- a/doc/source/missing_data.rst
+++ b/doc/source/missing_data.rst
@@ -75,7 +75,7 @@ arise and we wish to also consider that "missing" or "not available" or "NA".
 To make detecting missing values easier (and across different array dtypes),
 pandas provides the :func:`isna` and
 :func:`notna` functions, which are also methods on
-``Series`` and ``DataFrame`` objects:
+Series and DataFrame objects:
 
 .. ipython:: python
 
@@ -187,7 +187,7 @@ account for missing data. For example:
 Sum/Prod of Empties/Nans
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
-With ``sum`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 0.
+The sum of an empty or all-NA Series or column of a DataFrame is 0.
 
 .. ipython:: python
 
@@ -195,7 +195,7 @@ With ``sum`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame`
 
    pd.Series([]).sum()
 
-With ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be 1.
+The product of an empty or all-NA Series or column of a DataFrame is 1.
 
 .. ipython:: python
 
@@ -228,7 +228,7 @@ with missing data.
 Filling missing values: fillna
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The :meth:`~DataFrame.fillna` function can "fill in" NA values with non-NA data in a couple
+:meth:`~DataFrame.fillna` can "fill in" NA values with non-NA data in a couple
 of ways, which we illustrate:
 
 **Replace NA with a scalar value**
@@ -278,7 +278,7 @@ To remind you, these are the available filling methods:
 With time series data, using pad/ffill is extremely common so that the "last
 known value" is available at every time point.
 
-The :meth:`~DataFrame.ffill` function is equivalent to ``fillna(method='ffill')``
+:meth:`~DataFrame.ffill` is equivalent to ``fillna(method='ffill')``
 and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``
 
 .. _missing_data.PandasObject:
@@ -315,7 +315,7 @@ Dropping axis labels with missing data: dropna
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 You may wish to simply exclude labels from a data set which refer to missing
-data. To do this, use the :meth:`~DataFrame.dropna` method:
+data. To do this, use :meth:`~DataFrame.dropna`:
 
 .. ipython:: python
    :suppress:
@@ -330,7 +330,7 @@ data. To do this, use the :meth:`~DataFrame.dropna` method:
    df.dropna(axis=1)
    df['one'].dropna()
 
-An equivalent :meth:`~Series.dropna` method is available for Series.
+An equivalent :meth:`~Series.dropna` is available for Series.
 DataFrame.dropna has considerably more options than Series.dropna, which can be
 examined :ref:`in the API `.
 
@@ -343,7 +343,7 @@ Interpolation
 
    The ``limit_area`` keyword argument was added.
 
-Both Series and DataFrame objects have an :meth:`~DataFrame.interpolate` method
+Both Series and DataFrame objects have :meth:`~DataFrame.interpolate`
 that, by default, performs linear interpolation at missing datapoints.
 
 .. ipython:: python
@@ -519,8 +519,9 @@ the ``limit_area`` parameter restricts filling to either inside or outside value
 
 Replacing Generic Values
 ~~~~~~~~~~~~~~~~~~~~~~~~
-Often times we want to replace arbitrary values with other values. The
-:meth:`~DataFrame.replace` method in Series/DataFrame provides an efficient yet
+Often times we want to replace arbitrary values with other values.
+
+:meth:`~Series.replace` in Series and :meth:`~DataFrame.replace` in DataFrame provide an efficient yet
 flexible way to perform such replacements.
 
 For a Series, you can replace a single value or a list of values by another
@@ -660,7 +661,7 @@ want to use a regular expression.
 Numeric Replacement
 ~~~~~~~~~~~~~~~~~~~
 
-The :meth:`~DataFrame.replace` method is similar to :meth:`~DataFrame.fillna`.
+:meth:`~DataFrame.replace` is similar to :meth:`~DataFrame.fillna`.
 
 .. ipython:: python
 

From cd338bbe4dc0b5cbab1913dbc12133c92ff585ec Mon Sep 17 00:00:00 2001
From: Pulkit Maloo
Date: Wed, 28 Mar 2018 13:23:27 -0400
Subject: [PATCH 3/3] Update missing_data.rst

---
 doc/source/missing_data.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
index b489466da333e..3950e4c80749b 100644
--- a/doc/source/missing_data.rst
+++ b/doc/source/missing_data.rst
@@ -187,6 +187,11 @@ account for missing data. For example:
 Sum/Prod of Empties/Nans
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
+.. warning::
+
+   This behavior is now standard as of v0.22.0 and is consistent with the default in ``numpy``; previously sum/prod of all-NA or empty Series/DataFrames would return NaN.
+   See :ref:`v0.22.0 whatsnew ` for more.
+
 The sum of an empty or all-NA Series or column of a DataFrame is 0.
 
 .. ipython:: python
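The patches document four rules for v0.22.0:

* An empty or all-NA sum is 0.
* The matching product is 1.
* Cumulative methods keep NA in place.
* With ``skipna=False``, cumulative methods let NA propagate.

These rules follow from treating a reduction over no values as its identity element. Below is a minimal pure-Python model of those rules. It is a sketch, not pandas's implementation, and it rests on a few assumptions:

* `nan_sum`, `nan_prod` and `nan_cumsum` are hypothetical helpers.
* Float NaN stands in for every kind of missing value.
* `min_count` only imitates the pandas keyword of the same name, which returns NaN when too few non-NA values are present.

```python
import math
from typing import Iterable, List


def _is_na(x: float) -> bool:
    # Simplification: only float NaN counts as missing in this model.
    return isinstance(x, float) and math.isnan(x)


def nan_sum(values: Iterable[float], min_count: int = 0) -> float:
    """Sum the non-NA values; with nothing left, return the additive identity 0."""
    present = [v for v in values if not _is_na(v)]
    if len(present) < min_count:
        return math.nan  # too few valid values: report NA instead
    return float(sum(present))  # sum([]) == 0


def nan_prod(values: Iterable[float], min_count: int = 0) -> float:
    """Multiply the non-NA values; with nothing left, return the multiplicative identity 1."""
    present = [v for v in values if not _is_na(v)]
    if len(present) < min_count:
        return math.nan
    return float(math.prod(present))  # math.prod([]) == 1 (Python 3.8+)


def nan_cumsum(values: Iterable[float], skipna: bool = True) -> List[float]:
    """Running sum that leaves each NA position as NA in the output."""
    out: List[float] = []
    total = 0.0
    seen_na = False
    for v in values:
        if _is_na(v):
            seen_na = True
            out.append(math.nan)  # NA is preserved in place
        elif seen_na and not skipna:
            out.append(math.nan)  # skipna=False: every later total is NA too
        else:
            total += v
            out.append(total)
    return out


nan = math.nan
print(nan_sum([nan]), nan_sum([]))                # 0.0 0.0
print(nan_prod([nan]), nan_prod([]))              # 1.0 1.0
print(nan_sum([nan], min_count=1))                # nan
print(nan_cumsum([1.0, nan, 2.0]))                # [1.0, nan, 3.0]
print(nan_cumsum([1.0, nan, 2.0], skipna=False))  # [1.0, nan, nan]
```

Returning the identity element makes an all-NA reduction agree with one over an empty input. It also matches the ``numpy`` behaviour cited in the v0.22.0 warning.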