doc/source/missing_data.rst (+26 -34)
@@ -75,7 +75,7 @@ arise and we wish to also consider that "missing" or "not available" or "NA".
 To make detecting missing values easier (and across different array dtypes),
 pandas provides the :func:`isna` and
 :func:`notna` functions, which are also methods on
-``Series`` and ``DataFrame`` objects:
+Series and DataFrame objects:

 .. ipython:: python

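The ``isna``/``notna`` pair touched by this hunk can be exercised with a minimal sketch. It assumes a recent pandas with NumPy installed; the frame ``df`` and its values are hypothetical, chosen only for illustration. The function form and the method form return the same boolean mask, and ``None`` in an object column counts as missing just like ``NaN`` in a float column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a float column with NaN and an object column with None.
df = pd.DataFrame({"one": [1.0, np.nan, 3.0], "two": ["a", None, "c"]})

# Function form and method form give the same boolean mask.
mask_func = pd.isna(df)
mask_meth = df.isna()

# notna is the element-wise negation of isna.
present = df["one"].notna()

print(mask_meth)
print(present.tolist())
```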
@@ -170,16 +170,16 @@ The descriptive statistics and computational methods discussed in the
 account for missing data. For example:

 * When summing data, NA (missing) values will be treated as zero.
-* If the data are all NA, the result will be NA.
-* Methods like **cumsum** and **cumprod** ignore NA values, but preserve them
-  in the resulting arrays.
+* If the data are all NA, the result will be 0.
+* Cumulative methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod` ignore NA values by default, but preserve them in the resulting arrays. To override this behaviour and include NA values, use ``skipna=False``.

 .. ipython:: python

    df
    df['one'].sum()
    df.mean(1)
    df.cumsum()
+   df.cumsum(skipna=False)


 .. _missing_data.numeric_sum:
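The updated bullets on summing and cumulative methods can be checked with a small Series. This is a sketch assuming a recent pandas; the Series ``s`` and its values are hypothetical. ``sum`` skips the NA, ``cumsum`` skips it but keeps it in place, and ``skipna=False`` lets it propagate to every later position.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the middle.
s = pd.Series([1.0, np.nan, 2.0])

# Reductions skip NA by default, so the NaN contributes nothing to the sum.
total = s.sum()

# Cumulative methods also skip NA by default but leave it in place.
running = s.cumsum()

# With skipna=False the NaN propagates to every later position.
running_strict = s.cumsum(skipna=False)

print(total)
print(running.tolist())
print(running_strict.tolist())
```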
@@ -189,33 +189,24 @@ Sum/Prod of Empties/Nans

 .. warning::

-   This behavior is now standard as of v0.21.0; previously sum/prod would give different
-   results if the ``bottleneck`` package was installed.
-   See the :ref:`v0.21.0 whatsnew <whatsnew_0210.api_breaking.bottleneck>`.
+   This behavior is now standard as of v0.22.0 and is consistent with the default in ``numpy``; previously sum/prod of all-NA or empty Series/DataFrames would return NaN.
+   See :ref:`v0.22.0 whatsnew <whatsnew_0220>` for more.

-With ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, the result will be all-``NaN``.
-
-.. ipython:: python
-
-   s = pd.Series([np.nan])
-
-   s.sum()
-
-Summing over an empty ``Series`` will return ``NaN``:
+The sum of an empty or all-NA Series or column of a DataFrame is 0.

 .. ipython:: python

+   pd.Series([np.nan]).sum()
+
    pd.Series([]).sum()

-.. warning::
+The product of an empty or all-NA Series or column of a DataFrame is 1.

-   These behaviors differ from the default in ``numpy`` where an empty sum returns zero.
-
-.. ipython:: python
-
-   np.nansum(np.array([np.nan]))
-   np.nansum(np.array([]))
+.. ipython:: python

+   pd.Series([np.nan]).prod()
+
+   pd.Series([]).prod()


 NA values in GroupBy
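The new empty/all-NA rules (sum is 0, product is 1) can be compared directly with NumPy's ``nansum`` and ``nanprod``. This is a sketch assuming pandas 0.22.0 or later; the variable names are hypothetical, and the explicit ``float64`` dtype on the empty Series is our own choice so the result does not depend on the default dtype of an empty Series. The ``min_count`` keyword of ``sum`` is shown as the way to get a NaN back when too few valid values are present.

```python
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan])
# Explicit dtype so the result does not depend on the default dtype of an empty Series.
empty = pd.Series([], dtype="float64")

# Sum over no valid values is the additive identity, product the multiplicative one.
sums = (all_na.sum(), empty.sum())
prods = (all_na.prod(), empty.prod())

# NumPy's NaN-aware reductions give the same identities.
np_sums = (np.nansum(np.array([np.nan])), np.nansum(np.array([])))
np_prods = (np.nanprod(np.array([np.nan])), np.nanprod(np.array([])))

# min_count=1 requires at least one valid value; otherwise the result is NaN.
strict_sum = all_na.sum(min_count=1)

print(sums, prods, strict_sum)
```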
@@ -242,7 +233,7 @@ with missing data.
 Filling missing values: fillna
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The **fillna** function can "fill in" NA values with non-NA data in a couple
+:meth:`~DataFrame.fillna` can "fill in" NA values with non-NA data in a couple
 of ways, which we illustrate:

 **Replace NA with a scalar value**
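Filling with a scalar, as this hunk's context introduces, can be sketched as follows. It assumes a recent pandas; the frame and fill values are hypothetical. The per-column dict form is included as a common variant of ``fillna``; in both cases a new object is returned and the original frame is left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three missing values.
df = pd.DataFrame({"one": [1.0, np.nan, 3.0], "two": [np.nan, 5.0, np.nan]})

# Replace every NA with the same scalar.
filled_scalar = df.fillna(0)

# A dict maps column labels to per-column fill values.
filled_per_col = df.fillna({"one": -1.0, "two": 99.0})

print(filled_scalar)
print(filled_per_col)
```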
@@ -292,8 +283,8 @@ To remind you, these are the available filling methods:
 With time series data, using pad/ffill is extremely common so that the "last
 known value" is available at every time point.

-The ``ffill()`` function is equivalent to ``fillna(method='ffill')``
-and ``bfill()`` is equivalent to ``fillna(method='bfill')``
+:meth:`~DataFrame.ffill` is equivalent to ``fillna(method='ffill')``
+and :meth:`~DataFrame.bfill` is equivalent to ``fillna(method='bfill')``
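Forward and backward filling can be sketched on a short Series. This assumes a recent pandas; the values are hypothetical. Because recent pandas releases deprecate the ``method`` keyword of ``fillna``, the sketch calls ``ffill`` and ``bfill`` directly. A leading NA has nothing before it to carry forward, a trailing NA has nothing after it to carry backward, and ``limit`` caps how many consecutive NAs are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical series with leading, interior and trailing gaps.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# Carry the last valid value forward; the leading NaN stays missing.
forward = s.ffill()

# Carry the next valid value backward; the trailing NaN stays missing.
backward = s.bfill()

# Fill at most one consecutive NA after each valid value.
forward_limited = s.ffill(limit=1)

print(forward.tolist())
print(backward.tolist())
print(forward_limited.tolist())
```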