DOC: update the pandas.core.resample.Resampler.fillna docstring #20379

Merged
merged 3 commits into from
Mar 17, 2018
Changes from 1 commit
124 changes: 120 additions & 4 deletions pandas/core/resample.py
@@ -624,18 +624,134 @@ def backfill(self, limit=None):

def fillna(self, method, limit=None):
"""
Fill missing values
Fill the new missing values in the resampled data using different
Contributor

This should be a single line. Does

Fill missing values introduced by upsampling.

sound good?

methods.
Member

Can you try to get this on a single line?


In statistics, imputation is the process of replacing missing data with
substituted values [1]_. When resampling data, missing values may
appear (e.g., when the resampling frequency is higher than the original
frequency).

The backward fill ('bfill') will replace NaN values that appeared in
Contributor

Aside from the last sentence, this can be folded into the Parameters section.

the resampled data with the next value in the original sequence. The
forward fill ('ffill'), on the other hand, will replace NaN values
that appeared in the resampled data with the previous value in the
original sequence. Missing values that existed in the original data will
not be modified.

Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
Member

Can you change the type description to method : {'ffill', 'bfill'} ? ("method of resampling" belongs on the next line, and is already there)

Contributor

method : {'ffill', 'pad', 'bfill', 'backfill', 'nearest'}

Note that ffill is an alias for pad and bfill is an alias for backfill.

Can you check that 'nearest' works as expected?

And move the descriptions from above here.
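
One way to run that check is sketched below. It calls `Resampler.nearest()` as a stand-in for `fillna('nearest')`, on the assumption that both go through the same upsampling path; that equivalence is not confirmed by this diff. The 20-minute target frequency is chosen so that no new timestamp is equidistant from two original observations, which keeps tie-breaking out of the picture.

```python
import pandas as pd

# Same hourly Series as in the docstring examples.
s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Upsample to 20-minute bins; every new slot has a unique nearest neighbour.
# Assumption: .nearest() mirrors what fillna('nearest') would return.
nearest = s.resample('20min').nearest()
print(nearest)
```

Each new timestamp should take the value of the closer original observation, for example 00:20 from 00:00 and 00:40 from 01:00.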

Method to use for filling holes in resampled data
* ffill: use previous valid observation to fill gap (forward
Member

No indentation is needed here (compared to the "Method ..." on the line above), but Sphinx needs a blank line between those two lines.

Contributor Author

Done

Contributor

Quote these, since they're strings.

fill).
* bfill: use next valid observation to fill gap (backward
fill).
limit : integer, optional
limit of how many values to fill
Limit of how many values to fill.
Contributor

Say consecutive values to fill
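
A small sketch of why "consecutive" matters, reusing the hourly Series from the docstring examples. It uses `Resampler.bfill(limit=...)` rather than `fillna('bfill', limit=...)`, since `Resampler.fillna` is deprecated in newer pandas releases; treating the two as interchangeable is an assumption of this sketch.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Going from hourly to 15-minute data opens a run of three consecutive
# gaps between each pair of original observations. With limit=2 only the
# two gaps closest to the next observation are filled.
limited = s.resample('15min').bfill(limit=2)
print(limited)
```

The first gap of each run (00:15 and 01:15) stays NaN, which matches the `limit=2` output shown in the docstring.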


Returns
-------
Series, DataFrame
Contributor

I think Series or DataFrame, can't recall.

An upsampled Series or DataFrame with backward or forward filled
NaN values.

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
... index=pd.date_range('20180101', periods=3, freq='h'))
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
2018-01-01 02:00:00 3
Freq: H, dtype: int64

>>> s.resample('30min').fillna("bfill")
Member

I would maybe first show what it does without filling (which is s.resample().asfreq()), of course this is another method, but it will then be easier to see which values have actually been filled by fillna()
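
A minimal sketch of that suggestion, using the same Series as the docstring. `asfreq()` exposes the holes created by upsampling, and comparing against the filled result shows which values were introduced by the fill. `Resampler.bfill()` stands in for `fillna('bfill')` here, since `Resampler.fillna` is deprecated in newer pandas releases; the names `unfilled`, `filled` and `newly_filled` are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Upsample without filling: the new 30-minute slots appear as NaN.
unfilled = s.resample('30min').asfreq()

# Backward fill the same upsampled index.
filled = s.resample('30min').bfill()

# Slots that were NaN before filling are the ones the fill touched.
newly_filled = filled[unfilled.isna()]

print(unfilled)
print(filled)
print(newly_filled)
```

Only the 00:30 and 01:30 rows differ between the two results, which makes the effect of the fill easy to see.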

2018-01-01 00:00:00 1
2018-01-01 00:30:00 2
2018-01-01 01:00:00 2
2018-01-01 01:30:00 3
2018-01-01 02:00:00 3
Freq: 30T, dtype: int64

>>> s.resample('15min').fillna("bfill", limit=2)
2018-01-01 00:00:00 1.0
2018-01-01 00:15:00 NaN
2018-01-01 00:30:00 2.0
2018-01-01 00:45:00 2.0
2018-01-01 01:00:00 2.0
2018-01-01 01:15:00 NaN
2018-01-01 01:30:00 3.0
2018-01-01 01:45:00 3.0
2018-01-01 02:00:00 3.0
Freq: 15T, dtype: float64

>>> s.resample('30min').fillna("ffill")
2018-01-01 00:00:00 1
2018-01-01 00:30:00 1
2018-01-01 01:00:00 2
2018-01-01 01:30:00 2
2018-01-01 02:00:00 3
Freq: 30T, dtype: int64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
... index=pd.date_range('20180101', periods=3,
... freq='h'))
>>> df
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 01:00:00 NaN 3
Member

I would add such an example with a missing value above for Series as well (or instead of this example).
I think using a Series will make it easier to understand and easier to focus on that specific behaviour.

In the end, we can limit the number of examples for DataFrame and basically say that for a DataFrame everything works the same way as for a Series, column by column.

Contributor Author

Done
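
For reference, a sketch of what such a Series example could look like (the variable names are hypothetical, and `Resampler.bfill()` is used in place of `fillna('bfill')` because `Resampler.fillna` is deprecated in newer pandas releases). It contrasts the resampler fill, which leaves a NaN from the original data alone, with a plain `Series.bfill()`, which does not.

```python
import numpy as np
import pandas as pd

# Hourly Series with a missing value already present in the original data.
sm = pd.Series([1, np.nan, 3],
               index=pd.date_range('20180101', periods=3, freq='h'))

# Upsample and backward fill: new slots copy the next original value,
# even when that value is NaN, and the original NaN at 01:00 is kept.
resampled = sm.resample('30min').bfill()

# A plain backward fill on the original Series replaces the NaN instead.
plain = sm.bfill()

print(resampled)
print(plain)
```

The 00:30 slot ends up NaN because the next original observation (01:00) is itself missing, mirroring column `a` in the DataFrame example.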

2018-01-01 02:00:00 6.0 5

>>> df.resample('30min').fillna("bfill")
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 00:30:00 NaN 3
2018-01-01 01:00:00 NaN 3
2018-01-01 01:30:00 6.0 5
2018-01-01 02:00:00 6.0 5

>>> df.resample('15min').fillna("bfill", limit=2)
a b
2018-01-01 00:00:00 2.0 1.0
2018-01-01 00:15:00 NaN NaN
2018-01-01 00:30:00 NaN 3.0
2018-01-01 00:45:00 NaN 3.0
2018-01-01 01:00:00 NaN 3.0
2018-01-01 01:15:00 NaN NaN
2018-01-01 01:30:00 6.0 5.0
2018-01-01 01:45:00 6.0 5.0
2018-01-01 02:00:00 6.0 5.0

>>> df.resample('30min').fillna("ffill")
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 00:30:00 2.0 1
2018-01-01 01:00:00 NaN 3
2018-01-01 01:30:00 NaN 3
2018-01-01 02:00:00 6.0 5

See Also
--------
Series.fillna
DataFrame.fillna
backfill : Backward fill NaN values in the resampled data.
pad : Forward fill NaN values in the resampled data.
bfill : Alias of backfill.
ffill : Alias of pad.
Contributor

Can remove these aliases I think, since they go to the same page.

nearest : Fill NaN values in the resampled data
with nearest neighbor starting from center.
pandas.Series.fillna : Fill NaN values in the Series using the
specified method, which can be 'bfill' and 'ffill'.
pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the
specified method, which can be 'bfill' and 'ffill'.

References
----------
.. [1] https://en.wikipedia.org/wiki/Imputation_(statistics)
"""
return self._upsample(method, limit=limit)
