DOC: update the pandas.core.resample.Resampler.backfill docstring #20083

Merged
merged 5 commits into from
Mar 12, 2018

Conversation

Contributor

@gcbeltramini gcbeltramini commented Mar 9, 2018

Checklist for the pandas documentation sprint:

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py pandas.core.resample.Resampler.backfill
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python make.py --single pandas.core.resample.Resampler.backfill
    (after the modification suggested here)
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
############# Docstring (pandas.core.resample.Resampler.backfill)  #############
################################################################################

Backward fill the new missing values in the resampled data.

In statistics, imputation is the process of replacing missing data with
substituted values [1]_. When resampling data, missing values may
appear (e.g., when the resampling frequency is higher than the original
frequency). The backward fill will replace NaN values that appeared in
the resampled data with the next value in the original sequence.
Missing values that existed in the original data will not be modified.

Parameters
----------
limit : integer, optional
    Limit of how many values to fill.

Returns
-------
Series, DataFrame
    An upsampled Series or DataFrame with backward filled NaN values.

See Also
--------
bfill : Alias of backfill.
fillna : Fill NaN values using the specified method, which can be
    'backfill'.
nearest : Fill NaN values with nearest neighbor starting from center.
pad : Forward fill NaN values.
pandas.Series.fillna : Fill NaN values in the Series using the
    specified method, which can be 'backfill'.
pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the
    specified method, which can be 'backfill'.

References
----------
.. [1] https://en.wikipedia.org/wiki/Imputation_(statistics)

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
...               index=pd.date_range('20180101', periods=3, freq='h'))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
2018-01-01 02:00:00    3
Freq: H, dtype: int64

>>> s.resample('30min').backfill()
2018-01-01 00:00:00    1
2018-01-01 00:30:00    2
2018-01-01 01:00:00    2
2018-01-01 01:30:00    3
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

>>> s.resample('15min').backfill(limit=2)
2018-01-01 00:00:00    1.0
2018-01-01 00:15:00    NaN
2018-01-01 00:30:00    2.0
2018-01-01 00:45:00    2.0
2018-01-01 01:00:00    2.0
2018-01-01 01:15:00    NaN
2018-01-01 01:30:00    3.0
2018-01-01 01:45:00    3.0
2018-01-01 02:00:00    3.0
Freq: 15T, dtype: float64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 02:00:00  6.0  5

>>> df.resample('30min').backfill()
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:30:00  NaN  3
2018-01-01 01:00:00  NaN  3
2018-01-01 01:30:00  6.0  5
2018-01-01 02:00:00  6.0  5

>>> df.resample('15min').backfill(limit=2)
                       a    b
2018-01-01 00:00:00  2.0  1.0
2018-01-01 00:15:00  NaN  NaN
2018-01-01 00:30:00  NaN  3.0
2018-01-01 00:45:00  NaN  3.0
2018-01-01 01:00:00  NaN  3.0
2018-01-01 01:15:00  NaN  NaN
2018-01-01 01:30:00  6.0  5.0
2018-01-01 01:45:00  6.0  5.0
2018-01-01 02:00:00  6.0  5.0

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.core.resample.Resampler.backfill" correct. :)
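
The docstring's claim that missing values already present in the original data are not modified can be checked with a small sketch. It assumes a pandas version where `bfill` (listed under See Also as the alias of `backfill`) is available on both the resampler and the DataFrame. The `asfreq().bfill()` contrast and the variable names are illustrative additions, not part of the docstring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
                  index=pd.date_range('20180101', periods=3, freq='h'))

# Resampler fill: only the slots created by upsampling are filled,
# so the NaN that was already at 01:00 is carried along unchanged.
resampled = df.resample('30min').bfill()

# For contrast (an assumption of this sketch, not part of the docstring):
# materialise the new slots as NaN first, then fill every NaN.
refilled = df.resample('30min').asfreq().bfill()

print(resampled['a'].tolist())
print(refilled['a'].tolist())
```

In the first result column `a` still contains NaN at 00:30 and 01:00, while the second result has no NaN left, because a plain fill cannot tell original gaps from new ones.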

@gcbeltramini
Contributor Author

When I run python make.py --single pandas.core.resample.Resampler.backfill, I get the following error message:

Traceback (most recent call last):
  File "make.py", line 138, in _process_single_doc
    obj = getattr(obj, name)
AttributeError: module 'pandas.core' has no attribute 'resample'

I tested with other modules inside pandas.core, e.g., python make.py --single pandas.core.groupby, and the HTML file is generated.

@TomAugspurger
Contributor

TomAugspurger commented Mar 9, 2018

> When I run python make.py --single pandas.core.resample.Resampler.backfill, I get the following error message

cc @datapythonista, @jorisvandenbossche that's because pd.core.resample isn't imported in pandas/__init__.py. We use importlib to import pandas, but we should also import objects that are documented but not imported by default (Styler?, others?)

@gcbeltramini this should fix it for you for testing, but I think we'll handle all these in a separate PR.

diff --git a/doc/make.py b/doc/make.py
index 4967f3045..f1b63cf13 100755
--- a/doc/make.py
+++ b/doc/make.py
@@ -349,6 +349,7 @@ def main():
     os.environ['PYTHONPATH'] = args.python_path
     sys.path.append(args.python_path)
     globals()['pandas'] = importlib.import_module('pandas')
+    importlib.import_module('pandas.core.resample')
 
     builder = DocBuilder(args.num_jobs, not args.no_api, args.single,
                          args.verbosity)
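
The failure mode described above (walking a dotted name with `getattr` when a submodule was never imported) can be reproduced without pandas. The following is a minimal sketch, not the actual `make.py` code: the package `demo_pkg`, its module `hidden`, and the helper `resolve_by_getattr` are hypothetical names created only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_by_getattr(dotted_name):
    """Import the top-level package, then walk the rest with getattr."""
    first, *rest = dotted_name.split(".")
    obj = importlib.import_module(first)
    for name in rest:
        obj = getattr(obj, name)
    return obj


# Build a throwaway package whose __init__.py does not import its submodule.
tmp = tempfile.mkdtemp()
pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "hidden.py").write_text("class Thing:\n    pass\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# The submodule is not yet an attribute of the package, so getattr fails.
try:
    resolve_by_getattr("demo_pkg.hidden.Thing")
    failed = False
except AttributeError as exc:
    failed = True
    print(exc)

# Importing the submodule explicitly binds it on the parent package.
importlib.import_module("demo_pkg.hidden")
thing = resolve_by_getattr("demo_pkg.hidden.Thing")
print(thing.__name__)
```

The explicit `importlib.import_module` call plays the same role as the one-line addition in the diff above: once the submodule has been imported, attribute lookup on the parent package succeeds.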

@codecov

codecov bot commented Mar 9, 2018

Codecov Report

Merging #20083 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20083      +/-   ##
==========================================
+ Coverage   91.72%   91.72%   +<.01%     
==========================================
  Files         150      150              
  Lines       49122    49152      +30     
==========================================
+ Hits        45057    45086      +29     
- Misses       4065     4066       +1

Flag         Coverage Δ
#multiple    90.11% <ø> (ø) ⬆️
#single      41.84% <ø> (-0.02%) ⬇️

Impacted Files                       Coverage Δ
pandas/core/resample.py              96.43% <ø> (ø) ⬆️
pandas/core/base.py                  96.78% <0%> (-0.02%) ⬇️
pandas/core/indexes/datetimes.py     95.64% <0%> (-0.01%) ⬇️
pandas/core/series.py                93.85% <0%> (-0.01%) ⬇️
pandas/core/groupby.py               92.14% <0%> (-0.01%) ⬇️
pandas/core/indexes/base.py          96.66% <0%> (-0.01%) ⬇️
pandas/core/generic.py               95.84% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py    91.03% <0%> (ø) ⬆️
pandas/core/indexes/multi.py         95.06% <0%> (ø) ⬆️
pandas/core/strings.py               98.32% <0%> (ø) ⬆️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7c14e4f...4e616f7.

@gcbeltramini
Contributor Author

Thanks @TomAugspurger! I edited the file doc/make.py according to your suggestion, and I could see the HTML file.

Contributor

@joaoavf joaoavf left a comment

Very good overall; I have some minor suggestions.

@@ -519,21 +519,55 @@ def nearest(self, limit=None):

def backfill(self, limit=None):
"""
-Backward fill the values
+Backward fill the values.

Contributor

It might be interesting to add a more thorough explanation of what exactly a backward fill is, for novice users who have never seen this term. Something along the lines of: 'get all the NA values and substitute with the value on the next row that has a non-NA value'
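
A minimal sketch of that description, using the plain `Series.bfill` method rather than the resampler method under review. The sample values are made up for illustration. Note that `Series.bfill` fills every NaN, whereas the docstring in this PR states that the resampler version leaves NaN values from the original data untouched.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Every NaN takes the value of the next row that holds a non-NaN value.
filled = s.bfill()

# With limit=1, at most one consecutive NaN is filled, counting
# backwards from the next valid value; the other NaN stays.
limited = s.bfill(limit=1)

print(filled.tolist())   # [1.0, 4.0, 4.0, 4.0]
print(limited.tolist())
```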

Contributor

If you can find a Wikipedia reference, that would be great as well.


Parameters
----------
limit : integer, optional
-limit of how many values to fill
+Limit of how many values to fill.

Returns
-------
an upsampled Series
Contributor

@joaoavf joaoavf Mar 10, 2018

I've seen some examples on https://python-sprints.github.io/pandas/guide/pandas_docstring.html, which would be along the lines of:

Returns
-------
Series
    An upsampled Series with backward filled NA values

@jreback jreback added Docs Resample resample method labels Mar 10, 2018
@jreback jreback added this to the 0.23.0 milestone Mar 10, 2018
@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Mar 10, 2018

See Also
--------
Series.fillna
DataFrame.fillna
Contributor

Can you add refs to Resampler.pad, nearest, and fillna? @jorisvandenbossche, how exactly do we reference these here?

Member

You might be able to refer to them simply as 'pad', 'nearest', etc., because they live on the same class, but this needs checking.

Contributor Author

If I run python make.py --single pandas.core.resample.Resampler.backfill, the links are never created. But if I run python make.py html, the links are created correctly if I use: pandas.Series.fillna (Series.fillna and pandas.core.series.Series.fillna don't work) and fillna or Resampler.fillna.

Member

@datapythonista datapythonista left a comment

lgtm

Didn't do that much ts resampling myself, and I learned a lot from these changes. Great piece of documentation.

@TomAugspurger TomAugspurger merged commit 1cecfdf into pandas-dev:master Mar 12, 2018
@TomAugspurger
Contributor

Thanks all. @gcbeltramini if you could check http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.core.resample.Resampler.backfill.html when https://travis-ci.org/pandas-dev/pandas/jobs/352298397 finishes to see if the links render correctly, we'd appreciate it.

@jreback
Contributor

jreback commented Mar 12, 2018

Note, @TomAugspurger: the docs are not actually built because we are not changing any rst files. I just forced a doc build (though this one is not included, as it came after). We could just disable this check (in build_docs) for, say, this week.

@TomAugspurger
Contributor

TomAugspurger commented Mar 12, 2018 via email

@jorisvandenbossche
Member

I changed this recently, docs are now always built (exactly for this reason, we always want to build the docs)

@jorisvandenbossche
Member

Ah, no, I changed it in a PR that is not yet merged .. :)

@jreback
Contributor

jreback commented Mar 12, 2018

ahh ok, yeah, go ahead and merge that then :>

@gcbeltramini gcbeltramini deleted the resample-bfill branch March 12, 2018 15:05
Labels
Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Resample resample method
6 participants