Skip to content

ENH: add fill_value to resample #14591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

nchmura4
Copy link
Contributor

@nchmura4 nchmura4 commented Nov 5, 2016

@nchmura4 nchmura4 changed the title Nchmura resample fill value ENH: add fill_value to resample Nov 5, 2016
@sinhrks sinhrks added Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Resample resample method labels Nov 7, 2016
@nchmura4 nchmura4 closed this Nov 9, 2016
@nchmura4 nchmura4 reopened this Nov 9, 2016
@max-sixty
Copy link
Contributor

If I'm understanding this correctly, this is equivalent to .fillna prior to .resample. Another interpretation of the kwarg is that it fills a value while upsampling.

If that's right, is this worth a kwarg? Why not just leave it to the user to fillna prior to resampling?

@nchmura4
Copy link
Contributor Author

I think that's a fair question. This is how i understood 3755. An argument for this feature is that you have a fill_value kwarg in reindex, whereas that's equivalent to just running reindex(...).fillna(), right?

@max-sixty
Copy link
Contributor

That kwarg reindex only fills empty values created on reindex. (although I also recall that behavior not being consistent):

In [3]: series = pd.Series(range(4))

In [4]: series[2] = pd.np.nan

In [5]: series
Out[5]:
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64

In [6]: series.reindex(range(6), fill_value=9)
Out[6]:
0    0.0
1    1.0
2    NaN
3    3.0
4    9.0
5    9.0
dtype: float64

@nchmura4
Copy link
Contributor Author

i see. got it, thanks.

@jorisvandenbossche
Copy link
Member

@nchmura4 As Maximilian said, I also think that a possible fill_value keyword should be for filling NaNs created during resampling, not to fill NaNs before resample (as IMO a keyword for that should not be worth it as df.fillna(..).resample(..) is just much more explicit that is does that).
Sorry if we weren't clear about that on the issue!

@nchmura4
Copy link
Contributor Author

no problem, thanks for the feedback. Would this example illustrate the desired behavior?

rng = pd.date_range('1/1/2016', periods=5, freq='4s')
ts = pd.Series(np.arange(len(rng)), index=rng)
ts2 = pd.Series([1,2,None,4,5], index=rng)
df = pd.DataFrame({'one': ts, 'two': ts2})
df

                    one two
2016-01-01 00:00:00 0   1.0
2016-01-01 00:00:04 1   2.0
2016-01-01 00:00:08 2   NaN
2016-01-01 00:00:12 3   4.0
2016-01-01 00:00:16 4   5.0

upsampling with fill_value doesn't fill any NaN's already in the dataframe, but does fill new rows during resampling:
df.resample("2s", fill_value=9).asfreq()

                    one two
2016-01-01 00:00:00 0.0 1.0
2016-01-01 00:00:02 9.0 9.0
2016-01-01 00:00:04 1.0 2.0
2016-01-01 00:00:06 9.0 9.0
2016-01-01 00:00:08 2.0 NaN
2016-01-01 00:00:10 9.0 9.0
2016-01-01 00:00:12 3.0 4.0
2016-01-01 00:00:14 9.0 9.0
2016-01-01 00:00:16 4.0 5.0

@jorisvandenbossche
Copy link
Member

@nchmura4 yes, I think that is a correct example.

Question is here maybe: where does this keyword belong? In resample, or in asfreq ? Or to put it in other words: does such a possible fill_value keyword make sense for all methods on a resample, or only for some?

@nchmura4
Copy link
Contributor Author

I think this only makes sense on unsampling. I can't think of a use case for downsampling. So then I think it belongs in asfreq(). I'm working on that.

@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from 58d72bc to 649a56b Compare November 23, 2016 12:57
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Nov 23, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Nov 23, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Nov 23, 2016
@codecov-io
Copy link

codecov-io commented Nov 23, 2016

Current coverage is 85.55% (diff: 100%)

Merging #14591 into master will not change coverage

@@             master     #14591   diff @@
==========================================
  Files           145        145          
  Lines         51352      51352          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43932      43932          
  Misses         7420       7420          
  Partials          0          0          

Powered by Codecov. Last update 362e78d...45a9e8e

@@ -31,7 +31,7 @@ Other enhancements
^^^^^^^^^^^^^^^^^^

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- ``DataFrame.asfreq()`` now accepts a fill_value option to fill missing values during resampling (:issue:`3715`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double-back ticks around fill_value

fill_value: scalar, optional
value to use for missing values, applied during upsampling

.. version added:: 0.20.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some Examples here

@@ -361,6 +361,23 @@ def test_fillna(self):
with self.assertRaises(ValueError):
r.fillna(0)

def test_fill_value(self):
# setup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment, 1-liner expl

# setup
rng = pd.date_range('1/1/2016', periods=10, freq='2S')
ts = pd.Series(np.arange(len(rng)), index=rng)
ts2 = pd.Series(np.random.rand(len(rng)), index=rng)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test also to where DataFarme.asfreq is tested (prob not very much :<)

{lamb} grin ".asfreq" pandas\tests\
pandas\tests\frame\test_timeseries.py:
  299 :     def test_asfreq(self):
  300 :         offset_monthly = self.tsframe.asfreq(offsets.BMonthEnd())
  301 :         rule_monthly = self.tsframe.asfreq('BM')
  305 :         filled = rule_monthly.asfreq('B', method='pad')  # noqa
  309 :         filled_dep = rule_monthly.asfreq('B', method='pad')  # noqa
  313 :         result = zero_length.asfreq('BM')
  316 :     def test_asfreq_datetimeindex(self):
  320 :         df = df.asfreq('B')
  323 :         ts = df['A'].asfreq('B')
pandas\tests\indexes\test_datetimelike.py:
  830 :                 idx.get_loc(idx[1].asfreq('H', how='start'), method), 1)
  888 :         idx = pd.period_range('2000-01-01', periods=3).asfreq('H', how='start')
pandas\tests\plotting\test_datetimelike.py:
  319 :         bts = tm.makeTimeSeries().asfreq('BM')
  740 :         ts2 = ts.asfreq('T').interpolate()
pandas\tests\series\test_datetime_values.py:
   37 :         ok_for_period_methods = ['strftime', 'to_timestamp', 'asfreq']
pandas\tests\series\test_timeseries.py:
  407 :     def test_asfreq(self):
  411 :         daily_ts = ts.asfreq('B')
  412 :         monthly_ts = daily_ts.asfreq('BM')
  415 :         daily_ts = ts.asfreq('B', method='pad')
  416 :         monthly_ts = daily_ts.asfreq('BM')
  419 :         daily_ts = ts.asfreq(BDay())
  420 :         monthly_ts = daily_ts.asfreq(BMonthEnd())
  423 :         result = ts[:0].asfreq('M')

and pls add a tests for series as well

expected_df = df.resample('1S').asfreq().fillna(9.0)
expected_df.loc['2016-01-01 00:00:08', 'two'] = None

assert_frame_equal(expected_df, actual_df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add for series as well

@jreback
Copy link
Contributor

jreback commented Nov 23, 2016

@nchmura4 looks pretty good.

@nchmura4
Copy link
Contributor Author

thanks - I added documentation and tests you suggested


Returns
-------
converted : type of caller

Examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@@ -4003,18 +4004,62 @@ def asfreq(self, freq, method=None, how=None, normalize=False):
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a sentence or 2 on the use / diffs of .asfreq() / resample.

2000-01-01 00:01:30 9.0
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 9.0
2000-01-01 00:03:00 3.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice example, thanks!
Can you add the output using the existing method keyword (also for filling NaNs) with this example as well? (pick one of the possible values for the keyword)

@@ -4003,18 +4004,62 @@ def asfreq(self, freq, method=None, how=None, normalize=False):
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight
fill_value: scalar, optional
value to use for missing values, applied during upsampling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a similar sentence as for method: "this does not fill NaNs that were already present"?
(can you also capitalize the first word of the sentence?

@nchmura4
Copy link
Contributor Author

thanks for the feedback

@nchmura4
Copy link
Contributor Author

nchmura4 commented Dec 2, 2016

sorry, i think i accidentally messed this pull request up :/ was trying to pull new changes locally and accidentally pushed them into this PR. Can someone help me? sorry!

@jorisvandenbossche
Copy link
Member

@nchmura4 If you follow those steps, it should be solved:

git fetch upstream
git checkout nchmura-resample-fill-value
git rebase upstream/master
git push -f origin nchmura-resample-fill-value

@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from c8cd7f2 to aab4497 Compare December 3, 2016 18:04
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 3, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 3, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 3, 2016
@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from aab4497 to 8f9097e Compare December 3, 2016 18:37
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 3, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 3, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 3, 2016
@nchmura4
Copy link
Contributor Author

nchmura4 commented Dec 3, 2016

@jorisvandenbossche thank you! i see what i did wrong. PR all up to date now :)

2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 3.0
2000-01-01 00:03:00 3.0

Notes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a See Also to .reindex (which has fill_value)

this does not fill NaNs that already were present).

.. versionadded:: 0.20.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a See Also to Series.asfreq/DataFrame.asfreq

@@ -361,6 +361,29 @@ def test_fillna(self):
with self.assertRaises(ValueError):
r.fillna(0)

def test_fill_value(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests should be in class Base. Which will tests with all index types (you need to tests both Series/DataFrame here)

use self.create_index() to create your index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I'm a bit stuck on PeriodIndex. With my current commits you can add a fill_value for PeriodIndexResampler._upsample with kind='timestamp' but it doesn't work for kind='period'. I think I have to build that out. Otherwise, this won't work for PeriodIndex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upsampling on a PeriodIndex with kind='period' doesn't work through reindex. I think it would involve adding fill_value to Index.get_indexer (line 901 of tseries/resample.py) Before I start going down that path, I just want to confirm that's its the right thing to do.

@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from 8f9097e to 133175d Compare December 21, 2016 06:21
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 21, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 21, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 21, 2016
@jreback
Copy link
Contributor

jreback commented Dec 30, 2016

can you rebase

@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from 133175d to 36095c6 Compare December 31, 2016 04:40
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 31, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 31, 2016
nchmura4 added a commit to nchmura4/pandas that referenced this pull request Dec 31, 2016
@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from 36095c6 to 45a9e8e Compare January 18, 2017 04:11
@nchmura4 nchmura4 force-pushed the nchmura-resample-fill-value branch from 45a9e8e to 08b6b28 Compare January 18, 2017 05:34
@jreback jreback added this to the 0.20.0 milestone Jan 21, 2017
@jreback
Copy link
Contributor

jreback commented Jan 21, 2017

thanks @nchmura4 !

@jreback
Copy link
Contributor

jreback commented Jan 21, 2017

closed by 2540d5a

@jreback jreback closed this Jan 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Add fill_value option to resample()
7 participants