-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: add fill_value to resample #14591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If I'm understanding this correctly, this is equivalent to If that's right, is this worth a kwarg? Why not just leave it to the user to |
I think that's a fair question. This is how i understood 3755. An argument for this feature is that you have a fill_value kwarg in reindex, whereas that's equivalent to just running reindex(...).fillna(), right? |
That kwarg In [3]: series = pd.Series(range(4))
In [4]: series[2] = pd.np.nan
In [5]: series
Out[5]:
0 0.0
1 1.0
2 NaN
3 3.0
dtype: float64
In [6]: series.reindex(range(6), fill_value=9)
Out[6]:
0 0.0
1 1.0
2 NaN
3 3.0
4 9.0
5 9.0
dtype: float64 |
i see. got it, thanks. |
@nchmura4 As Maximilian said, I also think that a possible |
no problem, thanks for the feedback. Would this example illustrate the desired behavior?
upsampling with fill_value doesn't fill any NaN's already in the dataframe, but does fill new rows during resampling:
|
@nchmura4 yes, I think that is a correct example. Question is here maybe: where does this keyword belong? In |
I think this only makes sense on unsampling. I can't think of a use case for downsampling. So then I think it belongs in asfreq(). I'm working on that. |
58d72bc
to
649a56b
Compare
Current coverage is 85.55% (diff: 100%)@@ master #14591 diff @@
==========================================
Files 145 145
Lines 51352 51352
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 43932 43932
Misses 7420 7420
Partials 0 0
|
@@ -31,7 +31,7 @@ Other enhancements | |||
^^^^^^^^^^^^^^^^^^ | |||
|
|||
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) | |||
|
|||
- ``DataFrame.asfreq()`` now accepts a fill_value option to fill missing values during resampling (:issue:`3715`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-back ticks around fill_value
fill_value: scalar, optional | ||
value to use for missing values, applied during upsampling | ||
|
||
.. version added:: 0.20.0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some Examples here
@@ -361,6 +361,23 @@ def test_fillna(self): | |||
with self.assertRaises(ValueError): | |||
r.fillna(0) | |||
|
|||
def test_fill_value(self): | |||
# setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment, 1-liner expl
# setup | ||
rng = pd.date_range('1/1/2016', periods=10, freq='2S') | ||
ts = pd.Series(np.arange(len(rng)), index=rng) | ||
ts2 = pd.Series(np.random.rand(len(rng)), index=rng) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test also to where DataFarme.asfreq is tested (prob not very much :<)
{lamb} grin ".asfreq" pandas\tests\
pandas\tests\frame\test_timeseries.py:
299 : def test_asfreq(self):
300 : offset_monthly = self.tsframe.asfreq(offsets.BMonthEnd())
301 : rule_monthly = self.tsframe.asfreq('BM')
305 : filled = rule_monthly.asfreq('B', method='pad') # noqa
309 : filled_dep = rule_monthly.asfreq('B', method='pad') # noqa
313 : result = zero_length.asfreq('BM')
316 : def test_asfreq_datetimeindex(self):
320 : df = df.asfreq('B')
323 : ts = df['A'].asfreq('B')
pandas\tests\indexes\test_datetimelike.py:
830 : idx.get_loc(idx[1].asfreq('H', how='start'), method), 1)
888 : idx = pd.period_range('2000-01-01', periods=3).asfreq('H', how='start')
pandas\tests\plotting\test_datetimelike.py:
319 : bts = tm.makeTimeSeries().asfreq('BM')
740 : ts2 = ts.asfreq('T').interpolate()
pandas\tests\series\test_datetime_values.py:
37 : ok_for_period_methods = ['strftime', 'to_timestamp', 'asfreq']
pandas\tests\series\test_timeseries.py:
407 : def test_asfreq(self):
411 : daily_ts = ts.asfreq('B')
412 : monthly_ts = daily_ts.asfreq('BM')
415 : daily_ts = ts.asfreq('B', method='pad')
416 : monthly_ts = daily_ts.asfreq('BM')
419 : daily_ts = ts.asfreq(BDay())
420 : monthly_ts = daily_ts.asfreq(BMonthEnd())
423 : result = ts[:0].asfreq('M')
and pls add a tests for series as well
expected_df = df.resample('1S').asfreq().fillna(9.0) | ||
expected_df.loc['2016-01-01 00:00:08', 'two'] = None | ||
|
||
assert_frame_equal(expected_df, actual_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add for series as well
@nchmura4 looks pretty good. |
thanks - I added documentation and tests you suggested |
|
||
Returns | ||
------- | ||
converted : type of caller | ||
|
||
Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
@@ -4003,18 +4004,62 @@ def asfreq(self, freq, method=None, how=None, normalize=False): | |||
For PeriodIndex only, see PeriodIndex.asfreq | |||
normalize : bool, default False | |||
Whether to reset output index to midnight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a sentence or 2 on the use / diffs of .asfreq() / resample.
2000-01-01 00:01:30 9.0 | ||
2000-01-01 00:02:00 2.0 | ||
2000-01-01 00:02:30 9.0 | ||
2000-01-01 00:03:00 3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice example, thanks!
Can you add the output using the existing method
keyword (also for filling NaNs) with this example as well? (pick one of the possible values for the keyword)
@@ -4003,18 +4004,62 @@ def asfreq(self, freq, method=None, how=None, normalize=False): | |||
For PeriodIndex only, see PeriodIndex.asfreq | |||
normalize : bool, default False | |||
Whether to reset output index to midnight | |||
fill_value: scalar, optional | |||
value to use for missing values, applied during upsampling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a similar sentence as for method
: "this does not fill NaNs that were already present"?
(can you also capitalize the first word of the sentence?
thanks for the feedback |
sorry, i think i accidentally messed this pull request up :/ was trying to pull new changes locally and accidentally pushed them into this PR. Can someone help me? sorry! |
@nchmura4 If you follow those steps, it should be solved:
|
c8cd7f2
to
aab4497
Compare
aab4497
to
8f9097e
Compare
@jorisvandenbossche thank you! i see what i did wrong. PR all up to date now :) |
2000-01-01 00:02:00 2.0 | ||
2000-01-01 00:02:30 3.0 | ||
2000-01-01 00:03:00 3.0 | ||
|
||
Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a See Also to .reindex
(which has fill_value
)
this does not fill NaNs that already were present). | ||
|
||
.. versionadded:: 0.20.0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a See Also to Series.asfreq/DataFrame.asfreq
@@ -361,6 +361,29 @@ def test_fillna(self): | |||
with self.assertRaises(ValueError): | |||
r.fillna(0) | |||
|
|||
def test_fill_value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests should be in class Base
. Which will tests with all index types (you need to tests both Series/DataFrame here)
use self.create_index()
to create your index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I'm a bit stuck on PeriodIndex. With my current commits you can add a fill_value for PeriodIndexResampler._upsample with kind='timestamp' but it doesn't work for kind='period'. I think I have to build that out. Otherwise, this won't work for PeriodIndex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upsampling on a PeriodIndex with kind='period' doesn't work through reindex. I think it would involve adding fill_value to Index.get_indexer (line 901 of tseries/resample.py) Before I start going down that path, I just want to confirm that's its the right thing to do.
8f9097e
to
133175d
Compare
can you rebase |
133175d
to
36095c6
Compare
36095c6
to
45a9e8e
Compare
45a9e8e
to
08b6b28
Compare
thanks @nchmura4 ! |
closed by 2540d5a |
git diff upstream/master | flake8 --diff