BUG: Resample upsampling return NaNs #9528

KevinLourd · 2015-02-19T22:57:28Z

Pandas resample misbehaves when upsampling a time series into equal-size splits.

For instance, I have a time series of length 10:

import numpy as np
import pandas as pd

rng = pd.date_range('20130101', periods=10, freq='T')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

print(ts)

2013-01-01 00:00:00   -1.811999
2013-01-01 00:01:00   -0.890837
2013-01-01 00:02:00   -0.363520
2013-01-01 00:03:00   -0.026245
2013-01-01 00:04:00    1.515072
2013-01-01 00:05:00    0.920129
2013-01-01 00:06:00   -0.125954
2013-01-01 00:07:00    0.588933
2013-01-01 00:08:00   -1.278408
2013-01-01 00:09:00   -0.172525
Freq: T, dtype: float64

When I try to resample it into N > 10 parts, it doesn't work:

from datetime import timedelta
length = 11
timeSpan = (ts.index[-1]-ts.index[0]+timedelta(minutes=1))
rule = int(timeSpan.total_seconds()/length)
tsNew=ts.resample(str(rule)+"S").mean()

print(tsNew)

2013-01-01 00:00:00    1.845181
2013-01-01 00:00:54         NaN
2013-01-01 00:01:48         NaN
2013-01-01 00:02:42         NaN
2013-01-01 00:03:36         NaN
2013-01-01 00:04:30         NaN
2013-01-01 00:05:24         NaN
2013-01-01 00:06:18         NaN
2013-01-01 00:07:12         NaN
2013-01-01 00:08:06         NaN
2013-01-01 00:09:00   -0.997419
Freq: 54S, dtype: float64
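The arithmetic behind the 54-second rule can be checked without pandas. The sketch below assumes left-closed bins that are `rule` seconds wide and anchored at the first sample; that anchoring is an assumption about the binning, and the names `bin_of`, `n_bins` and `empty_bins` are only illustrative.

```python
from datetime import datetime, timedelta

# Ten one-minute samples, as in the report above.
start = datetime(2013, 1, 1, 0, 0)
samples = [start + timedelta(minutes=i) for i in range(10)]

length = 11
time_span = samples[-1] - samples[0] + timedelta(minutes=1)  # 600 seconds
rule = int(time_span.total_seconds() / length)               # 600 / 11 truncates to 54

# Assumption: bins are left-closed, `rule` seconds wide, anchored at the first sample.
bin_of = [int((s - start).total_seconds()) // rule for s in samples]
n_bins = bin_of[-1] + 1
empty_bins = sorted(set(range(n_bins)) - set(bin_of))

print(rule, n_bins, empty_bins)
```

Under these assumptions, only the bin starting at 00:08:06 receives no sample, so an aggregation over the bins should leave a single gap rather than nine.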

Note: here are my versions:
pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Darwin
OS-release: 14.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21
numpy: 1.9.1
scipy: 0.15.1
statsmodels: 0.5.0
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.1
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: 0.6.3.None
psycopg2: None

Thank you for your help


jreback · 2015-02-19T23:06:29Z

I don't think this is a bug per se, rather a convention / API issue.

IIRC (and I'll have to look further), it is actually reindexing here. That's why the stamps that match your original index have values, but the others don't.

That doesn't seem very useful, though.

In [1]: rng = pd.date_range('20130101',periods=10,freq='T')

In [2]: ts=pd.Series(np.arange(len(rng)), index=rng)

In [8]: ts.resample('54s',how='mean')
Out[8]: 
2013-01-01 00:00:00     0
2013-01-01 00:00:54     1
2013-01-01 00:01:48     2
2013-01-01 00:02:42     3
2013-01-01 00:03:36     4
2013-01-01 00:04:30     5
2013-01-01 00:05:24     6
2013-01-01 00:06:18     7
2013-01-01 00:07:12     8
2013-01-01 00:08:06   NaN
2013-01-01 00:09:00     9
Freq: 54S, dtype: float64

In [9]: ts.resample('54s')
Out[9]: 
2013-01-01 00:00:00     0
2013-01-01 00:00:54   NaN
2013-01-01 00:01:48   NaN
2013-01-01 00:02:42   NaN
2013-01-01 00:03:36   NaN
2013-01-01 00:04:30   NaN
2013-01-01 00:05:24   NaN
2013-01-01 00:06:18   NaN
2013-01-01 00:07:12   NaN
2013-01-01 00:08:06   NaN
2013-01-01 00:09:00     9
Freq: 54S, dtype: float64
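For readers on later pandas releases, where the `how` and `fill_method` keywords were replaced by method calls on the resampler, the two calls above can be approximated as follows. This is a sketch rather than an exact reproduction of the 0.15.2 session: mapping `how='mean'` to `.mean()` and the bare call to `.asfreq()` is an assumption based on the outputs shown.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2013-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(len(rng)), index=rng)

# Aggregate each 54-second bin; a bin with no samples yields NaN.
means = ts.resample("54s").mean()

# Reindex onto the 54-second grid; only stamps that coincide with the
# original one-minute index keep a value.
reindexed = ts.resample("54s").asfreq()

print(means.isna().sum(), reindexed.isna().sum())
```

The first result has one gap (the bin at 00:08:06), while the second keeps values only at 00:00:00 and 00:09:00, which is the reindexing behaviour described above.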

jreback · 2015-02-19T23:08:09Z

What would your expectation be for the result, using np.arange(len(ts)) as the input?

KevinLourd · 2015-02-19T23:41:05Z

I would expect the Out[8] that you printed (thank you for the how="mean" tip).
However, that does not work in general, as explained below.

Take, for instance, a smaller input set:

rng = pd.date_range('20130101',periods=3,freq='T')
ts=pd.Series(np.arange(len(rng)), index=rng)
print(ts)

2013-01-01 00:00:00    0
2013-01-01 00:01:00    1
2013-01-01 00:02:00    2
Freq: T, dtype: int64

When I try to divide it into 5 parts, I get only 4:

from datetime import timedelta
length = 5
timeSpan = (ts.index[-1]-ts.index[0]+timedelta(minutes=1))
rule = int(timeSpan.total_seconds()/length)
tsNew=ts.resample(str(rule)+"S").mean()
print(tsNew)

2013-01-01 00:00:00     0
2013-01-01 00:00:36     1
2013-01-01 00:01:12   NaN
2013-01-01 00:01:48     2
Freq: 36S, dtype: float64

I would expect an extra row, holding either 2 or NaN, like this:

2013-01-01 00:02:24     NaN

The example jreback used is a special case: the 54-second grid lands exactly on 00:09:00, which is why the correct number of rows appears.
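The missing row can be reproduced with plain datetime arithmetic. The sketch below assumes that labels start at the first sample and stop at the bin containing the last sample; that stopping rule is an assumption inferred from the output above, and `produced`, `expected` and `missing` are illustrative names.

```python
from datetime import datetime, timedelta

first = datetime(2013, 1, 1, 0, 0)
last = datetime(2013, 1, 1, 0, 2)   # third one-minute sample
width = timedelta(seconds=36)       # 180 s / 5 parts

# Assumption: labels start at the first sample and stop at the bin that
# contains the last sample.
produced = []
label = first
while label <= last:
    produced.append(label)
    label += width

# What is expected in this comment: 5 equal parts covering the full 3-minute span.
expected = [first + i * width for i in range(5)]

missing = [t.strftime("%H:%M:%S") for t in expected if t not in produced]
print(len(produced), missing)
```

The grid ends once the last sample (00:02:00) is covered, so the fifth label at 00:02:24 is never generated. With the 54-second rule, the last sample sits exactly on a label, so the count happens to match.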

jreback · 2015-02-20T00:12:50Z

So the fill_method argument controls how values are filled when upsampling (which is odd, because it's not consistent with other methods).

That said, there are a LOT of options for resample.

In [17]: ts.resample('36s',fill_method='pad',closed='right')
Out[17]: 
2013-01-01 00:00:00    0
2013-01-01 00:00:36    0
2013-01-01 00:01:12    1
2013-01-01 00:01:48    1
2013-01-01 00:02:24    2
Freq: 36S, dtype: int64

jreback · 2015-02-20T00:14:12Z

I just remembered that the first example also requires upsampling, so fill_method applies there too.

In [21]: ts.resample('54s',fill_method='pad')
Out[21]: 
2013-01-01 00:00:00    0
2013-01-01 00:00:54    0
2013-01-01 00:01:48    1
2013-01-01 00:02:42    2
2013-01-01 00:03:36    3
2013-01-01 00:04:30    4
2013-01-01 00:05:24    5
2013-01-01 00:06:18    6
2013-01-01 00:07:12    7
2013-01-01 00:08:06    8
2013-01-01 00:09:00    9
Freq: 54S, dtype: int64
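On later pandas releases the `fill_method` keyword is no longer available; a forward fill on the resampler is the closest equivalent I am aware of. A minimal sketch, assuming `.ffill()` stands in for `fill_method='pad'`:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2013-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(len(rng)), index=rng)

# Forward-fill onto the 54-second grid: each label takes the most recent
# original sample at or before it.
filled = ts.resample("54s").ffill()

print(filled)
```

Each label on the new grid takes the last known value, so no NaN remains and the integer dtype is preserved, as in Out[21].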

KevinLourd · 2015-02-20T00:19:34Z

ts.resample('36s',fill_method='pad',closed='right') works fine.
However, there is no good reason to have to pass closed='right', since what is expected here is closed='left'.

jreback added Bug API Design labels Feb 19, 2015

jreback mentioned this issue Mar 4, 2015

Resampling uses inconsistent labeling for sub-daily and super-daily frequencies #9586

Closed

mroeschke added the Resample label Nov 2, 2019

mroeschke added the Needs Discussion label and removed the API Design label Apr 12, 2021
