BUG: Resample upsampling return NaNs #9528

Open
KevinLourd opened this issue Feb 19, 2015 · 6 comments
Labels: Bug, Needs Discussion, Resample

Comments

@KevinLourd

KevinLourd commented Feb 19, 2015

pandas resample misbehaves when upsampling a time series into equal-size splits:

For instance, I have a time series of size 10:

import numpy as np
import pandas as pd

rng = pd.date_range('20130101', periods=10, freq='T')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

print(ts)

2013-01-01 00:00:00   -1.811999
2013-01-01 00:01:00   -0.890837
2013-01-01 00:02:00   -0.363520
2013-01-01 00:03:00   -0.026245
2013-01-01 00:04:00    1.515072
2013-01-01 00:05:00    0.920129
2013-01-01 00:06:00   -0.125954
2013-01-01 00:07:00    0.588933
2013-01-01 00:08:00   -1.278408
2013-01-01 00:09:00   -0.172525
Freq: T, dtype: float64

When trying to resample into N > 10 parts (here N = 11), the result is almost entirely NaN:

from datetime import timedelta
length = 11
timeSpan = (ts.index[-1]-ts.index[0]+timedelta(minutes=1))
rule = int(timeSpan.total_seconds()/length)
tsNew=ts.resample(str(rule)+"S").mean()

print(tsNew)

2013-01-01 00:00:00    1.845181
2013-01-01 00:00:54         NaN
2013-01-01 00:01:48         NaN
2013-01-01 00:02:42         NaN
2013-01-01 00:03:36         NaN
2013-01-01 00:04:30         NaN
2013-01-01 00:05:24         NaN
2013-01-01 00:06:18         NaN
2013-01-01 00:07:12         NaN
2013-01-01 00:08:06         NaN
2013-01-01 00:09:00   -0.997419
Freq: 54S, dtype: float64

Note: here are my versions:
pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Darwin
OS-release: 14.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21
numpy: 1.9.1
scipy: 0.15.1
statsmodels: 0.5.0
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.1
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: 0.6.3.None
psycopg2: None

Thank you for your help

@jreback
Contributor

jreback commented Feb 19, 2015

I don't think this is a bug per se, rather a convention / api issue.

IIRC (and I'll have to look further), it is actually reindexing here (that's why the stamps that match your original index have values, but the others don't).

Doesn't seem very useful though.

In [1]: rng = pd.date_range('20130101',periods=10,freq='T')

In [2]: ts=pd.Series(np.arange(len(rng)), index=rng)

In [8]: ts.resample('54s',how='mean')
Out[8]: 
2013-01-01 00:00:00     0
2013-01-01 00:00:54     1
2013-01-01 00:01:48     2
2013-01-01 00:02:42     3
2013-01-01 00:03:36     4
2013-01-01 00:04:30     5
2013-01-01 00:05:24     6
2013-01-01 00:06:18     7
2013-01-01 00:07:12     8
2013-01-01 00:08:06   NaN
2013-01-01 00:09:00     9
Freq: 54S, dtype: float64

In [9]: ts.resample('54s')
Out[9]: 
2013-01-01 00:00:00     0
2013-01-01 00:00:54   NaN
2013-01-01 00:01:48   NaN
2013-01-01 00:02:42   NaN
2013-01-01 00:03:36   NaN
2013-01-01 00:04:30   NaN
2013-01-01 00:05:24   NaN
2013-01-01 00:06:18   NaN
2013-01-01 00:07:12   NaN
2013-01-01 00:08:06   NaN
2013-01-01 00:09:00     9
Freq: 54S, dtype: float64
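For readers on newer pandas releases, where the `how=` keyword was replaced by methods on the object returned by `resample`, the two results above can be reproduced roughly as follows. This is a minimal sketch, assuming a recent pandas: `asfreq()` gives the pure reindexing seen in Out[9], `mean()` gives the per-bin aggregation seen in Out[8], and the names `reindexed` and `binned` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same 10-point, one-minute series as in the session above.
rng = pd.date_range("2013-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(len(rng)), index=rng)

# asfreq() only reindexes onto the 54-second grid: a value survives
# only where a grid stamp coincides with an original stamp.
reindexed = ts.resample("54s").asfreq()

# mean() aggregates every left-closed 54-second bin: a bin that
# contains no original stamp comes out as NaN.
binned = ts.resample("54s").mean()

print(reindexed)
print(binned)
```

The one empty bin in `binned` is the one labelled 00:08:06, because no one-minute stamp falls between 00:08:06 and 00:09:00.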

@jreback
Contributor

jreback commented Feb 19, 2015

What would you expect the result to be for an input of np.arange(len(ts))?

@KevinLourd
Author

KevinLourd commented Feb 19, 2015

I would expect the Out[8] that you printed (thank you for the how="mean" tip).
However, that does not work in general, as explained below.

Take, for instance, a smaller input set:

rng = pd.date_range('20130101',periods=3,freq='T')
ts=pd.Series(np.arange(len(rng)), index=rng)
print(ts)
2013-01-01 00:00:00    0
2013-01-01 00:01:00    1
2013-01-01 00:02:00    2
Freq: T, dtype: int64

When trying to divide it into 5 parts, we get only 4:

from datetime import timedelta
length = 5
timeSpan = (ts.index[-1]-ts.index[0]+timedelta(minutes=1))
rule = int(timeSpan.total_seconds()/length)
tsNew=ts.resample(str(rule)+"S").mean()
print(tsNew)
2013-01-01 00:00:00     0
2013-01-01 00:00:36     1
2013-01-01 00:01:12   NaN
2013-01-01 00:01:48     2
Freq: 36S, dtype: float64

I would expect an extra row holding either a 2 or a NaN, like this:

2013-01-01 00:02:24     NaN

The example chosen by jreback is a special case: the 54-second grid lands exactly on 00:09:00, which is why the expected number of rows appears.
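The row count can be checked by hand with the standard library alone. The sketch below is a simplified model and not pandas' actual implementation: it assumes left-closed bins anchored at the first stamp, with the grid stopping at the last bin that still contains a data point. The names `stamps`, `parts`, `n_bins` and `labels` are illustrative.

```python
from datetime import datetime, timedelta

# The three one-minute stamps from the small example.
stamps = [datetime(2013, 1, 1, 0, m) for m in range(3)]
parts = 5

# Same rule computation as in the post: (2 min + 1 min) / 5 = 36 s.
span = stamps[-1] - stamps[0] + timedelta(minutes=1)
rule = int(span.total_seconds() / parts)

# Assumed model: bins are [k*rule, (k+1)*rule) from the first stamp,
# and the last bin is the one that contains the last stamp.
last_offset = (stamps[-1] - stamps[0]).total_seconds()
n_bins = int(last_offset // rule) + 1
labels = [stamps[0] + timedelta(seconds=rule * k) for k in range(n_bins)]

for label in labels:
    print(label)
```

Under this model the last stamp, 00:02:00, falls inside the bin that starts at 00:01:48, so no bin starting at 00:02:24 is ever created and only four labels come out.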

@jreback
Contributor

jreback commented Feb 20, 2015

So the fill_method argument controls how values are filled when upsampling (which is odd, because it's not consistent with other methods).

That said, there are a LOT of options for resample.

In [17]: ts.resample('36s',fill_method='pad',closed='right')
Out[17]: 
2013-01-01 00:00:00    0
2013-01-01 00:00:36    0
2013-01-01 00:01:12    1
2013-01-01 00:01:48    1
2013-01-01 00:02:24    2
Freq: 36S, dtype: int64
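A sketch of the same call with the newer method-based API, assuming a recent pandas in which `fill_method='pad'` is written as `.ffill()` on the resampler. It compares the default `closed='left'` with `closed='right'` on the three-point series; the names `left` and `right` are illustrative.

```python
import numpy as np
import pandas as pd

# The three-point, one-minute series from the comment above.
rng = pd.date_range("2013-01-01", periods=3, freq="min")
ts = pd.Series(np.arange(len(rng)), index=rng)

# Default closed='left': forward-fill onto the 36-second grid.
left = ts.resample("36s").ffill()

# closed='right': the grid is extended by one more stamp at the end.
right = ts.resample("36s", closed="right").ffill()

print(left)
print(right)
```

Comparing the two printed series shows whether the extra 00:02:24 row appears only with `closed='right'`, which is the asymmetry being discussed here.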

@jreback
Contributor

jreback commented Feb 20, 2015

Just remembered for the first example, this requires upsampling so fill_method applies.

In [21]: ts.resample('54s',fill_method='pad')
Out[21]: 
2013-01-01 00:00:00    0
2013-01-01 00:00:54    0
2013-01-01 00:01:48    1
2013-01-01 00:02:42    2
2013-01-01 00:03:36    3
2013-01-01 00:04:30    4
2013-01-01 00:05:24    5
2013-01-01 00:06:18    6
2013-01-01 00:07:12    7
2013-01-01 00:08:06    8
2013-01-01 00:09:00    9
Freq: 54S, dtype: int64
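On newer pandas releases the `fill_method` keyword is no longer available, and the fill is requested as a method on the resampler instead. A minimal sketch of the equivalent call, assuming a recent pandas; the name `padded` is illustrative.

```python
import numpy as np
import pandas as pd

# Same 10-point, one-minute series as in the first example.
rng = pd.date_range("2013-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(len(rng)), index=rng)

# Upsample onto the 54-second grid, carrying the last seen value
# forward to each new stamp (the old fill_method='pad').
padded = ts.resample("54s").ffill()

print(padded)
```

Each 54-second stamp takes the most recent original value at or before it, so the first value appears twice and no NaN remains.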

@KevinLourd
Author

ts.resample('36s',fill_method='pad',closed='right') works fine.
However, there is no good reason to be required to pass closed='right', since the behavior expected here is that of closed='left'.

@mroeschke mroeschke added the Resample label Nov 2, 2019
@mroeschke mroeschke added the Needs Discussion label and removed the API Design label Apr 12, 2021