resampling from Day to BusinessDay pulls weekend data back to friday. #15837

Closed
erbian opened this issue Mar 29, 2017 · 6 comments
Labels
Duplicate Report (Duplicate issue or pull request), Resample (resample method)

Comments

@erbian
Contributor

erbian commented Mar 29, 2017

Code Sample

import pandas as pd

result = pd.Series(1., pd.date_range('20170101', '20181231', freq='D')).resample('B').last().to_period()

print(result.index[0])
# Period('2016-12-30', 'B')

pd.Period('20170101', freq='B')
# Period('2017-01-02', 'B')

Problem description

When I have daily data that spans weekends (e.g., 1/1/2017, which is a Sunday) and try to get the last available value for each BusinessDay, the weekend data falls back to the Friday period. This is inconsistent with a Sunday timestamp belonging to the Monday BusinessDay period (e.g., pd.Period('20170101', 'B') resolves to 1/2/2017).

Expected Output

I would expect that the result.index[0] above would return Period('2017-01-02', 'B')
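A minimal sketch of one way to get the labelling expected above without going through resample('B'): roll each timestamp forward to the next business day and group on the rolled labels. This is an illustration under the assumption that weekend observations should belong to the following Monday; it is not the behaviour of resample itself, and the names `bday`, `rolled` and `result` are made up for this example.

```python
import pandas as pd

# Daily series starting on Sunday 2017-01-01, as in the report
# (shortened to two weeks, ending on a Friday).
s = pd.Series(1.0, index=pd.date_range("2017-01-01", "2017-01-13", freq="D"))

# Roll every timestamp forward to a business day.
# Weekdays map to themselves; Saturday and Sunday map to the next Monday.
bday = pd.offsets.BusinessDay()
rolled = s.index.map(bday.rollforward)

# Take the last observation that falls under each business-day label.
result = s.groupby(rolled).last()

print(result.index[0])  # 2017-01-02 00:00:00
```

With this grouping, Sunday 2017-01-01 and Monday 2017-01-02 share the Monday label, so the first label matches what pd.Period('20170101', freq='B') resolves to.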

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.3.0
Cython: 0.24
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.9.1
IPython: 4.1.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.3
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.5.0
bs4: 4.5.1
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.45.0
pandas_datareader: None

@erbian
Contributor Author

erbian commented Mar 29, 2017

Possibly related to #10575.

jreback added the Resample (resample method) label Mar 29, 2017
@jreback
Contributor

jreback commented Mar 29, 2017

Duplicate of this issue: #11123

jreback closed this as completed Mar 29, 2017
jreback added the Duplicate Report (Duplicate issue or pull request) label Mar 29, 2017
jreback added this to the No action milestone Mar 29, 2017
@jreback
Contributor

jreback commented Mar 29, 2017

This is basically a convention, though I guess it could be regarded as a bug as well. See the discussion on #11123 (and the cross-referenced issue you pointed to). If you have some thoughts, please share.

@erbian
Contributor Author

erbian commented Mar 29, 2017

@jreback I understand that including Fri-Sat-Sun in the Friday period is a convention (though, as others have pointed out, probably not ideal for time-series analysis given look-forward issues). I think it is then inconsistent that creating a period from a timestamp returns a period that does not include that timestamp. For example:

 pd.Period(pd.datetime(2017,1,1), 'B').start_time
#  Timestamp('2017-01-02 00:00:00')

Does the issue I am raising make sense?

@jreback
Contributor

jreback commented Mar 29, 2017

In [3]: pd.datetime(2017,1,1) + pd.offsets.BusinessDay()
Out[3]: Timestamp('2017-01-02 00:00:00')

This is just convention (and documented).

In the other issue I suggested a new frequency as B is essentially look-ahead, maybe a look-behind one.
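To make the look-ahead versus look-behind distinction concrete, here is a small sketch using the `rollforward` and `rollback` methods of `pd.offsets.BusinessDay`. It only compares the observed first resample label with the two roll directions; it makes no claim about how resample is implemented internally. The variable names are illustrative.

```python
import pandas as pd

sunday = pd.Timestamp("2017-01-01")  # a Sunday
bday = pd.offsets.BusinessDay()

# Look-ahead: the next business day on or after the timestamp.
forward = bday.rollforward(sunday)   # Monday 2017-01-02

# Look-behind: the previous business day on or before the timestamp.
backward = bday.rollback(sunday)     # Friday 2016-12-30

# Adding the offset to a weekend timestamp also moves forward to Monday.
added = sunday + bday

# First label produced by resampling daily data that starts on that Sunday.
s = pd.Series(1.0, index=pd.date_range("2017-01-01", "2017-01-31", freq="D"))
first_label = s.resample("B").last().index[0]

print(forward.day_name(), backward.day_name(), first_label.day_name())
```

The first resample label lines up with the look-behind result, while offset addition lines up with the look-ahead result, which is the mismatch discussed in this thread.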

This looks reasonable, actually. I think using .to_period() itself is the issue.

In [27]: r = pd.Series(1., pd.date_range('20170101','20170131',freq='D')).resample('B').asfreq()

In [28]: pd.concat([r, r.index.to_series().dt.weekday_name], axis=1)
Out[28]: 
              0          1
2016-12-30  NaN     Friday
2017-01-02  1.0     Monday
2017-01-03  1.0    Tuesday
2017-01-04  1.0  Wednesday
2017-01-05  1.0   Thursday
2017-01-06  1.0     Friday
2017-01-09  1.0     Monday
2017-01-10  1.0    Tuesday
2017-01-11  1.0  Wednesday
2017-01-12  1.0   Thursday
2017-01-13  1.0     Friday
2017-01-16  1.0     Monday
2017-01-17  1.0    Tuesday
2017-01-18  1.0  Wednesday
2017-01-19  1.0   Thursday
2017-01-20  1.0     Friday
2017-01-23  1.0     Monday
2017-01-24  1.0    Tuesday
2017-01-25  1.0  Wednesday
2017-01-26  1.0   Thursday
2017-01-27  1.0     Friday
2017-01-30  1.0     Monday
2017-01-31  1.0    Tuesday
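The session above relies on `.dt.weekday_name`, which is no longer available in later pandas releases. A rough equivalent for newer versions, assuming `DatetimeIndex.day_name()` is available, could look like this (the column names `value` and `weekday` are chosen here for readability):

```python
import pandas as pd

# Daily data for January 2017, starting on Sunday 2017-01-01.
s = pd.Series(1.0, index=pd.date_range("2017-01-01", "2017-01-31", freq="D"))

# Reindex onto the business-day grid without aggregating.
r = s.resample("B").asfreq()

# Pair each value with the weekday name of its label.
table = pd.DataFrame(
    {"value": r.to_numpy(), "weekday": r.index.day_name()},
    index=r.index,
)

print(table.head(3))
```

As in the output above, the first row is labelled Friday 2016-12-30 and holds NaN, because no observation exists on that date in the original daily index.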

@jreback
Contributor

jreback commented Mar 29, 2017

cc @chris-b1
