Skip to content

Odd behavior with .resample('SM') #14888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gustavolinari opened this issue Dec 15, 2016 · 4 comments · Fixed by #17940
Closed

Odd behavior with .resample('SM') #14888

gustavolinari opened this issue Dec 15, 2016 · 4 comments · Fixed by #17940
Milestone

Comments

@gustavolinari
Copy link

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas import Timestamp

example = pd.DataFrame({'aux': {0: Timestamp('2000-01-03 00:00:00'),
  1: Timestamp('2000-01-04 00:00:00'),  2: Timestamp('2000-01-05 00:00:00'),
  3: Timestamp('2000-01-06 00:00:00'),  4: Timestamp('2000-01-07 00:00:00'),
  5: Timestamp('2000-01-10 00:00:00'),  6: Timestamp('2000-01-11 00:00:00'),
  7: Timestamp('2000-01-12 00:00:00'),  8: Timestamp('2000-01-13 00:00:00'),
  9: Timestamp('2000-01-14 00:00:00'),  10: Timestamp('2000-01-17 00:00:00'),
  11: Timestamp('2000-01-18 00:00:00'),  12: Timestamp('2000-01-19 00:00:00'),
  13: Timestamp('2000-01-20 00:00:00'),  14: Timestamp('2000-01-21 00:00:00'),
  15: Timestamp('2000-01-24 00:00:00'),  16: Timestamp('2000-01-26 00:00:00'),
  17: Timestamp('2000-01-27 00:00:00'),  18: Timestamp('2000-01-28 00:00:00'),
  19: Timestamp('2000-01-31 00:00:00'),  20: Timestamp('2000-02-01 00:00:00'),
  21: Timestamp('2000-02-02 00:00:00'),  22: Timestamp('2000-02-03 00:00:00'),
  23: Timestamp('2000-02-04 00:00:00'),  24: Timestamp('2000-02-07 00:00:00'),
  25: Timestamp('2000-02-08 00:00:00'),  26: Timestamp('2000-02-09 00:00:00'),
  27: Timestamp('2000-02-10 00:00:00'),  28: Timestamp('2000-02-11 00:00:00'),
  29: Timestamp('2000-02-14 00:00:00'),  30: Timestamp('2000-02-15 00:00:00'),
  31: Timestamp('2000-02-16 00:00:00'),  32: Timestamp('2000-02-17 00:00:00'),
  33: Timestamp('2000-02-18 00:00:00'),  34: Timestamp('2000-02-21 00:00:00'),
  35: Timestamp('2000-02-22 00:00:00'),  36: Timestamp('2000-02-23 00:00:00'),
  37: Timestamp('2000-02-24 00:00:00'),  38: Timestamp('2000-02-25 00:00:00'),
  39: Timestamp('2000-02-28 00:00:00'),  40: Timestamp('2000-02-29 00:00:00'),
  41: Timestamp('2000-03-01 00:00:00'),  42: Timestamp('2000-03-02 00:00:00'),
  43: Timestamp('2000-03-03 00:00:00'),  44: Timestamp('2000-03-08 00:00:00'),
  45: Timestamp('2000-03-09 00:00:00'),  46: Timestamp('2000-03-10 00:00:00'),
  47: Timestamp('2000-03-13 00:00:00'),  48: Timestamp('2000-03-14 00:00:00'),
  49: Timestamp('2000-03-15 00:00:00'),  50: Timestamp('2000-03-16 00:00:00'),
  51: Timestamp('2000-03-17 00:00:00'),  52: Timestamp('2000-03-20 00:00:00'),
  53: Timestamp('2000-03-21 00:00:00'),  54: Timestamp('2000-03-22 00:00:00'),
  55: Timestamp('2000-03-23 00:00:00'),  56: Timestamp('2000-03-24 00:00:00'),
  57: Timestamp('2000-03-27 00:00:00'),  58: Timestamp('2000-03-28 00:00:00'),
  59: Timestamp('2000-03-29 00:00:00'),  60: Timestamp('2000-03-30 00:00:00'),
  61: Timestamp('2000-03-31 00:00:00')}, u'date': {0: Timestamp('2000-01-03 00:00:00'),
  1: Timestamp('2000-01-04 00:00:00'),  2: Timestamp('2000-01-05 00:00:00'),
  3: Timestamp('2000-01-06 00:00:00'),  4: Timestamp('2000-01-07 00:00:00'),
  5: Timestamp('2000-01-10 00:00:00'),  6: Timestamp('2000-01-11 00:00:00'),
  7: Timestamp('2000-01-12 00:00:00'),  8: Timestamp('2000-01-13 00:00:00'),
  9: Timestamp('2000-01-14 00:00:00'),  10: Timestamp('2000-01-17 00:00:00'),
  11: Timestamp('2000-01-18 00:00:00'),  12: Timestamp('2000-01-19 00:00:00'),
  13: Timestamp('2000-01-20 00:00:00'),  14: Timestamp('2000-01-21 00:00:00'),
  15: Timestamp('2000-01-24 00:00:00'),  16: Timestamp('2000-01-26 00:00:00'),
  17: Timestamp('2000-01-27 00:00:00'),  18: Timestamp('2000-01-28 00:00:00'),
  19: Timestamp('2000-01-31 00:00:00'),  20: Timestamp('2000-02-01 00:00:00'),
  21: Timestamp('2000-02-02 00:00:00'),  22: Timestamp('2000-02-03 00:00:00'),
  23: Timestamp('2000-02-04 00:00:00'),  24: Timestamp('2000-02-07 00:00:00'),
  25: Timestamp('2000-02-08 00:00:00'),  26: Timestamp('2000-02-09 00:00:00'),
  27: Timestamp('2000-02-10 00:00:00'),  28: Timestamp('2000-02-11 00:00:00'),
  29: Timestamp('2000-02-14 00:00:00'),  30: Timestamp('2000-02-15 00:00:00'),
  31: Timestamp('2000-02-16 00:00:00'),  32: Timestamp('2000-02-17 00:00:00'),
  33: Timestamp('2000-02-18 00:00:00'),  34: Timestamp('2000-02-21 00:00:00'),
  35: Timestamp('2000-02-22 00:00:00'),  36: Timestamp('2000-02-23 00:00:00'),
  37: Timestamp('2000-02-24 00:00:00'),  38: Timestamp('2000-02-25 00:00:00'),
  39: Timestamp('2000-02-28 00:00:00'),  40: Timestamp('2000-02-29 00:00:00'),
  41: Timestamp('2000-03-01 00:00:00'),  42: Timestamp('2000-03-02 00:00:00'),
  43: Timestamp('2000-03-03 00:00:00'),  44: Timestamp('2000-03-08 00:00:00'),
  45: Timestamp('2000-03-09 00:00:00'),  46: Timestamp('2000-03-10 00:00:00'),
  47: Timestamp('2000-03-13 00:00:00'),  48: Timestamp('2000-03-14 00:00:00'),
  49: Timestamp('2000-03-15 00:00:00'),  50: Timestamp('2000-03-16 00:00:00'),
  51: Timestamp('2000-03-17 00:00:00'),  52: Timestamp('2000-03-20 00:00:00'),
  53: Timestamp('2000-03-21 00:00:00'),  54: Timestamp('2000-03-22 00:00:00'),
  55: Timestamp('2000-03-23 00:00:00'),  56: Timestamp('2000-03-24 00:00:00'),
  57: Timestamp('2000-03-27 00:00:00'),  58: Timestamp('2000-03-28 00:00:00'),
  59: Timestamp('2000-03-29 00:00:00'),  60: Timestamp('2000-03-30 00:00:00'),
  61: Timestamp('2000-03-31 00:00:00')}})

print example.set_index('date').resample('SM').last()

print example.set_index('date').resample('M').last()

print example.set_index('date').resample('W').last()

Problem description

When resampling with 'SM' the values in the 'aux' column have ocurred after or at the resampling date, whereas with 'M' or 'W' (or 'Q', or 'A'...) the values in the 'aux' columns have ocurred before or at the resampling date.

Expected Output

The expected output form the 'SM' resample would be:

                 aux
date                 
2000-01-15 2000-01-14
2000-01-31 2000-01-31
2000-02-15 2000-02-15
2000-02-29 2000-02-29
2000-03-15 2000-03-15
2000-03-31 2000-03-31

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 2.7.11.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 23 Stepping 10, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 20.3
Cython: 0.22.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.5.2
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

@chris-b1
Copy link
Contributor

@adrienemery - thoughts?

@adrienemery
Copy link
Contributor

This has to do with the default behaviour of resample() and where bin labels go. There are certain frequency offsets that default to label='right' - specifically 'M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W' - but all other offsets default to label='left'.

Source: https://github.com/pandas-dev/pandas/blob/v0.19.1/pandas/tseries/resample.py#L985-L997

Notice if you use the MS offset with your example you get the same behaviour as using SM since its not in the list of offsets mentioned above.

print(example.set_index('date').resample('MS').last())

# Output
#
#                   aux
# date                 
# 2000-01-01 2000-01-31
# 2000-02-01 2000-02-29
# 2000-03-01 2000-03-31

If you want the behaviour you desire you need to pass in label='right'.

print(example.set_index('date').resample('SM', label='right').last())

# Output
#
#                   aux
# date                 
# 2000-01-15 2000-01-14
# 2000-01-31 2000-01-28
# 2000-02-15 2000-02-14
# 2000-02-29 2000-02-28
# 2000-03-15 2000-03-14
# 2000-03-31 2000-03-30
# 2000-04-15 2000-03-31

For more on bin labels and what side to close bins look at the examples in the resample documentation here:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html

Hopefully this clears it up.

@jreback
Copy link
Contributor

jreback commented Dec 15, 2016

@adrienemery

good explanation. can you add (a shorter) example like this to the docs (and to the doc-string if it will fit, though maybe a link to the doc section is better).

@jreback jreback added this to the Next Major Release milestone Dec 15, 2016
@gustavolinari
Copy link
Author

@adrienemery thank you, the explanation was very clear.
With this I got exactly the behavior I was expecting (it was necessary to use closed='right' as well)

print example.set_index('date').resample('SM', label='right', closed='right').last()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants