Description
- I have checked that the issue still exists on the latest versions of the docs on master here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
Documentation problem
The current documentation for the MS offset alias reads "month start frequency", which suggests that it is the counterpart of M (which is "month end frequency").
However, for any start date supplied to pd.date_range(start=[date], ... freq='MS') other than the first of the month, the generated range begins at the start of the next month. See this StackOverflow post (second answer) for a more detailed explanation.
In brief:
import pandas as pd
start = "2020-03-08"
end = "2021-03-08"
pd.date_range(start, end, freq='MS')
gives
DatetimeIndex(['2020-04-01', '2020-05-01', '2020-06-01', '2020-07-01',
               '2020-08-01', '2020-09-01', '2020-10-01', '2020-11-01',
               '2020-12-01', '2021-01-01', '2021-02-01', '2021-03-01'],
              dtype='datetime64[ns]', freq='MS')
Note that supplying the first day of the month as the start date gives the expected behaviour (i.e. the range starts on the first of the month supplied).
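A minimal sketch comparing the two cases side by side may make the difference clearer. It assumes pandas is imported as pd; the variable names mid_month and first_of_month are only illustrative.

```python
import pandas as pd

end = "2021-03-08"

# Start date in the middle of the month: 2020-03-01 lies before the start,
# so the first generated date is the first of the following month.
mid_month = pd.date_range("2020-03-08", end, freq="MS")

# Start date on the first of the month: the start itself is a month start,
# so it is included in the range.
first_of_month = pd.date_range("2020-03-01", end, freq="MS")

print(mid_month[0])       # first entry when starting mid-month
print(first_of_month[0])  # first entry when starting on the 1st
```

In both cases only month-start dates that fall inside the interval from start to end are returned, which is why the mid-month start yields one entry fewer.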
Suggested fix for documentation
If the behaviour of the MS offset alias is indeed correct (i.e. to start at the beginning of the next month), then this should be indicated in the documentation with a note.
For example:
"""
MS = month start frequency. Note: if the start date is any day other than the first of the month, the generated range starts at the beginning of the next month.
"""
The above explanation correctly describes the current behaviour of the MS offset, especially within pd.date_range(). The current documentation could lead a reader to believe that the MS offset causes the range to start at the beginning of the month of the date supplied.
If this behaviour is not correct, and MS should cause the range to start at the beginning of the month of the date supplied, then a separate bug report would need to be raised.
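In the meantime, anyone who wants the range to begin at the start of the supplied month could align the start date first. The following is only a sketch of one possible workaround, not a proposed change to pandas; it uses pd.offsets.MonthBegin().rollback(), and the names aligned_start and months are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2020-03-08")
end = pd.Timestamp("2021-03-08")

# rollback() moves a timestamp back to the previous month start and
# leaves it unchanged if it already falls on a month start.
aligned_start = pd.offsets.MonthBegin().rollback(start)

# The aligned start is itself a month start, so it is included in the range.
months = pd.date_range(aligned_start, end, freq="MS")
print(months[0])
```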