-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Add SemiMonthEnd and SemiMonthBegin offsets #1543 #13315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 84.28%@@ master #13315 diff @@
==========================================
Files 138 138
Lines 50810 50934 +124
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 42809 42931 +122
- Misses 8001 8003 +2
Partials 0 0
|
@@ -1160,6 +1161,92 @@ def onOffset(self, dt): | |||
_prefix = 'MS' | |||
|
|||
|
|||
class SemiMonthEnd(SingleConstructorOffset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make a common class which this inherits to put common routines (SemiMonthOffset)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can pull out a lot of the .apply
method and put it there
cc @sinhrks |
@jreback I have made the requested changes - I'll squash the new commit(s) once it's approved. |
|
||
def _rollforward_from_15(self, other): | ||
"""Return the next offset if other is on the 15th of the month""" | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise AbstractMethodError instead here (same idea but more descriptive)
@jreback Any additional changes you see? Thanks. |
if n >= 0: | ||
if n == 0 and not self.onOffset(other): | ||
n = 1 # rollforward | ||
for _ in range(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks inefficient. Can't you add n // 2
months and n % 2
15 days?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - that should also simplify things.
there are some benchmarks for |
Will do. |
00e603e
to
e7d8257
Compare
Added benchmarks and optimized benchmark results: [ 0.00%] · For pandas commit hash e7d8257c:
[ 0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.......................................
[ 0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
...
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply 69.11μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr 77.51μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_n 90.41μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr 79.20μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_n 99.60μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply 76.50μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr 125.04μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_n 113.05μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr 81.48μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_n 95.23μs |
this looks good. can you see if you can add an |
Semi-Month Offsets | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
The ``SemiMonthEnd`` ('SM') ``SemiMonthBegin`` ('SMS') offsets provide date offsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pandas has gained new frequency offsets, SemiMonthEnd
('SM') and SemiMonthBegin
('SMS'). These provide date offsets anchored ....
I am probably missing something obvious, but should (I'm ignoring February - but maybe that should be a day earlier...) |
I think this is pretty standard to do semi-montly on 1/15th. @adrienemery thoughts? |
@MaximilianR @jreback I've seen both personally but how about we make it flexible such that the 15th becomes a parameter. That way people can use it for their application as needed. I like keeping 15th as the default since it provides a nice default symmetry. See latest commit for proposed change - I haven't added tests yet, but if we agree to go this route I will, otherwise will rollback that change. I did test it in a notebook and it works nicely: |
yeah that's prob reasonable though need to add routines to parse SMS-16 then look at the last week freq to see what the parameters are named (eg not pivot) |
I don't understand what you mean by this. |
you are thinking of allowing
|
ff0a9c6
to
0f42c73
Compare
I'd like to add a few more tests for Updated asv benchmark output with [ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply 58.91μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply_index 49.61ms
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr 68.74μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_n 80.07μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_rng 70.12ms
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr 75.89μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_n 73.19μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_rng 56.28ms
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply 73.79μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply_index 57.93ms
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr 77.41μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_n 83.09μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_rng 56.51ms
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr 66.83μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_n 70.97μs
[ 0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_rng 55.86ms |
0f42c73
to
b453149
Compare
Pandas has gained new frequency offsets, ``SemiMonthEnd`` ('SM') and ``SemiMonthBegin`` ('SMS'). | ||
These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month respectively. | ||
Optionally you can specify the ``day_of_month`` which will be used instead of the 15th each month. Anchoring suffixes | ||
are also supported for setting the `day_of_month``. Note that ``day_of_month`` must be 1<=day_of_month<=27 for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this last sentence (the Note). That can be part of the doc-string.
def _apply_index(self, i, before_day_of_month, after_day_of_month): | ||
n = self.n | ||
time = i.to_perioddelta('D') | ||
is_month_end = tslib.get_start_end_field(i.asi8, 'is_month_end') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal, but why not just i.is_month_end
few comments, looks good overall |
bec3e2a
to
cc6c61a
Compare
Looks good to me. If you ever hit the wall on vectorized performance, it probably wouldn't be hard to modify or clone this cython helper to handle your case, but definitely not necessary, what you're doing is pretty quick already. |
assert (result == expected) | ||
|
||
freqstr = '2SM-16' | ||
result = frequencies.to_offset(freqstr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some tests where this fails (e.g. it should fail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add SM-15
as well (which should work)
cc6c61a
to
a4bbec2
Compare
|
|
||
pd.date_range('2015-01-01', freq='SMS', periods=4) | ||
|
||
Using the anchoring suffix: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the anchoring suffix, you can also specify the day of month for the semi portion (maybe not right word)
@adrienemery looks real good. just some minor comments. ping when pushed and green. |
03eb8db
to
91573dc
Compare
91573dc
to
fe221b2
Compare
@jreback should be good to go - I have addressed comments and build is green |
@adrienemery thanks! nice PR. these have a lot of pieces for a full patch. Nice job! give a check on the built docs (prob take a few hours), and if needs corrections, pls issue a followup PR. |
@jreback thanks for your help on this PR - especially as this is my first time contributing to pandas. Also thanks for all your work on pandas 👍 |
git diff upstream/master | flake8 --diff