ENH: Add SemiMonthEnd and SemiMonthBegin offsets #1543 #13315

adrienemery · 2016-05-29T06:00:25Z

closes Support for bimonthly/weekly timerules #1543
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

codecov-io · 2016-05-29T07:00:08Z

Current coverage is 84.28%

Merging #13315 into master will increase coverage by 0.03%

@@             master     #13315   diff @@
==========================================
  Files           138        138          
  Lines         50810      50934   +124   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42809      42931   +122   
- Misses         8001       8003     +2   
  Partials          0          0

Powered by Codecov. Last updated by 07761c5...fe221b2

jreback · 2016-05-29T15:02:48Z

pandas/tseries/offsets.py

@@ -1160,6 +1161,92 @@ def onOffset(self, dt):
    _prefix = 'MS'


+class SemiMonthEnd(SingleConstructorOffset):


make a common class which this inherits to put common routines (SemiMonthOffset)

I think you can pull out a lot of the .apply method and put it there

jreback · 2016-05-29T15:05:41Z

cc @sinhrks

adrienemery · 2016-05-30T05:46:13Z

@jreback I have made the requested changes - I'll squash the new commit(s) once it's approved.

jreback · 2016-05-30T13:22:33Z

pandas/tseries/offsets.py

+
+    def _rollforward_from_15(self, other):
+        """Return the next offset if other is on the 15th of the month"""
+        raise NotImplementedError


raise AbstractMethodError instead here (same idea but more descriptive)

adrienemery · 2016-05-30T21:57:10Z

@jreback Any additional changes you see? Thanks.

sinhrks · 2016-05-30T23:56:45Z

pandas/tseries/offsets.py

+        if n >= 0:
+            if n == 0 and not self.onOffset(other):
+                n = 1  # rollforward
+            for _ in range(n):


This looks inefficient. Can't you add n // 2 months and n % 2 15 days?

Good call - that should also simplify things.

jreback · 2016-05-31T11:58:25Z

there are some benchmarks for .apply in asv_bench/benchmarks/timeseries, pls add this new freq there as well.

adrienemery · 2016-05-31T16:31:21Z

Will do.

adrienemery · 2016-06-01T07:34:42Z

Added benchmarks and optimized .apply to not use inefficient looping.

benchmark results:

[  0.00%] · For pandas commit hash e7d8257c:
[  0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.......................................
[  0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
...
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply   69.11μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr    77.51μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_n  90.41μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr    79.20μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_n  99.60μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply     76.50μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr      125.04μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_n    113.05μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr      81.48μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_n    95.23μs

jreback · 2016-06-01T11:20:43Z

this looks good. can you see if you can add an apply_index (high-perf apply).

jreback · 2016-06-01T11:21:14Z

doc/source/whatsnew/v0.18.2.txt

+Semi-Month Offsets
+^^^^^^^^^^^^^^^^^^
+
+The ``SemiMonthEnd`` ('SM') ``SemiMonthBegin`` ('SMS') offsets provide date offsets


Pandas has gained new frequency offsets, SemiMonthEnd ('SM') and SemiMonthBegin ('SMS'). These provide date offsets anchored ....

max-sixty · 2016-06-01T14:17:17Z

I am probably missing something obvious, but should SemiMonthBegin be the 1st & 16th of every month? That way, a 30 day month has an equal number of days in each semi-month?

(I'm ignoring February - but maybe that should be a day earlier...)

jreback · 2016-06-01T18:04:34Z

I think this is pretty standard to do semi-montly on 1/15th. @adrienemery thoughts?

adrienemery · 2016-06-01T20:46:55Z

@MaximilianR @jreback I've seen both personally but how about we make it flexible such that the 15th becomes a parameter. That way people can use it for their application as needed. I like keeping 15th as the default since it provides a nice default symmetry.

See latest commit for proposed change - I haven't added tests yet, but if we agree to go this route I will, otherwise will rollback that change.

I did test it in a notebook and it works nicely:

adrienemery · 2016-06-01T20:55:27Z

I could also update _from_name such that it allows setting pivot day such as SMS-16

    @classmethod
    def _from_name(cls, suffix=None):
        if not suffix:
            pivot = 15
        else:
            pivot = int(suffix)
        return cls(pivot=pivot)

jreback · 2016-06-01T21:53:01Z

yeah that's prob reasonable though need to add routines to parse SMS-16 then

look at the last week freq to see what the parameters are named (eg not pivot)

adrienemery · 2016-06-01T22:04:42Z

look at the last week freq to see what the parameters are named (eg not pivot)

I don't understand what you mean by this.

jreback · 2016-06-02T00:50:05Z

you are thinking of allowing SMS-16, so needs to parse this

In [2]: pd.tseries.frequencies.to_offset('1MS')
Out[2]: <MonthBegin>

In [3]: pd.tseries.frequencies.to_offset('2M')
Out[3]: <2 * MonthEnds>

adrienemery · 2016-06-08T06:07:33Z

added .apply_index methods
added flexible day_of_month parameter that defaults to the 15th
added support to parse day_of_month from frequency string

I'd like to add a few more tests for .apply_index but wanted to get feedback on current change in case there are any issues/changes to be made.

Updated asv benchmark output with .apply_index included:

[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply           58.91μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply_index     49.61ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr            68.74μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_n          80.07μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_rng        70.12ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr            75.89μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_n          73.19μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_rng        56.28ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply             73.79μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply_index       57.93ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr              77.41μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_n            83.09μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_rng          56.51ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr              66.83μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_n            70.97μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_rng          55.86ms

jreback · 2016-06-08T11:14:50Z

doc/source/whatsnew/v0.18.2.txt

+Pandas has gained new frequency offsets, ``SemiMonthEnd`` ('SM') and ``SemiMonthBegin`` ('SMS').
+These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month respectively.
+Optionally you can specify the ``day_of_month`` which will be used instead of the 15th each month. Anchoring suffixes
+are also supported for setting the `day_of_month``. Note that ``day_of_month`` must be 1<=day_of_month<=27 for


you don't need this last sentence (the Note). That can be part of the doc-string.

chris-b1 · 2016-06-08T11:58:39Z

pandas/tseries/offsets.py

+    def _apply_index(self, i, before_day_of_month, after_day_of_month):
+        n = self.n
+        time = i.to_perioddelta('D')
+        is_month_end = tslib.get_start_end_field(i.asi8, 'is_month_end')


not a big deal, but why not just i.is_month_end

chris-b1 · 2016-06-08T12:14:02Z

few comments, looks good overall

adrienemery · 2016-06-11T05:43:19Z

I have addressed comments cc/ @jreback @chris-b1

chris-b1 · 2016-06-11T12:02:58Z

Looks good to me. If you ever hit the wall on vectorized performance, it probably wouldn't be hard to modify or clone this cython helper to handle your case, but definitely not necessary, what you're doing is pretty quick already.

jreback · 2016-06-11T12:15:28Z

pandas/tseries/tests/test_frequencies.py

+    assert (result == expected)
+
+    freqstr = '2SM-16'
+    result = frequencies.to_offset(freqstr)


add some tests where this fails (e.g. it should fail)

add SM-15 as well (which should work)

adrienemery · 2016-06-12T00:41:01Z

updated whatsnew
added more tests (including failing cases)

jreback · 2016-06-13T11:43:03Z

doc/source/whatsnew/v0.18.2.txt

+
+    pd.date_range('2015-01-01', freq='SMS', periods=4)
+
+Using the anchoring suffix:


Using the anchoring suffix, you can also specify the day of month for the semi portion (maybe not right word)

jreback · 2016-06-13T11:50:44Z

@adrienemery looks real good. just some minor comments. ping when pushed and green.

Closes pandas-dev#1543

adrienemery · 2016-06-14T17:33:44Z

@jreback should be good to go - I have addressed comments and build is green

jreback · 2016-06-14T21:23:09Z

@adrienemery thanks! nice PR. these have a lot of pieces for a full patch. Nice job!

give a check on the built docs (prob take a few hours), and if needs corrections, pls issue a followup PR.

adrienemery · 2016-06-14T22:18:03Z

@jreback thanks for your help on this PR - especially as this is my first time contributing to pandas. Also thanks for all your work on pandas 👍

jreback reviewed May 29, 2016
View reviewed changes

jreback added Frequency DateOffsets Enhancement labels May 29, 2016

jreback reviewed May 30, 2016
View reviewed changes

sinhrks reviewed May 30, 2016
View reviewed changes

adrienemery force-pushed the semi-monthly-offset branch from 00e603e to e7d8257 Compare June 1, 2016 07:14

jreback added this to the 0.18.2 milestone Jun 1, 2016

jreback reviewed Jun 1, 2016
View reviewed changes

adrienemery force-pushed the semi-monthly-offset branch 2 times, most recently from ff0a9c6 to 0f42c73 Compare June 7, 2016 06:17

adrienemery force-pushed the semi-monthly-offset branch from 0f42c73 to b453149 Compare June 8, 2016 06:10

jreback reviewed Jun 8, 2016
View reviewed changes

chris-b1 reviewed Jun 8, 2016
View reviewed changes

adrienemery force-pushed the semi-monthly-offset branch 2 times, most recently from bec3e2a to cc6c61a Compare June 11, 2016 05:13

jreback reviewed Jun 11, 2016
View reviewed changes

adrienemery force-pushed the semi-monthly-offset branch from cc6c61a to a4bbec2 Compare June 12, 2016 00:12

jreback reviewed Jun 13, 2016
View reviewed changes

adrienemery force-pushed the semi-monthly-offset branch 3 times, most recently from 03eb8db to 91573dc Compare June 14, 2016 07:08

ENH: Add SemiMonthEnd and SemiMonthBegin offsets pandas-dev#1543

fe221b2

Closes pandas-dev#1543

adrienemery force-pushed the semi-monthly-offset branch from 91573dc to fe221b2 Compare June 14, 2016 15:48

jreback closed this in bd66592 Jun 14, 2016

		@@ -1160,6 +1161,92 @@ def onOffset(self, dt):
		_prefix = 'MS'


		class SemiMonthEnd(SingleConstructorOffset):


		pd.date_range('2015-01-01', freq='SMS', periods=4)

		Using the anchoring suffix:

ENH: Add SemiMonthEnd and SemiMonthBegin offsets #1543 #13315

ENH: Add SemiMonthEnd and SemiMonthBegin offsets #1543 #13315

Conversation

adrienemery commented May 29, 2016 • edited Loading

codecov-io commented May 29, 2016 • edited Loading

Current coverage is 84.28%

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented May 29, 2016

adrienemery commented May 30, 2016 • edited Loading

Choose a reason for hiding this comment

adrienemery commented May 30, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented May 31, 2016

adrienemery commented May 31, 2016

adrienemery commented Jun 1, 2016

jreback commented Jun 1, 2016

jreback Jun 1, 2016 • edited Loading

Choose a reason for hiding this comment

max-sixty commented Jun 1, 2016 • edited Loading

jreback commented Jun 1, 2016

adrienemery commented Jun 1, 2016 • edited Loading

adrienemery commented Jun 1, 2016 • edited Loading

jreback commented Jun 1, 2016

adrienemery commented Jun 1, 2016

jreback commented Jun 2, 2016

adrienemery commented Jun 8, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chris-b1 commented Jun 8, 2016

adrienemery commented Jun 11, 2016

chris-b1 commented Jun 11, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adrienemery commented Jun 12, 2016

jreback Jun 13, 2016 • edited Loading

Choose a reason for hiding this comment

jreback commented Jun 13, 2016

adrienemery commented Jun 14, 2016

jreback commented Jun 14, 2016

adrienemery commented Jun 14, 2016 • edited Loading

adrienemery commented May 29, 2016 •

edited

Loading

codecov-io commented May 29, 2016 •

edited

Loading

adrienemery commented May 30, 2016 •

edited

Loading

adrienemery commented May 30, 2016 •

edited

Loading

jreback Jun 1, 2016 •

edited

Loading

max-sixty commented Jun 1, 2016 •

edited

Loading

adrienemery commented Jun 1, 2016 •

edited

Loading

adrienemery commented Jun 1, 2016 •

edited

Loading

jreback Jun 13, 2016 •

edited

Loading

adrienemery commented Jun 14, 2016 •

edited

Loading