Skip to content

ENH: Add SemiMonthEnd and SemiMonthBegin offsets #1543 #13315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

adrienemery
Copy link
Contributor

@adrienemery adrienemery commented May 29, 2016

@codecov-io
Copy link

codecov-io commented May 29, 2016

Current coverage is 84.28%

Merging #13315 into master will increase coverage by 0.03%

@@             master     #13315   diff @@
==========================================
  Files           138        138          
  Lines         50810      50934   +124   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42809      42931   +122   
- Misses         8001       8003     +2   
  Partials          0          0          

Powered by Codecov. Last updated by 07761c5...fe221b2

@@ -1160,6 +1161,92 @@ def onOffset(self, dt):
_prefix = 'MS'


class SemiMonthEnd(SingleConstructorOffset):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a common class which this inherits to put common routines (SemiMonthOffset)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can pull out a lot of the .apply method and put it there

@jreback
Copy link
Contributor

jreback commented May 29, 2016

cc @sinhrks

@adrienemery
Copy link
Contributor Author

adrienemery commented May 30, 2016

@jreback I have made the requested changes - I'll squash the new commit(s) once it's approved.


def _rollforward_from_15(self, other):
"""Return the next offset if other is on the 15th of the month"""
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise AbstractMethodError instead here (same idea but more descriptive)

@adrienemery
Copy link
Contributor Author

adrienemery commented May 30, 2016

@jreback Any additional changes you see? Thanks.

if n >= 0:
if n == 0 and not self.onOffset(other):
n = 1 # rollforward
for _ in range(n):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks inefficient. Can't you add n // 2 months and n % 2 15 days?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - that should also simplify things.

@jreback
Copy link
Contributor

jreback commented May 31, 2016

there are some benchmarks for .apply in asv_bench/benchmarks/timeseries, pls add this new freq there as well.

@adrienemery
Copy link
Contributor Author

Will do.

@adrienemery adrienemery force-pushed the semi-monthly-offset branch from 00e603e to e7d8257 Compare June 1, 2016 07:14
@adrienemery
Copy link
Contributor Author

Added benchmarks and optimized .apply to not use inefficient looping.

benchmark results:

[  0.00%] · For pandas commit hash e7d8257c:
[  0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.......................................
[  0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
...
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply   69.11μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr    77.51μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_n  90.41μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr    79.20μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_n  99.60μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply     76.50μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr      125.04μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_n    113.05μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr      81.48μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_n    95.23μs

@jreback jreback added this to the 0.18.2 milestone Jun 1, 2016
@jreback
Copy link
Contributor

jreback commented Jun 1, 2016

this looks good. can you see if you can add an apply_index (high-perf apply).

Semi-Month Offsets
^^^^^^^^^^^^^^^^^^

The ``SemiMonthEnd`` ('SM') ``SemiMonthBegin`` ('SMS') offsets provide date offsets
Copy link
Contributor

@jreback jreback Jun 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas has gained new frequency offsets, SemiMonthEnd ('SM') and SemiMonthBegin ('SMS'). These provide date offsets anchored ....

@max-sixty
Copy link
Contributor

max-sixty commented Jun 1, 2016

I am probably missing something obvious, but should SemiMonthBegin be the 1st & 16th of every month? That way, a 30 day month has an equal number of days in each semi-month?

(I'm ignoring February - but maybe that should be a day earlier...)

@jreback
Copy link
Contributor

jreback commented Jun 1, 2016

I think this is pretty standard to do semi-montly on 1/15th. @adrienemery thoughts?

@adrienemery
Copy link
Contributor Author

adrienemery commented Jun 1, 2016

@MaximilianR @jreback I've seen both personally but how about we make it flexible such that the 15th becomes a parameter. That way people can use it for their application as needed. I like keeping 15th as the default since it provides a nice default symmetry.

See latest commit for proposed change - I haven't added tests yet, but if we agree to go this route I will, otherwise will rollback that change.

I did test it in a notebook and it works nicely:

screenshot 2016-06-01 13 45 42

@adrienemery
Copy link
Contributor Author

adrienemery commented Jun 1, 2016

I could also update _from_name such that it allows setting pivot day such as SMS-16

    @classmethod
    def _from_name(cls, suffix=None):
        if not suffix:
            pivot = 15
        else:
            pivot = int(suffix)
        return cls(pivot=pivot)

screenshot 2016-06-01 13 54 34

@jreback
Copy link
Contributor

jreback commented Jun 1, 2016

yeah that's prob reasonable though need to add routines to parse SMS-16 then

look at the last week freq to see what the parameters are named (eg not pivot)

@adrienemery
Copy link
Contributor Author

look at the last week freq to see what the parameters are named (eg not pivot)

I don't understand what you mean by this.

@jreback
Copy link
Contributor

jreback commented Jun 2, 2016

you are thinking of allowing SMS-16, so needs to parse this

In [2]: pd.tseries.frequencies.to_offset('1MS')
Out[2]: <MonthBegin>

In [3]: pd.tseries.frequencies.to_offset('2M')
Out[3]: <2 * MonthEnds>

@adrienemery adrienemery force-pushed the semi-monthly-offset branch 2 times, most recently from ff0a9c6 to 0f42c73 Compare June 7, 2016 06:17
@adrienemery
Copy link
Contributor Author

  • added .apply_index methods
  • added flexible day_of_month parameter that defaults to the 15th
  • added support to parse day_of_month from frequency string

I'd like to add a few more tests for .apply_index but wanted to get feedback on current change in case there are any issues/changes to be made.

Updated asv benchmark output with .apply_index included:

[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply           58.91μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_apply_index     49.61ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr            68.74μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_n          80.07μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_decr_rng        70.12ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr            75.89μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_n          73.19μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_begin_incr_rng        56.28ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply             73.79μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_apply_index       57.93ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr              77.41μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_n            83.09μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_decr_rng          56.51ms
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr              66.83μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_n            70.97μs
[  0.01%] ··· Running timeseries.timeseries_semi_month_offset.time_semi_month_end_incr_rng          55.86ms

@adrienemery adrienemery force-pushed the semi-monthly-offset branch from 0f42c73 to b453149 Compare June 8, 2016 06:10
Pandas has gained new frequency offsets, ``SemiMonthEnd`` ('SM') and ``SemiMonthBegin`` ('SMS').
These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month respectively.
Optionally you can specify the ``day_of_month`` which will be used instead of the 15th each month. Anchoring suffixes
are also supported for setting the `day_of_month``. Note that ``day_of_month`` must be 1<=day_of_month<=27 for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this last sentence (the Note). That can be part of the doc-string.

def _apply_index(self, i, before_day_of_month, after_day_of_month):
n = self.n
time = i.to_perioddelta('D')
is_month_end = tslib.get_start_end_field(i.asi8, 'is_month_end')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but why not just i.is_month_end

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 8, 2016

few comments, looks good overall

@adrienemery adrienemery force-pushed the semi-monthly-offset branch 2 times, most recently from bec3e2a to cc6c61a Compare June 11, 2016 05:13
@adrienemery
Copy link
Contributor Author

I have addressed comments cc/ @jreback @chris-b1

@chris-b1
Copy link
Contributor

Looks good to me. If you ever hit the wall on vectorized performance, it probably wouldn't be hard to modify or clone this cython helper to handle your case, but definitely not necessary, what you're doing is pretty quick already.

assert (result == expected)

freqstr = '2SM-16'
result = frequencies.to_offset(freqstr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some tests where this fails (e.g. it should fail)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add SM-15 as well (which should work)

@adrienemery adrienemery force-pushed the semi-monthly-offset branch from cc6c61a to a4bbec2 Compare June 12, 2016 00:12
@adrienemery
Copy link
Contributor Author

  • updated whatsnew
  • added more tests (including failing cases)


pd.date_range('2015-01-01', freq='SMS', periods=4)

Using the anchoring suffix:
Copy link
Contributor

@jreback jreback Jun 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the anchoring suffix, you can also specify the day of month for the semi portion (maybe not right word)

@jreback
Copy link
Contributor

jreback commented Jun 13, 2016

@adrienemery looks real good. just some minor comments. ping when pushed and green.

@adrienemery adrienemery force-pushed the semi-monthly-offset branch 3 times, most recently from 03eb8db to 91573dc Compare June 14, 2016 07:08
@adrienemery adrienemery force-pushed the semi-monthly-offset branch from 91573dc to fe221b2 Compare June 14, 2016 15:48
@adrienemery
Copy link
Contributor Author

@jreback should be good to go - I have addressed comments and build is green

@jreback jreback closed this in bd66592 Jun 14, 2016
@jreback
Copy link
Contributor

jreback commented Jun 14, 2016

@adrienemery thanks! nice PR. these have a lot of pieces for a full patch. Nice job!

give a check on the built docs (prob take a few hours), and if needs corrections, pls issue a followup PR.

@adrienemery
Copy link
Contributor Author

adrienemery commented Jun 14, 2016

@jreback thanks for your help on this PR - especially as this is my first time contributing to pandas. Also thanks for all your work on pandas 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for bimonthly/weekly timerules
6 participants