ENH: Support for "52–53-week fiscal year" / "4–4–5 calendar" #4511

Closed
cancan101 opened this issue Aug 8, 2013 · 28 comments · Fixed by #5004
Labels: Datetime (Datetime data dtype), Enhancement, Frequency (DateOffsets)

@cancan101
Contributor

See: http://en.wikipedia.org/wiki/4%E2%80%934%E2%80%935_calendar about the referenced calendars.

Certain companies, for example Green Mountain Coffee Roasters, Inc. (GMCR), use this calendar. See for example: http://files.shareholder.com/downloads/GMCR/1456137416x0x436353/9E2D04D9-79DE-4C08-A7C6-7B161E23E586/gmcr_2010_annual_report_lo.pdf
where it states:

"The Company’s fiscal year ends on the last Saturday in September."
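
As a rough illustration of the "last Saturday in the month" rule, here is a minimal sketch using only the Python standard library. The helper name `last_weekday_of_month` is made up for this example and is not part of pandas.

```python
import calendar
from datetime import date, timedelta


def last_weekday_of_month(year, month, weekday):
    """Return the last date in (year, month) that falls on `weekday`.

    `weekday` follows datetime's convention: Monday=0 ... Sunday=6.
    """
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # Step back from the final day of the month to the requested weekday.
    days_back = (last_day.weekday() - weekday) % 7
    return last_day - timedelta(days=days_back)


SATURDAY = 5
print(last_weekday_of_month(2010, 9, SATURDAY))  # 2010-09-25
```

The year end always lands inside the named month under this rule, which is not the case for the "nearest" variant.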

Here is some more information on this calendar: http://www.nrf.com/modules.php?name=Pages&sp_id=391
Walmart releases its sales numbers on this calendar: http://stock.walmart.com/financial-reporting/comparable-store-sales

@cancan101
Contributor Author

Here is an academic paper on these fiscal calendars: http://www3.nd.edu/~carecob/April2010Conference/LeonePaper.pdf

Home Depot (HD) is another company which uses a retail calendar (different from GMCR, however) (http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9MTc3ODU4fENoaWxkSUQ9LTF8VHlwZT0z&t=1 ):

The Company’s fiscal year is a 52- or 53-week period ending on the Sunday nearest to January 31. The fiscal year ended February 3, 2013 ("fiscal 2012") includes 53 weeks and fiscal years ended January 29, 2012 ("fiscal 2011") and January 30, 2011 ("fiscal 2010") include 52 weeks.
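
The dates quoted above can be checked with a small standard-library sketch of the "weekday nearest to month end" rule. The function name `nearest_weekday_to_month_end` is hypothetical and only serves this example.

```python
import calendar
from datetime import date, timedelta


def nearest_weekday_to_month_end(year, month, weekday):
    """Return the date on `weekday` (Monday=0 ... Sunday=6) closest to the month's last day."""
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    # Days back to the most recent matching weekday on or before month end (0..6).
    days_back = (month_end.weekday() - weekday) % 7
    if days_back <= 3:
        return month_end - timedelta(days=days_back)
    # Otherwise the next matching weekday, in the following month, is closer.
    return month_end + timedelta(days=7 - days_back)


SUNDAY = 6
year_ends = [nearest_weekday_to_month_end(y, 1, SUNDAY) for y in (2011, 2012, 2013)]
for start, end in zip(year_ends, year_ends[1:]):
    print(end, (end - start).days // 7, "weeks")
# 2012-01-29 52 weeks
# 2013-02-03 53 weeks
```

Because a week has an odd number of days, there is never a tie between the candidate before and the candidate after the month end.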

@jreback
Contributor

jreback commented Aug 20, 2013

look at CustomBusinessDay (you prob just need the DateOffset not the index) for an example of how to subclass offsets; essentially you just have it do custom apply

@cancan101
Contributor Author

@jreback I posted an SO question about this

@jreback
Contributor

jreback commented Aug 20, 2013

I know, much better to ask here :)

@cancan101
Contributor Author

Okay, so once I have created a subclass of DateOffset, how do I go about making Periods that are on that frequency?

@jreback
Contributor

jreback commented Aug 20, 2013

Create a test for it, then step thru until it fails, then see what else you need to do. I haven't created a new frequency so I don't have a roadmap, but this is how TDD works.

p = Period(ordinal=-1, freq='NEW_FREQ')

@cancan101
Contributor Author

@jreback I have found a couple oddities.
This:

pd.Period("2013-12", freq=CustomBusinessDay())

leads to:

Traceback (most recent call last):
  File "/home/alex/git/pandas/pandas/tseries/tests/test_52.py", line 15, in testName
    a = pd.Period("2013-12", freq=CustomBusinessDay())
  File "/home/alex/git/pandas/pandas/tseries/period.py", line 121, in __init__
    base, mult = _gfc(freq)
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 95, in get_freq_code
    code = _period_str_to_code(freqstr[1])
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 742, in _period_str_to_code
    freqstr = _rule_aliases.get(freqstr.lower(), freqstr)
AttributeError: 'int' object has no attribute 'lower'

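because in tseries.frequencies a DateOffset is first converted to the tuple (name, n). When looking up the name fails, the bare except falls back to freqstr[1], which is the int n, and lower() is then called on it: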

    if isinstance(freqstr, DateOffset):
        freqstr = (get_offset_name(freqstr), freqstr.n)

    if isinstance(freqstr, tuple):
        if (com.is_integer(freqstr[0]) and
                com.is_integer(freqstr[1])):
            # e.g., freqstr = (2000, 1)
            return freqstr
        else:
            # e.g., freqstr = ('T', 5)
            try:
                code = _period_str_to_code(freqstr[0])
                stride = freqstr[1]
            except:
                code = _period_str_to_code(freqstr[1])
                stride = freqstr[0]
            return code, stride
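
The failure mode can be reproduced in isolation with a toy version of that try/except fallback. Everything below (`_TOY_CODES`, `toy_period_str_to_code`, `toy_get_freq_code`) is a stand-in written for this example, not the pandas code itself; only the value 6000 for "D" comes from this thread.

```python
# Toy alias table; not the real pandas table.
_TOY_CODES = {"d": 6000}


def toy_period_str_to_code(freqstr):
    # Same shape as the real helper: normalise the case, then look the alias up.
    return _TOY_CODES[freqstr.lower()]


def toy_get_freq_code(freq_tuple):
    try:
        code = toy_period_str_to_code(freq_tuple[0])
        stride = freq_tuple[1]
    except Exception:  # the original uses a bare `except:`
        # Fallback assumes the tuple was given as (stride, alias).
        code = toy_period_str_to_code(freq_tuple[1])
        stride = freq_tuple[0]
    return code, stride


print(toy_get_freq_code(("D", 1)))  # (6000, 1)
print(toy_get_freq_code((1, "D")))  # (6000, 1)

try:
    # Unknown alias: the KeyError is swallowed, then .lower() is called on the int.
    toy_get_freq_code(("C", 1))
except AttributeError as exc:
    print(exc)
```

The broad except hides the informative error (an unknown alias) and surfaces an unrelated one instead.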

The other is this great line:

get_offset_name = get_offset_name

@cancan101
Contributor Author

Currently I am getting stuck.

Period(ordinal=-1, freq='NEW_FREQ')

eventually leads to a lookup in the _period_code_map (or the _period_alias_dict), which goes back to my question as to what those codes mean.

@jreback
Contributor

jreback commented Aug 22, 2013

freq is a string, you need to actually define new codes that eventually instantiate your class

NEW_FREQ (pick a better name!)

will be recognized as a valid code in the code maps, and then your class is created

@cancan101
Contributor Author

Of course it will have a better name.

It actually does not need to be a string. The method should take an offset.
See the comment above. Can I choose any code that is currently unused? What
is the point of the int code as opposed to a reference to an offset?

@jreback
Contributor

jreback commented Aug 22, 2013

needs to be a string (well hashable), easiest to keep the current standard

the numbers have to do with not conflicting with other periods

periods are ultimately represented by integers, so the reverse translation needs to happen too

@cancan101
Contributor Author

Ultimately there has to be a string, but I can pass an offset in for freq
which then can be called to return its string representation.

That being said, is the reason for the int representation historic? Using an
object reference instead would avoid maintaining two sets of lookup tables.

@jreback
Contributor

jreback commented Aug 22, 2013

why would you not use a string like everything else?

In [1]: pr = period_range('2000',periods=5,freq='A')

In [2]: pr
Out[2]: 
<class 'pandas.tseries.period.PeriodIndex'>
freq: A-DEC
[2000, ..., 2004]
length: 5

In [3]: pr.values
Out[3]: array([30, 31, 32, 33, 34])

the lookup table is just used to compute the offsets that govern the backing representation for the period index (e.g. the number that represents a unique period)

so they are not optional; they are stored as numpy arrays; objects don't work here (well they work but you lose all efficiency) and are not a good idea in general
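
The values in the output above are consistent with an annual period being stored as the number of years since 1970. A minimal sketch of that mapping and its inverse follows; the 1970 epoch is inferred from the output (2000 maps to 30) and the function names are invented for illustration.

```python
EPOCH_YEAR = 1970  # assumption inferred from the output above: 2000 -> 30


def annual_ordinal(year):
    """Integer that would back an annual period for `year` under the assumed epoch."""
    return year - EPOCH_YEAR


def year_from_ordinal(ordinal):
    """Reverse translation from the stored integer back to a year."""
    return ordinal + EPOCH_YEAR


ordinals = [annual_ordinal(y) for y in range(2000, 2005)]
print(ordinals)                                  # [30, 31, 32, 33, 34]
print([year_from_ordinal(o) for o in ordinals])  # [2000, 2001, 2002, 2003, 2004]
```

Storing plain integers is what lets the index live in a numeric array rather than an array of Python objects.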

@cancan101
Contributor Author

@jreback What I mean about not having to use strings for freq was that either of these work:

pd.Period("2013-12-01", freq=Day())
pd.Period("2013-12-01", freq="D")

They both lead to:

Period('2013-12-01', 'D')

@cancan101
Contributor Author

@jreback As for ints, I wasn't referring to the int used to represent a Period, but rather the int to represent a frequency/DateOffset:

get_freq_code(Day())
(6000, 1)

or

get_freq_code(Week(weekday=1))
(4002, 1)

which should need to be stored only once for a given vector.

@jreback
Contributor

jreback commented Aug 22, 2013

I just read it; these correspond to the original scikit-timeseries codes. So I guess it IS historic. But for back compat, follow what it's doing (and maybe note that; using a code that is available), maybe start at 10000 or something

@cancan101
Contributor Author

Any idea what the reason is to keep using those codes? If they are only used for internal representations (i.e. the user never sees them, etc), it might make sense to tear them out and replace them with a more meaningful object (perhaps a reference to the Offset). An accessor on that object that provides those codes can still be kept around if needed.

Having to keep two lookup tables up to date looks like it will cause maintenance issues and errors down the road. Further, it looks to complicate adding new Offsets. For the time being I will just grab an int and use it, but it might be a good project to migrate away from the legacy ints. That being said, I have only done a cursory examination of how those codes are used.
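
One way to limit the two-table problem, sketched here under assumptions, is to keep a single alias-to-code table and derive the reverse table from it. The table contents are illustrative: 6000 for "D" and the 10000 starting point come from this thread, while "NEW_FREQ" is the placeholder name used earlier and `build_reverse_map` is a hypothetical helper.

```python
# Single source of truth: alias -> integer frequency code (illustrative values only).
ALIAS_TO_CODE = {
    "D": 6000,
    "NEW_FREQ": 10000,
}


def build_reverse_map(alias_to_code):
    """Derive code -> alias, refusing duplicate codes so the mapping stays invertible."""
    reverse = {}
    for alias, code in alias_to_code.items():
        if code in reverse:
            raise ValueError(
                "code %d used by both %r and %r" % (code, reverse[code], alias)
            )
        reverse[code] = alias
    return reverse


CODE_TO_ALIAS = build_reverse_map(ALIAS_TO_CODE)
print(CODE_TO_ALIAS[10000])  # NEW_FREQ
```

Adding a frequency then means touching one table, and a clashing code fails loudly at import time.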

@jreback
Contributor

jreback commented Aug 22, 2013

I haven't played around with this so I don't know

that said, it's easy to try things to break tests (though you have to be careful that you don't break something that's not being tested!)

and I suspect we should drop support for scikit-timeseries in any event

@cancan101
Contributor Author

Any recommendations for what I should call these new DateOffsets? The two styles that I plan to implement:

  1. Under this method the company's fiscal year end is defined as the final Saturday (or other day selected) in the fiscal year end month.
  2. Under this method the company's fiscal year end is defined as the Saturday (or other day selected) that falls closest to the last day of the fiscal year end month.

@jreback
Contributor

jreback commented Aug 22, 2013

these are 'Annual' right?

@cancan101
Contributor Author

Quarters. They are like the QuarterEnd or BQuarterEnd, but rather than the quarter being anchored to the last day in the month specified, they are anchored to the rules I outlined.
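
To make the quarter idea concrete, here is a hedged sketch that derives quarter-end dates from two consecutive fiscal year ends. It assumes 13-week quarters with any 53rd week added to the final quarter; that placement is a policy choice that differs between companies, and `quarter_ends` is a name invented for this example.

```python
from datetime import date, timedelta


def quarter_ends(prev_year_end, year_end):
    """Quarter-end dates for the fiscal year running from prev_year_end to year_end."""
    total_days = (year_end - prev_year_end).days
    weeks, remainder = divmod(total_days, 7)
    if remainder or weeks not in (52, 53):
        raise ValueError("fiscal year must span exactly 52 or 53 weeks")
    # Assumption: three 13-week quarters, with the last quarter absorbing a 53rd week.
    lengths = [13, 13, 13, weeks - 39]
    ends = []
    current = prev_year_end
    for n_weeks in lengths:
        current += timedelta(weeks=n_weeks)
        ends.append(current)
    return ends


for d in quarter_ends(date(2011, 1, 30), date(2012, 1, 29)):
    print(d)
# 2011-05-01
# 2011-07-31
# 2011-10-30
# 2012-01-29
```

Every quarter end falls on the same weekday as the year end, which is the property a month-anchored QuarterEnd cannot provide.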

@jreback
Contributor

jreback commented Aug 22, 2013

how about FQ for Fiscal Quarter?

@cancan101
Contributor Author

It isn't just any fiscal quarter, but rather one that tends to be used in retail.

@jreback
Contributor

jreback commented Aug 22, 2013

how about FQ52 and FQ4_4_5 ?

@jreback
Contributor

jreback commented Aug 22, 2013

not sure if there is an abbrev for 4_4_5 type

@cancan101
Contributor Author

In addition to adding the new Offset to the _offset_map in frequencies, I added a new code to the _period_code_map. (I chose 10000 for testing).

Now I am able to get as far as:

            self.ordinal = tslib.period_ordinal(dt.year, dt.month, dt.day,
                                                dt.hour, dt.minute, dt.second,
                                                base)

which dies with:

Traceback (most recent call last):
  File "/home/alex/git/pandas/pandas/tseries/tests/test_52.py", line 22, in testName
    a = pd.Period("2013-12", freq=Calendar5253LastOfMonthQuarterEnd(weekday=1, startingMonth=3))
  File "/home/alex/git/pandas/pandas/tseries/period.py", line 128, in __init__
    base)
  File "tslib.pyx", line 2306, in pandas.tslib.period_ordinal (pandas/tslib.c:34194)
RuntimeError: Unable to generate frequency ordinal

I can then trace this to get_period_ordinal in period.c. Before I go off and think about writing some C code, am I on the right track? Is there a better way to add a new frequency than having to rewrite the logic I wrote in Python for the Offset again, but in C? If there is not, I would be inclined to fall back to a slower implementation in Python in this case.

@jreback
Contributor

jreback commented Aug 23, 2013

I don't think you need to mess with this, the base freq is FR_QTR (which just means that taking an arbitrary date you want to place it in the right quarter), which works for you I believe, this is a very-low level function, you just need to interpret the output of this
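
For reference, a simplified model of placing a date in a quarter by calendar month alone might look like the sketch below. It is not the C implementation; `month_based_quarter` is a hypothetical helper, and labelling the fiscal year by the calendar year in which it ends is an assumption of this example. One case worth checking against it is a date that falls after the month end but on or before a 52/53-week year end, such as the first days of February 2013 for a year ending February 3, 2013.

```python
from datetime import date


def month_based_quarter(d, fy_end_month=12):
    """Return (fiscal_year, quarter) using calendar-month boundaries only."""
    # Months elapsed since the first month of the fiscal year (0..11).
    offset = (d.month - fy_end_month - 1) % 12
    quarter = offset // 3 + 1
    # Assumption: label the fiscal year by the calendar year in which it ends.
    fiscal_year = d.year if d.month <= fy_end_month else d.year + 1
    return fiscal_year, quarter


print(month_based_quarter(date(2013, 3, 15)))                  # (2013, 1)
print(month_based_quarter(date(2013, 1, 15), fy_end_month=1))  # (2013, 4)
# Month-based placement already treats February 2 as part of the next fiscal year:
print(month_based_quarter(date(2013, 2, 2), fy_end_month=1))   # (2014, 1)
```

Whether that last result is acceptable depends on how the new offset is expected to treat the days between the month end and the actual fiscal year end.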

@cancan101
Contributor Author

Something seems wrong here:

print repr(pd.Period("2013-12", freq=MonthEnd()))

correctly prints:

Period('2013-12', 'M')

but:

print pd.Period("2013-12", freq=BusinessMonthEnd())

leads to:

Traceback (most recent call last):
  File "/home/alex/git/pandas/pandas/tseries/tests/test_52.py", line 29, in testName
    a = pd.Period("2013-12", freq=BusinessMonthEnd())
  File "/home/alex/git/pandas/pandas/tseries/period.py", line 121, in __init__
    base, mult = _gfc(freq)
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 92, in get_freq_code
    code = _period_str_to_code(freqstr[0])
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 757, in _period_str_to_code
    alias = _period_alias_dict[freqstr]
KeyError: 'BM'

Strangely, the following both work:

In [10]: pd.date_range("2013-01","2013-10",freq=MonthEnd())
Out[10]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-31 00:00:00, ..., 2013-09-30 00:00:00]
Length: 9, Freq: M, Timezone: None

In [11]: pd.date_range("2013-01","2013-10",freq=BusinessMonthEnd())
Out[11]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-31 00:00:00, ..., 2013-09-30 00:00:00]
Length: 9, Freq: BM, Timezone: None

2 participants