Skip to content

No WeekOfMonth Defined for Weekends #5115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cancan101 opened this issue Oct 5, 2013 · 15 comments · Fixed by #14853
Closed

No WeekOfMonth Defined for Weekends #5115

cancan101 opened this issue Oct 5, 2013 · 15 comments · Fixed by #14853
Labels

Comments

@cancan101
Copy link
Contributor

In [35]: pd.date_range("2013-1-1", periods=4,freq='WOM-1SAT')
----> 1 pd.date_range("2013-1-1", periods=4,freq='WOM-1SAT')

/home/alex/git/pandas/pandas/tseries/index.pyc in date_range(start, end, periods, freq, tz, normalize, name)
   1724     """
   1725     return DatetimeIndex(start=start, end=end, periods=periods,
-> 1726                          freq=freq, tz=tz, normalize=normalize, name=name)
   1727 
   1728 

/home/alex/git/pandas/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, **kwds)
    158             # if a passed freq is None, don't infer automatically
    159             if freq != 'infer':
--> 160                 freq = to_offset(freq)
    161             else:
    162                 freq_infer = True

/home/alex/git/pandas/pandas/tseries/frequencies.pyc in to_offset(freqstr)
    463                     delta = delta + offset
    464         except Exception:
--> 465             raise ValueError("Could not evaluate %s" % freqstr)
    466 
    467     if delta is None:

ValueError: Could not evaluate WOM-1SAT

I will fix as part of #5004

@cancan101
Copy link
Contributor Author

@wesm Why is WeekOfMonth not allowed for weekends?

I found this code as well in frequencies:

    def _get_wom_rule(self):
        wdiffs = unique(np.diff(self.index.week))
        if not lib.ismember(wdiffs, set([4, 5])).all():
            return None

@wesm
Copy link
Member

wesm commented Oct 10, 2013

Seems like for no good reason that I can think of. I'd be in favor of allowing it

@cancan101
Copy link
Contributor Author

So the code now for inferring week of month is pretty broken. It does not work if the series spans a year boundary.

range = pd.date_range('12/1/2000',periods=7,freq='WOM-1MON')
fi = pd.tseries.frequencies._FrequencyInferer(range)
fi._get_wom_rule() # yields None

this is because (see code snippet above):

In [33]: pd.unique(np.diff(fi.index.week))
Out[33]: array([-48,   5,   4])

There are actually two bugs in this check: if not lib.ismember(wdiffs, set([4, 5])).all()

  1. The issue spanning a year
  2. That coupled with week = (self.index[0].day - 1) // 7 + 1 does not really check that the week of month stays the same

@cancan101
Copy link
Contributor Author

CC @jtratner

@jtratner
Copy link
Contributor

not sure how sensitive infer_freq should be. @wesm?

@cancan101
Copy link
Contributor Author

  1. Can be solved by adding array([-47, -49, -48]) to 4 and 5 as possible week diffs.
  2. Can be solved by adding an additional check

For 1.:

In [45]: pd.unique([np.diff(pd.date_range('12/1/%d'%x,periods=2,freq='WOM-1MON').week)[0] for x in xrange(1900,2010)])
Out[45]: array([-47, -49, -48])

@jtratner
Copy link
Contributor

can you check how that affects the performance of infer freq? I think there are some benchmarks in vbench too. If it's minimal then probably useful addition.

@cancan101
Copy link
Contributor Author

I am thinking about the performance issue now. The fix for 1 should have minimal impact, but I will check. I have to think about 2.

@cancan101
Copy link
Contributor Author

In fact the entire set based check (even with the additional three diffs) can be replaced with:

len(pd.unique((self.index.day-1) //7)) == 1

@cancan101
Copy link
Contributor Author

Which also avoids having to then perform: week = (self.index[0].day - 1) // 7 + 1

@cancan101
Copy link
Contributor Author

An example of the correctness issue:

In [61]: index = pd.DatetimeIndex(["2013-08-27","2013-10-01","2013-10-29","2013-11-26"])

In [63]: pd.infer_freq(index)
Out[63]: 'WOM-4TUE'

which is clearly incorrect given that two of the dates are in October.

@cancan101
Copy link
Contributor Author

The new code might actually be faster (simplistic test):

weeks = np.random.randint(1,52,1000)
days = np.random.randint(1,365,1000)

In [74]: %timeit lib.ismember(pd.unique(np.diff(weeks)), set([4, 5, -47, -49, -48])).all()
10000 loops, best of 3: 77.2 us per loop

In [77]: %timeit lib.ismember(pd.unique(np.diff(weeks)), set([4, 5])).all()
10000 loops, best of 3: 75.1 us per loop

In [75]: %timeit pd.unique((days - 1) // 7)
10000 loops, best of 3: 43.6 us per loop

@jtratner
Copy link
Contributor

Seems promising! Can you run the vbench suite?

@cancan101
Copy link
Contributor Author

timeseries_infer_freq                        |   8.3377 |   8.3656 |   0.9967 |

and some other results:

timeseries_to_datetime_iso8601               |   4.3573 |   4.1096 |   1.0603 |
ctor_index_array_string                      |   0.0157 |   0.0147 |   1.0649 |
timeseries_period_downsample_mean            |  14.9250 |  13.9956 |   1.0664 |
datetime_index_union                         |   0.0587 |   0.0547 |   1.0727 |
stat_ops_series_std                          |   0.2070 |   0.1920 |   1.0782 |
stat_ops_level_frame_sum                     |   3.3370 |   3.0920 |   1.0792 |
timeseries_timestamp_tzinfo_cons             |   0.0123 |   0.0110 |   1.1151 |
packers_write_pickle                         |   3.7453 |   3.2587 |   1.1493 |
lib_fast_zip_fillna                          |  12.4346 |  10.4724 |   1.1874 |
packers_write_pack                           |   6.2989 |   4.3293 |   1.4550 |

@jreback
Copy link
Contributor

jreback commented Oct 13, 2013

this looks fine as far as vbenching

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Next Major Release Dec 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants