Skip to content

ENH: Have inferred_freq set offset. Have freq always try to infer. #6637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jseabold
Copy link
Contributor

I'm on the fence about this. Having a property with side effects doesn't quite sit right to me. But I think it kinda makes sense here. If it didn't set freq, then it really doesn't make sense for freq to exist. I'd just use inferred_freq all of the time.

Now calling freq always tries to infer the frequency. Why not? Seems sensible to me. inferred_freq is kind of redundant now, but I left it anyway. I was never really sure of the rationale for its existence.

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

AFAICT this was not done for 2 reasons:

  • performance, unless you are doing a time-series operation that needs to freq no need to 'compute' it (if it is not already specified)
  • their is not an easy way to represent 'no freq' except by None. So you can have a freq even though it really shouldn't (I have seen this when you have 5 mon-fri days, it get's inferred to a business day freq), and thus the inferernce can be a tad ambiguous (see above)

maybe @wesm has a different view

another possible reason (and the reason the tests fail) is that merge/join doesn't want this and can lead to spurios errors, e.g. joining a single day that happens to be a month-end to a 'daily' frequency, is allowed but if you already have frequency assigned then it will fail.

@jseabold
Copy link
Contributor Author

I see your points, though the join thing seems more like a corner case implementation detail.

Ok, so pretty much I should just always used inferred_freq in statsmodels and take the possible (one-time) performance hit. The reason for this is that freq is only defined if it's given when the Index is created, but even then you can't count on freq being defined because freq gets stepped on when you're resetting indices, etc. #6631.

This just seems so esoteric from a user standpoint. I.e., freq == user_defined_freq unless it's been stepped on, and infer_freq == always gives me something if it can. It seems like 98% of the use-cases want the latter. I'm biased to data that almost always has a sane frequency.

@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

I think the reset case is really the issue. Say you reset, then set the index, freq is lost (and must be re-inferred), because the meta-data is not 'kept' anywhere. I suppose you could have a flag in DatetimeIndex that would 'force' inference to happen. Might make sense for statsmodels.

Actually the reset/set case could be dealt with by caching the indexes on a reset/set. hmm.

@jseabold
Copy link
Contributor Author

Right now I'm just leaning towards always using infer_freq (lazily) and the status quo. All the dates magic we can do requires some kind of frequency (forecast, backcasting, re-sampling, etc.). Thinking about it, dates really aren't that useful to have around without frequencies (from where I sit). I just have to remember, because it's not all that intuitive. If infer_freq is your performance bottleneck, then you should likely be closer to the metal anyway. Plus the vast majority of the time users load in data rather than creating a DatetimeIndex, passing in a freq. It's best if users don't have to think about frequencies and periodicities at all.

@jseabold jseabold closed this Mar 14, 2014
@jseabold jseabold deleted the infer-freq-set-offset branch March 14, 2014 19:38
@jreback
Copy link
Contributor

jreback commented Mar 14, 2014

I don't think you will have an issue as long as you don't reset/set (or pay the penality if you do). The more I think about it the more that's the reason it doesn't do this automatically. reset/set are cheap operations and you want them to stay that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants