Skip to content

DatetimeIndex::inferred_freq is misleading #5082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
cancan101 opened this issue Oct 2, 2013 · 53 comments
Open

DatetimeIndex::inferred_freq is misleading #5082

cancan101 opened this issue Oct 2, 2013 · 53 comments
Labels

Comments

@cancan101
Copy link
Contributor

Looking at these three methods on DatetimeIndex:

    # alias to offset
    @property
    def freq(self):
        return self.offset

    @cache_readonly
    def inferred_freq(self):
        try:
            return infer_freq(self)
        except ValueError:
            return None

    @property
    def freqstr(self):
        return self.offset.freqstr

you would think that freqstr returns a string and inferred_freq and freq return "frequencies" (i.e. offsets). In reality inferred_freq is a string as well.

I am not sure of the best way to deal with this issue. At the very least I would suggest adding a inferred_freq_offset property. Perhaps also deprecate inferred_freq and create a inferred_freqstr.

Tracing this method, you find _FrequencyInferer::get_freq which returns a string rather than an Offset.

@cpcloud
Copy link
Member

cpcloud commented Oct 2, 2013

This is used a bit throughout the codebase (though not too much, grin shows 32 uses). Deprecation of inferred_freq is out of the question since the attribute is exposed publicly, and has been around since v0.8.0. I understand where you're coming from, but there's just too much legacy code that deprecating this could potentially break. Out of curiosity, do you find yourself using this property very often?

@cancan101
Copy link
Contributor Author

@cpcloud Yeah, I did not think deprecation was realistic. I am more interested in just adding inferred_freq_offset.

I am looking to move away from passing around strings to represent DateOffsets and Periods and actually pass the objects themselves. This avoids having to keep performing "silly" map lookups (and it reduces a bunch of code, e.g. #5028).

The code that I am looking at that would benefit from inferred_freq_offset is: Series::to_period:

freq = self.index.freqstr or self.index.inferred_freq
new_index = self.index.to_period(freq=freq)

which I would like to change to:

freq = self.index.freq or self.index.inferred_freq_offset

allowing me to pass the freq object (rather than string) down to the next layer.

@nehalecky
Copy link
Contributor

+1 on inferred_freq_offest.

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

@cancan101 why not:

freq = getattr(self.index, 'offset', self.index.freq)

@cancan101
Copy link
Contributor Author

@jtratner how does that help with inferred freq?

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

@cancan101 sorry, I misread what you were trying to do. If you are going to change the internals to avoid the lookups for date ranges, why not just have inferred_freq return DateOffset?

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

And then in the equality method for DateOffset, just let it return True if it matches its own rule code.

@cpcloud
Copy link
Member

cpcloud commented Oct 6, 2013

hmmm ... inferred_freq is sort of public-ish ....

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

Yeah, but unless you think people are trying to do string concatenation/multiplication with it, having __str__ and __unicode__ return the rule code and __eq__ and __req__ return True if rule code matches makes it relatively transparent.

@cancan101
Copy link
Contributor Author

@jtratner

Two issues:

  1. What @cpcloud said
  2. If I change eq I would have to change hash etc as well. Not end of world ,just have to think of implications

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

@cancan101 well, if freq is exactly as expressive as offset, then you can just have the hash be the rule code right?

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

and by rule code I mean what you pass to period index/datetime index for freq

@cancan101
Copy link
Contributor Author

No. There is many to one mapping string to offset.

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

@cancan101 well, you could pick one mapping to be the actual representation, hash however you want (doesn't really matter so long as it's consistent) and then have __eq__ look through all the aliases too.

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

As an aside, __hash__ doesn't matter that much, because python still does an equality check on lookups after looking up via hash.

@cancan101
Copy link
Contributor Author

No.

object.hash(self)
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. hash() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

@cancan101
Copy link
Contributor Author

If a == b then hash(a) must == hash(b)

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

Yep, but if a!=b, it does not have to be the case that hash(a) != hash(b)

@cancan101
Copy link
Contributor Author

Think of bucketing in dict. If hash is different then they likely will be in different bins so eq won't get called.

@cancan101
Copy link
Contributor Author

Correct.

@cancan101
Copy link
Contributor Author

We don't control hash of strings so they won't be same.

@cancan101
Copy link
Contributor Author

That being said do we need to make offset eq corresponding rule code?

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

Do you think people rely on inferred_freq being a string so they can manipulate and modify it? If not, what's the functional difference?

@cancan101
Copy link
Contributor Author

I don't know how people are using the value. It is a public property so I was just going to introduce a new property that returned the actual offset to avoid breaking people in whatever manner they use the string.

@cancan101
Copy link
Contributor Author

The old field could eventually be deprecated if desired as the offset is richer and you can get to the string from the offset.

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

it'd be great if DateOffset was just a subclass of str, would make this very simple.

@jtratner
Copy link
Contributor

jtratner commented Oct 6, 2013

anyways, why don't you make up a PR that adds inferred_freq_offset along with whatever simplifications you think can be made internally - easier to think about in context, right? Sounds like you have a plan in mind - and not that big of a deal to add another property I guess.

@jreback
Copy link
Contributor

jreback commented Oct 7, 2013

@cancan101

inferred_freq is a lazy attribute, it is only computed on demand, so a replacement/new one should have the same semantics. It is generally non-trivial (meant time-consuming) to ALWAYS compute this

@patricktokeeffe
Copy link
Contributor

Don't forget pd.infer_freq - it returns a string too. So at the moment there is:

pd.infer_freq ----> str
pd.DatetimeIndex.freq ----> DateOffet subclass
pd.DatetimeIndex.freqstr ----> str
pd.DatetimeIndex.inferred_freq ----> str

And it's being proposed to add:

pd.DatetimeIndex.inferred_freq_offset ----> DateOffset subclass

I would also like an inferred frequency offset class to avoid working with strings but I think the proposed new property name will exacerbate the inconsistency. The best case scenario IMO is:

pd.infer_freq --> DateOffset subclass [CHANGE return type]
pd.infer_freqstr ----> str [NEW]
pd.DatetimeIndex.freq ----> DateOffet subclass
pd.DatetimeIndex.freqstr ----> str
pd.DatetimeIndex.inferred_freq ----> DateOffset subclass [CHANGE return type]
pd.DatetimeIndex.inferred_freqstr ----> str [NEW]

Given how public these are, I realize this is probably a pipe dream. I think it's how it should be done though.


As an aside, it would be useful if DatetimeIndex inherited infer_freq so that df.index.infer_freq() would be equivalent to df.index.freq = to_offset(df.index.inferred_freq). (No, the property can't be set this way; whatever the internal equivalent is.)

@cancan101
Copy link
Contributor Author

As for pipe dream, see the comment above: #5082 (comment)

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

@cancan101

I think we should prepare for the API change the @patricktokeeffe suggests above

pd.infer_freq --> DateOffset subclass [CHANGE return type]
pd.infer_freqstr ----> str [NEW]
pd.DatetimeIndex.freq ----> DateOffet subclass
pd.DatetimeIndex.freqstr ----> str 
pd.DatetimeIndex.inferred_freq ----> DateOffset subclass [CHANGE return type]
pd.DatetimeIndex.inferred_freqstr ----> str [NEW]

I think this is an easy API change rather than using inferred_freq_offset.

git [goat-jreback-~/pandas] git diff
diff --git a/pandas/tseries/frequencies.py b/pandas/tseries/frequencies.py
index 6eea2fb..e5b9b48 100644
--- a/pandas/tseries/frequencies.py
+++ b/pandas/tseries/frequencies.py
@@ -696,7 +696,7 @@ def infer_freq(index, warn=True):

     index = pd.DatetimeIndex(index)
     inferer = _FrequencyInferer(index, warn=warn)
-    return inferer.get_freq()
+    return to_offset(inferer.get_freq())

 _ONE_MICRO = long(1000)
 _ONE_MILLI = _ONE_MICRO * 1000

@cancan101
Copy link
Contributor Author

How about we make this API change in a separate PR? Since it involves some breaking API changes, I do not want to hold up this PR while we discuss the other changes. FWIW, I do think the API changes make sense.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

simply to make this (and remove inferred_freq_offset).

I would have loved for you to separate all of these issues into separate PR's

can you do that?

@cancan101
Copy link
Contributor Author

What is the "that"? Making the API change? Yes. Splitting up this PR, I would prefer not.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

'that' means separating all of these issues to separate (or a smaller number of PRs). The problem is that you are doing bug fixes, enhancements, cleaning, and API changes all in one. Very hard to review.

that said its ok; except for the inferred_freq_offset change which is just adding to the API when its simpler to just change inferred_freq.

I haven't found anything that breaks, so I don't see an issue.

@cancan101
Copy link
Contributor Author

Okay I will make the API change in this PR but leave the commit separate to
at least make reviewing that easier.

On Thu, Apr 3, 2014 at 6:48 PM, jreback [email protected] wrote:

'that' means separating all of these issues to separate (or a smaller
number of PRs). The problem is that you are doing bug fixes, enhancements,
cleaning, and API changes all in one. Very hard to review.

that said its ok; except for the inferred_freq_offset change which is
just adding to the API when its simpler to just change inferred_freq.

I haven't found anything that breaks, so I don't see an issue.

Reply to this email directly or view it on GitHubhttps://github.com//issues/5082#issuecomment-39514517
.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

ok grt (and I think a inferred_freqstr is appropriate as well for consistency.

@cancan101
Copy link
Contributor Author

@jreback Are you familiar with the difference between get_offset and to_offset? Which should be the one used in the diff posted above (for inferred_freq_offset)? You used to but I had used get: https://github.com/pydata/pandas/pull/5148/files#diff-23ecb29e7ceba52109a365e447400d2eR1436

@jreback
Copy link
Contributor

jreback commented Apr 4, 2014

get_offset it really an internal function only, use to_offset (get won't correctly handle 5D, only D) for example (to_offset calls it)

@jreback jreback modified the milestones: 0.14.1, 0.14.0 Apr 27, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 10, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Sep 8, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants