Skip to content

TST: check freq on series.index in assert_series_equal #33815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 27, 2020

Conversation

jbrockmendel
Copy link
Member

Before long we'll roll that check into assert_index_equal

@jreback jreback added Frequency DateOffsets Testing pandas testing functions or related to the test suite labels Apr 27, 2020
@jreback jreback added this to the 1.1 milestone Apr 27, 2020
@jreback jreback merged commit 51018f6 into pandas-dev:master Apr 27, 2020
@jreback
Copy link
Contributor

jreback commented Apr 27, 2020

thanks

@jbrockmendel jbrockmendel deleted the tst-freq-ser branch April 27, 2020 14:29
@jorisvandenbossche
Copy link
Member

This is a backwards incompatible change, that will break downstream projects' tests. Only there tests of course, not actual users, but still potentially annoying (this started breaking the arrow tests we run against pandas master).

Are we sure we actually want to test the freq by default? It could also be an opt-in keyword, that you can call for those tests where you explicitly want to test this.

@TomAugspurger
Copy link
Contributor

I don't have a strong opinion here, but do we think that these should compare equal?

In [14]: a = pd.DatetimeIndex(['2000'])

In [15]: b = pd.DatetimeIndex(['2000'], freq='D')

They have the same values but different freq.

In general, I think our test functions should be strict by default with keywords allow looser comparisons. So I'm probably OK with those not asserting equal.

@jorisvandenbossche
Copy link
Member

It maybe depends a bit on how you see the "freq". Is it an attribute that actually changes how we interpret the data, or is it rather like a cached property for which we do more effort to preserve it (since it can expensive to infer each time we need it) ?

In the first case, it certainly makes sense to check it by default, but in the second interpretation, not so much IMO.

(now, which interpretation is most correct, I don't directly know, I am not that familiar with how it is used across pandas)

@TomAugspurger
Copy link
Contributor

Yeah, I think you've accurately summarized the situation. I think, though, that whether freq is just a cached attribute or an intrinsic property of the Index is a bit fuzzy.

If we think it's just a cached attribute then I'm happy to exclude it from the assertions (by default).

If it's an intrinsic property, then I think the default should be to raise if it doesn't match.

@jorisvandenbossche
Copy link
Member

@jbrockmendel @jreback some response here?

@jbrockmendel
Copy link
Member Author

my plan is to add a check_freq keyword that defaults to True. Moving forward with this is blocked by #33836.

@jorisvandenbossche
Copy link
Member

And why do you think the default should be True?

@jorisvandenbossche
Copy link
Member

(not that the other PR shouldn't move forward, but why does factorize block adding check_freq in assert_index_equal?)

@jbrockmendel
Copy link
Member Author

(not that the other PR shouldn't move forward, but why does factorize block adding check_freq in assert_index_equal?)

factorize doesnt block per se, but im trying to keep only one freq-checking-related PR active at a time.

And why do you think the default should be True?

Because failing to check freq allows for bugs to go unsurfed. e.g. the author of this test apparently thought that freq was being checked when it isnt

@jorisvandenbossche
Copy link
Member

Well, when you know you want to preserve the freq and test this, you can pass the keyword when it is not the default.

@jreback
Copy link
Contributor

jreback commented May 6, 2020

Well, when you know you want to preserve the freq and test this, you can pass the keyword when it is not the default.

checking frequency should be the default, I totally agree with @jbrockmendel if we care about freq then we need to always check, when its None, that's fine as well, but if something is not the default rarely to authors or reviewers catch this (and has been a large source of bugs).

relying on 'manual' things is big -1 always.

@jorisvandenbossche
Copy link
Member

if we care about freq then we need to always check

But often we don't care about freq. See the many cases in this PR that removed the freq before testing.

Therefore I asked the question about whether freq is more like a fancy cached property or not? (#33815 (comment) and to repeat from there, I don't necessarily know the answer, but you also didn't reply to that)

And to repeat, this is a backwards incompatible change, that will break the tests of almost any downstream project that uses timeseries.
And in this PR it is actually being solved with a private API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants