-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: check freq on series.index in assert_series_equal #33815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
thanks |
This is a backwards incompatible change, that will break downstream projects' tests. Only there tests of course, not actual users, but still potentially annoying (this started breaking the arrow tests we run against pandas master). Are we sure we actually want to test the |
I don't have a strong opinion here, but do we think that these should compare equal? In [14]: a = pd.DatetimeIndex(['2000'])
In [15]: b = pd.DatetimeIndex(['2000'], freq='D') They have the same values but different In general, I think our test functions should be strict by default with keywords allow looser comparisons. So I'm probably OK with those not asserting equal. |
It maybe depends a bit on how you see the "freq". Is it an attribute that actually changes how we interpret the data, or is it rather like a cached property for which we do more effort to preserve it (since it can expensive to infer each time we need it) ? In the first case, it certainly makes sense to check it by default, but in the second interpretation, not so much IMO. (now, which interpretation is most correct, I don't directly know, I am not that familiar with how it is used across pandas) |
Yeah, I think you've accurately summarized the situation. I think, though, that whether If we think it's just a cached attribute then I'm happy to exclude it from the assertions (by default). If it's an intrinsic property, then I think the default should be to raise if it doesn't match. |
@jbrockmendel @jreback some response here? |
my plan is to add a |
And why do you think the default should be True? |
(not that the other PR shouldn't move forward, but why does factorize block adding |
factorize doesnt block per se, but im trying to keep only one freq-checking-related PR active at a time.
Because failing to check freq allows for bugs to go unsurfed. e.g. the author of this test apparently thought that freq was being checked when it isnt |
Well, when you know you want to preserve the freq and test this, you can pass the keyword when it is not the default. |
checking frequency should be the default, I totally agree with @jbrockmendel if we care about freq then we need to always check, when its None, that's fine as well, but if something is not the default rarely to authors or reviewers catch this (and has been a large source of bugs). relying on 'manual' things is big -1 always. |
But often we don't care about freq. See the many cases in this PR that removed the freq before testing. Therefore I asked the question about whether freq is more like a fancy cached property or not? (#33815 (comment) and to repeat from there, I don't necessarily know the answer, but you also didn't reply to that) And to repeat, this is a backwards incompatible change, that will break the tests of almost any downstream project that uses timeseries. |
Before long we'll roll that check into assert_index_equal