fix #39556 (infer_freq not working with freq="H" and DST) #39644

Merged
merged 6 commits into from
Feb 23, 2021

sdementen
Contributor

@sdementen sdementen commented Feb 7, 2021

  • check that the deltas are unique before checking whether they are multiples of a day

  • add test with freq="H" that raises the bug

  • closes BUG: 'infer_freq' does not work with tz != "UTC" #39556

  • tests added / passed

  • Ensure all linting tests pass, see here for how to run them

  • whatsnew entry (not sure where to add this...)

- check that the deltas are unique before checking if they are day multiples
- add test with freq="H" that raises the bug
…taking the minimum of the deltas as delta and checking that delta is nonzero
Member

@arw2019 arw2019 left a comment

Thanks @sdementen for the PR!

We'll need a whatsnew entry (I expect it to target 1.3).

@arw2019 arw2019 added the Datetime (Datetime data dtype) and Timezones (Timezone data dtype) labels Feb 7, 2021
@@ -239,17 +239,18 @@ def get_freq(self) -> Optional[str]:
if not self.is_monotonic or not self.index._is_unique:
return None

delta = self.deltas[0]
if _is_multiple(delta, _ONE_DAY):
delta = min(self.deltas)
Member

I think the idea behind using deltas[0] was that the deltas should be unique at this point. Is that not the case?

Contributor Author

@sdementen sdementen Feb 9, 2021

They are not unique when the index has a business-day frequency, since the deltas are then 1 day or 3 days (over the weekend).
My first attempt at the bugfix checked uniqueness first and then took deltas[0]; that fixed the issue with freq="H" and DST, but it broke the business-day test.
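To make the business-day case concrete, here is a minimal sketch. It assumes a recent pandas and NumPy; `idx`, `deltas` and `days` are illustrative names, and `np.unique(np.diff(...))` only approximates the sorted unique deltas that pandas computes internally.

```python
import numpy as np
import pandas as pd

# Ten business days: Mon 2021-02-01 through Fri 2021-02-12 (weekend skipped)
idx = pd.bdate_range("2021-02-01", periods=10)

# Sorted unique gaps between consecutive timestamps, expressed in days
deltas = np.unique(np.diff(idx.values))
days = deltas / np.timedelta64(1, "D")
print(days)  # two distinct values: 1 day (weekdays) and 3 days (Fri -> Mon)
```

Both values are whole-day multiples, so such an index has to reach the daily-rule branch even though its deltas are not unique; requiring uniqueness first is what broke the business-day test.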

Contributor Author

After reading the docstring of unique_deltas, I see that self.deltas is already sorted, so there is no need to take the min.
The key is to check that delta != 0: with freq="H" and a tz with DST, the minimum delta is 0 (in local time), and 0 counts as a multiple of one day.
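The zero-delta case can be reproduced with a small sketch. It assumes the frequency inferer compares local wall-clock values (approximated here with `tz_localize(None)`); `ONE_HOUR`, `ONE_DAY` and `is_multiple` are hypothetical stand-ins for the nanosecond constants and the `_is_multiple` helper in the diff above, not pandas API.

```python
import numpy as np
import pandas as pd

ONE_HOUR = 3_600 * 10**9  # nanoseconds (hypothetical constant)
ONE_DAY = 24 * ONE_HOUR   # nanoseconds (hypothetical constant)


def is_multiple(us, mult):
    # Hypothetical stand-in for _is_multiple: note that 0 % mult == 0
    return us % mult == 0


# Hourly tz-aware index across the CET fall-back on 2020-10-25, when the
# 02:00 wall-clock hour occurs twice (once in CEST, once in CET)
idx = pd.date_range("2020-10-25 00:00", periods=5, freq="h", tz="CET")

# Local wall-clock times as int64 nanoseconds, then sorted unique deltas
wall_i8 = idx.tz_localize(None).values.astype("datetime64[ns]").astype("int64")
deltas = np.unique(np.diff(wall_i8))
delta = deltas[0]

print(delta)                                         # 0: the repeated 02:00 hour
print(is_multiple(delta, ONE_DAY))                   # True: daily branch would fire
print(bool(delta and is_multiple(delta, ONE_DAY)))   # False once delta != 0 is required
```

Guarding with `delta and ...` lets the zero delta fall through to the intraday checks instead of the daily rule.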

@sdementen
Contributor Author

I have added the whatsnew entry

@sdementen
Contributor Author

Some CI checks fail after I added the whatsnew entry. Any idea why?

@lithomas1
Member

@sdementen The CI failures are due to #39688. They should be gone if you merge master.

@jbrockmendel
Member

cc @mroeschke I think this is an hour-based analogue of the freq=Day vs freq=DayDST problem.

That is, this fixes one problem but will introduce others. I'm hesitant to do that, but the long-term fix has been stuck in limbo for a while.
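The distinction behind "freq=Day vs freq=DayDST" is between a fixed 24-hour duration and a calendar day that follows the wall clock. A short illustration, contrasting `pd.Timedelta` with `pd.DateOffset` (this is not the proposed DayDST offset itself; Europe/Helsinki leaves DST on 2016-10-30, so that calendar day lasts 25 hours):

```python
import pandas as pd

# Midnight on the day Europe/Helsinki leaves DST (a 25-hour calendar day)
ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

absolute = ts + pd.Timedelta(days=1)   # exactly 24 elapsed hours
calendar = ts + pd.DateOffset(days=1)  # same wall-clock time, next day

print(absolute)  # 2016-10-30 23:00:00+02:00
print(calendar)  # 2016-10-31 00:00:00+02:00
```

An hourly index has the same two readings around a transition: steady 1-hour absolute steps, but wall-clock gaps of 0 or 2 hours, which is why inferring "the wall time frequency" and inferring an absolute frequency can disagree.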

@sdementen
Contributor Author

@sdementen The CI failures are due to #39688. They should be gone if you merge master.

Do I need to do this myself? Or will this be solved automatically once someone merges my PR into master?

@arw2019
Member

arw2019 commented Feb 10, 2021

Yes, you'll want to merge master yourself.

@mroeschke
Member

I haven't used infer_freq much, but if infer_freq means "infer the wall time frequency", then yes, this is the same issue as freq=Day vs freq=DayDST.

@sdementen
Contributor Author

Merged with master; all tests pass. Ready to merge ;-)

@sdementen
Contributor Author

@arw2019 @mroeschke, do I need to do anything else for this PR to be merged?

Contributor

@jreback jreback left a comment

Can you also add an explicit test that is similar to the OP?

@sdementen
Contributor Author

The test test_infer_freq_tz_transition is similar to the OP in my issue. The bug did not show up there because the test did not use freq "H" (only "3H"), and "H" is the frequency I used in my OP.
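For reference, a standalone check in the spirit of the OP might look like the sketch below. It assumes pandas 1.3 or later (the milestone this fix targets); `check_hourly_dst` is a hypothetical helper, and the returned alias is "H" in older releases and "h" from pandas 2.2 onward.

```python
import pandas as pd


def check_hourly_dst(tz):
    # Hypothetical helper: one year of hourly timestamps covers both the
    # spring-forward and fall-back transitions of a tz with DST
    idx = pd.date_range("2020-01-01", "2020-12-31", freq="h", tz=tz)
    return pd.infer_freq(idx)


for tz in [None, "UTC", "CET"]:
    print(tz, check_hourly_dst(tz))
```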

@@ -267,7 +267,7 @@ def test_infer_freq_tz(tz_naive_fixture, expected, dates):
],
)
@pytest.mark.parametrize(
Contributor

This is not exactly the same as the OP (though it may have revealed the issue). This is a naive fixture (IOW, the OP worked for naive & UTC), but NOT for other tzs.

So we could also make this more comprehensive.

Contributor Author

I'm not sure I understand your comment yet...

When I run test_infer_freq_tz_transition, it runs for many tzs (None, UTC, US/Eastern, Asia/Tokyo, ...), for date_pairs that cover the DST changes (fall, spring, and no change), and for freq="H" (among others, as it also tests other intraday frequencies). The test also refers to #8772, which is the issue I rephrased with a simple example in #39556.

My OP is only one case (tz=None, UTC, CET and freq="H") among these.
What was misleading in the original test is that the base frequency "H", which triggers the issue, was not covered (probably because the author thought that testing with "3H" would cover "H" and other cases).
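Why a "3H" case could not catch this: with a 3-hour step, the repeated wall-clock hour never produces a zero local delta. A minimal sketch, assuming US/Eastern's fall-back on 2020-11-01; `min_wall_delta_hours` is a hypothetical helper that measures the smallest gap between consecutive local wall-clock times.

```python
import numpy as np
import pandas as pd


def min_wall_delta_hours(freq, tz="US/Eastern"):
    # Hypothetical helper: tz-aware index spanning the 2020-11-01 fall-back,
    # where 01:00 local time occurs twice (EDT, then EST)
    idx = pd.date_range("2020-10-31 18:00", "2020-11-01 12:00", freq=freq, tz=tz)
    # Local wall-clock times as int64 nanoseconds
    wall_i8 = idx.tz_localize(None).values.astype("datetime64[ns]").astype("int64")
    return np.diff(wall_i8).min() / 3_600_000_000_000


print(min_wall_delta_hours("h"))   # 0.0: the repeated 01:00 hour
print(min_wall_delta_hours("3h"))  # 2.0: never zero, so no false daily match
```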

I can add a new test, but I don't see how it would differ from the current one: my date_pairs would cover a full year, which is not really needed for the test, and my tz would be "CET", which is not covered yet (although other tzs with DST are).

The small change in the test (adding freq="H" to the tested frequencies) makes it fail on pandas without the bugfix.

I could change the existing comment from "# see gh-8772" to "# see gh-8772 and gh-39556" to make this clearer?

Contributor

Sorry, you are right; I was misreading the fixture.

@jreback jreback added this to the 1.3 milestone Feb 22, 2021
@jreback
Contributor

jreback commented Feb 22, 2021

cc @mroeschke if you have any comments; merge when good.

@mroeschke mroeschke merged commit 3289f82 into pandas-dev:master Feb 23, 2021
@mroeschke
Member

Thanks @sdementen

Labels
Datetime (Datetime data dtype), Timezones (Timezone data dtype)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: 'infer_freq' does not work with tz != "UTC"
7 participants