CI: hypothesis occasionally failing #24641


Closed
jreback opened this issue Jan 5, 2019 · 7 comments · Fixed by #24654
Labels
CI Continuous Integration Frequency DateOffsets
Milestone
0.24.0

Comments

@jreback
Contributor

jreback commented Jan 5, 2019

Have seen this failure occasionally; need to see if this is an actual test case that we are not handling, or just hypothesis flakiness. Let's fix it in any event.

e.g. https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=6564

nclosed file <_io.TextIOWrapper name=0 mode='r' encoding='UTF-8'>
2019-01-05T14:20:09.0701278Z sys:1: ResourceWarning: unclosed <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('0.0.0.0', 0)>
2019-01-05T14:20:11.0385550Z 
2019-01-05T14:20:11.0388329Z =================================== FAILURES ===================================
2019-01-05T14:20:11.0388803Z ___________________________ test_tick_add_sub[Milli] ___________________________
2019-01-05T14:20:11.0389871Z [gw0] linux -- Python 3.7.2 /home/vsts/miniconda3/envs/pandas-dev/bin/python
2019-01-05T14:20:11.0391214Z 
2019-01-05T14:20:11.0392408Z cls = <class 'pandas.tseries.offsets.Milli'>
2019-01-05T14:20:11.0393419Z 
2019-01-05T14:20:11.0394965Z     @pytest.mark.parametrize('cls', tick_classes)
2019-01-05T14:20:11.0395812Z >   @example(n=2, m=3)
2019-01-05T14:20:11.0396155Z     @example(n=800, m=300)
2019-01-05T14:20:11.0396909Z     @example(n=1000, m=5)
2019-01-05T14:20:11.0397979Z     @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
2019-01-05T14:20:11.0398357Z     def test_tick_add_sub(cls, n, m):
2019-01-05T14:20:11.0398577Z 
2019-01-05T14:20:11.0398798Z pandas/tests/tseries/offsets/test_ticks.py:41: 
2019-01-05T14:20:11.0399043Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2019-01-05T14:20:11.0399633Z ../../../miniconda3/envs/pandas-dev/lib/python3.7/site-packages/hypothesis/core.py:598: in execute
2019-01-05T14:20:11.0399952Z     % (test.__name__, text_repr[0])
2019-01-05T14:20:11.0400221Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2019-01-05T14:20:11.0400423Z 
2019-01-05T14:20:11.0400647Z self = <hypothesis.core.StateForActualGivenExecution object at 0x7f75c25bfc88>
2019-01-05T14:20:11.0402646Z message = 'Hypothesis test_tick_add_sub(cls=Milli, n=-447, m=-513) produces unreliable results: Falsified on the first call but did not on a subsequent one'
2019-01-05T14:20:11.0402767Z 
2019-01-05T14:20:11.0402821Z     def __flaky(self, message):
2019-01-05T14:20:11.0402897Z         if len(self.falsifying_examples) <= 1:
2019-01-05T14:20:11.0402948Z >           raise Flaky(message)
2019-01-05T14:20:11.0403360Z E           hypothesis.errors.Flaky: Hypothesis test_tick_add_sub(cls=Milli, n=-447, m=-513) produces unreliable results: Falsified on the first call but did not on a subsequent one
2019-01-05T14:20:11.0403446Z 
2019-01-05T14:20:11.0403739Z ../../../miniconda3/envs/pandas-dev/lib/python3.7/site-packages/hypothesis/core.py:803: Flaky
@jreback jreback added CI Continuous Integration Frequency DateOffsets labels Jan 5, 2019
@jreback jreback added this to the Contributions Welcome milestone Jan 5, 2019
@nixphix
Contributor

nixphix commented Jan 6, 2019

The flakiness is caused by a long-running test case. It seems pandas.tseries.offsets.Milli took too long on the first test run, exceeding the time limit, and then only a fraction of that time on a subsequent run, as mentioned in the error message below.

Error from the logs:

Unreliable test timings! On an initial run, this test took 722.20ms, which exceeded
the deadline of 500.00ms, but on a subsequent run it took 0.12 ms, which did not.
If you expect this sort of variability in your test timings, consider turning deadlines
off for this test by setting deadline=None

I wonder why a simple test case like this took that long to run:

@pytest.mark.parametrize('cls', tick_classes)
@example(n=2, m=3)
@example(n=800, m=300)
@example(n=1000, m=5)
@given(n=st.integers(-999, 999), m=st.integers(-999, 999))
def test_tick_add_sub(cls, n, m):
    # For all Tick subclasses and all integers n, m, we should have
    # tick(n) + tick(m) == tick(n+m)
    # tick(n) - tick(m) == tick(n-m)
    left = cls(n)
    right = cls(m)
    expected = cls(n + m)
    assert left + right == expected
    assert left.apply(right) == expected
    expected = cls(n - m)
    assert left - right == expected

The 500 ms deadline was introduced recently, and there is an instruction on what needs to be done in this scenario here:

hypothesis.settings.register_profile(
    "ci",
    # Hypothesis timing checks are tuned for scalars by default, so we bump
    # them from 200ms to 500ms per test case as the global default. If this
    # is too short for a specific test, (a) try to make it faster, and (b)
    # if it really is slow add `@settings(deadline=...)` with a working value,
    # or `deadline=None` to entirely disable timeouts for that test.
    deadline=500,
    timeout=hypothesis.unlimited,
    suppress_health_check=(hypothesis.HealthCheck.too_slow,)
)
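
For reference, a registered profile only takes effect once it is loaded. A minimal sketch of how that is typically done in a conftest.py (the environment-variable check here is an assumption for illustration, not necessarily how pandas selects the profile):

```python
# Hypothetical activation of the "ci" profile (illustration only, not
# pandas' actual conftest.py).
import os

import hypothesis

hypothesis.settings.register_profile("ci", deadline=500)

# Fall back to Hypothesis' built-in "default" profile outside of CI.
hypothesis.settings.load_profile("ci" if os.environ.get("CI") else "default")
```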

cc @Zac-HD

@Zac-HD
Contributor

Zac-HD commented Jan 6, 2019

Yep, looks like a pure timing issue. Probably worth investigating why the timing is so variable, but until then you can just add e.g. @hypothesis.settings(deadline=2000) to give it 2000 milliseconds per test case.
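
Applied to the test above, that would look roughly like this (a sketch only; the other decorators are copied from the existing test, and `tick_classes` comes from the test module):

```python
import pytest
from hypothesis import example, given, settings, strategies as st


@pytest.mark.parametrize('cls', tick_classes)  # tick_classes is defined in the test module
@example(n=2, m=3)
@example(n=800, m=300)
@example(n=1000, m=5)
@settings(deadline=2000)  # allow up to 2000 ms per test case instead of the 500 ms CI default
@given(n=st.integers(-999, 999), m=st.integers(-999, 999))
def test_tick_add_sub(cls, n, m):
    ...  # body unchanged from the existing test
```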

@nixphix
Contributor

nixphix commented Jan 6, 2019

@jreback If this error only occurs on Azure Pipelines (I see the Travis jobs passed for the same PR), it could be platform specific, and since it occurs only sporadically we can give it the benefit of the doubt.
Shall we extend the deadline for the test case to 2000 ms or so?

@jreback
Contributor Author

jreback commented Jan 6, 2019

it’s not platform specific and only occasionally fails

@Zac-HD
Contributor

Zac-HD commented Jan 6, 2019

The other option is @hypothesis.settings(deadline=None) to ensure it literally never happens, as suggested by the error message.

@nixphix
Contributor

nixphix commented Jan 6, 2019

@jreback should I extend the deadline to 2000 ms or disable deadline checking for the test case?

@jreback
Contributor Author

jreback commented Jan 6, 2019

let’s just disable the deadline
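
For completeness, a minimal sketch of the test with the deadline disabled (the decorators above @settings are omitted here, and this is not necessarily identical to the change merged in #24654):

```python
@settings(deadline=None)  # never fail this test on timing alone
@given(n=st.integers(-999, 999), m=st.integers(-999, 999))
def test_tick_add_sub(cls, n, m):
    ...  # body unchanged
```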

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Jan 6, 2019