Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

OlegZhukov · 2019-10-03T18:18:47Z

There are several scenarios where the current (default) implementation of BackoffStrategy.calculateExponentialDelay(..) breaks. Specifically:

Silent overflows may happen for 1L << retriesAttempted resulting in incorrect value, often negative, starting with 63 retry attempts (which is a lot, but not unthinkable and actually happened in our logic).
Even if 1L << retriesAttempted part doesn't overflow, it then gets multiplied by baseDelay.toMillis() which, again, may cause an overflow with unpredictable result (small or large in absolute value, negative or positive, practically anything).
baseDelay.toMillis() as well as maxBackoffTime.toMillis() may fail if the duration is too large for converting to milliseconds. There is no guard against that at the retry policy creation time.
Even if the whole operation doesn't overflow as long it then gets converted to int which renders incorrect result (negative or positive, generally anything).

Apart from incorrect backoff delay this will menifest in unhandled exceptions duringDuration.toMillis() conversion for large durations and also during Random.nextInt(..) with non-positive bound later in jitter-enabled implementations.

Proposed fix

package software.amazon.awssdk.core.retry.backoff;

import com.google.common.annotations.VisibleForTesting;
import lombok.Builder;
import software.amazon.awssdk.core.retry.RetryPolicyContext;

import java.time.Duration;
import java.util.Random;

import static org.apache.commons.lang3.ObjectUtils.min;
import static software.amazon.awssdk.utils.Validate.isNotNegative;

public final class ProperEqualJitterBackoffStrategy implements BackoffStrategy {

    private static final Duration BASE_DELAY_CEILING = Duration.ofMillis(Integer.MAX_VALUE); // Around 24 days
    private static final Duration MAX_BACKOFF_CEILING = Duration.ofMillis(Integer.MAX_VALUE); // Around 24 days
    private static final int LOG2_OF_MAX_DURATION_DIVIDED_BY_MAX_BACKOFF_CEILING = (int) Math.floor(
            Math.log(Long.MAX_VALUE / BASE_DELAY_CEILING.getSeconds()) / Math.log(2));

    private final Duration baseDelay;
    private final Duration maxBackoffTime;
    private final Random random;

    @Builder
    private ProperEqualJitterBackoffStrategy(final Duration baseDelay, final Duration maxBackoffTime) {
        this(baseDelay, maxBackoffTime, new Random());
    }

    @VisibleForTesting
    ProperEqualJitterBackoffStrategy(final Duration baseDelay, final Duration maxBackoffTime, final Random random) {
        this.baseDelay = min(isNotNegative(baseDelay, "baseDelay"), BASE_DELAY_CEILING);
        this.maxBackoffTime = min(isNotNegative(maxBackoffTime, "maxBackoffTime"), MAX_BACKOFF_CEILING);
        this.random = random;
    }

    @Override
    public Duration computeDelayBeforeNextRetry(final RetryPolicyContext context) {
        final int ceil = exponentialDelayMillis(context.retriesAttempted(), baseDelay, maxBackoffTime);
        return Duration.ofMillis((ceil / 2) + random.nextInt((ceil / 2) + 1));
    }

    private static int exponentialDelayMillis(final int retriesAttempted,
            final Duration baseDelay, final Duration maxBackoffTime) {
        final int cappedRetries = Math.min(retriesAttempted, LOG2_OF_MAX_DURATION_DIVIDED_BY_MAX_BACKOFF_CEILING);
        return (int) min(baseDelay.multipliedBy(1L << cappedRetries), maxBackoffTime).toMillis();
    }
}

Unit test:

package software.amazon.awssdk.core.retry.backoff;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import software.amazon.awssdk.core.retry.RetryPolicyContext;

import java.time.Duration;
import java.util.Random;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)
class ProperEqualJitterBackoffStrategyTest {

    private static final Duration NEGATIVE_ONE_SECOND = Duration.ofSeconds(-1);
    private static final Duration MAX_DURATION = Duration.ofSeconds(Long.MAX_VALUE);
    private static final Duration FIVE_DAYS = Duration.ofDays(5);
    private static final Duration ONE_SECOND = Duration.ofSeconds(1);
    private static final Duration ONE_NANOSECOND = Duration.ofNanos(1);
    private static final int NANOS_IN_MILLISECOND = 1_000_000;
    private static final int RANDOM_RESULT = 12345;

    @Mock
    private Random mockRandom;

    @Test
    void GIVEN_exponentialDelayAboveCeiling_THEN_returnJitteredCeiling() {
        test(FIVE_DAYS, MAX_DURATION, 3, Integer.MAX_VALUE);
    }

    @Test
    void GIVEN_maxBaseDelay_THEN_returnJitteredCeiling() {
        test(MAX_DURATION, MAX_DURATION, 1, Integer.MAX_VALUE);
    }

    @Test
    void GIVEN_oneSecondMaxBackoff_THEN_returnJitteredOneSecond() {
        test(MAX_DURATION, ONE_SECOND, 1, (int) ONE_SECOND.toMillis());
    }

    @Test
    void GIVEN_exponentialDelayBelowCeiling_THEN_returnJitteredExponentialDelay() {
        test(ONE_SECOND, MAX_DURATION, 10, (1 << 10) * (int) ONE_SECOND.toMillis());
    }

    @Test
    void GIVEN_oneNanosecondMaxBackoff_THEN_returnJitteredZero() {
        test(MAX_DURATION, ONE_NANOSECOND, 10, 0);
    }

    @Test
    void GIVEN_oneNanosecondBaseDelay_THEN_returnCorrectJitteredExponentialDelay() {
        test(ONE_NANOSECOND, MAX_DURATION, 40, (int) ((1L << 40) / NANOS_IN_MILLISECOND));
    }

    @Test
    void GIVEN_tooManyRetries_THEN_returnCorrectResultForCappedRetries() {
        test(ONE_NANOSECOND, MAX_DURATION, 100, (int) ((1L << 41) / NANOS_IN_MILLISECOND));
    }

    @Test
    void GIVEN_tooManyRetriesWithMaxBaseDelay_THEN_returnJitteredCeiling() {
        test(MAX_DURATION, MAX_DURATION, 100, Integer.MAX_VALUE);
    }

    @Test
    void GIVEN_oneNanosecondExponentialDelay_THEN_returnJitteredZero() {
        test(ONE_NANOSECOND, MAX_DURATION, 0, 0);
    }

    @Test
    void GIVEN_negativeBaseDelay_THEN_throw() {
        assertThrows(IllegalArgumentException.class, () -> test(NEGATIVE_ONE_SECOND, MAX_DURATION, 1, 0));
    }

    @Test
    void GIVEN_negativeMaxBackoff_THEN_throw() {
        assertThrows(IllegalArgumentException.class, () -> test(ONE_SECOND, NEGATIVE_ONE_SECOND, 1, 0));
    }

    private void test(final Duration baseDelay, final Duration maxBackoffTime,
            final int retriesAttempted, final int expectedCeilingMillis) {
        final BackoffStrategy backoffStrategy = new ProperEqualJitterBackoffStrategy(
                baseDelay, maxBackoffTime, mockRandom);

        when(mockRandom.nextInt(expectedCeilingMillis / 2 + 1)).thenReturn(RANDOM_RESULT);

        assertThat(backoffStrategy.computeDelayBeforeNextRetry(toRetryContext(retriesAttempted)),
                is(Duration.ofMillis(expectedCeilingMillis / 2 + RANDOM_RESULT)));
    }

    private static RetryPolicyContext toRetryContext(final int retriesAttempted) {
        return RetryPolicyContext.builder().retriesAttempted(retriesAttempted).build();
    }
}

The text was updated successfully, but these errors were encountered:

zoewangg · 2019-10-03T23:46:32Z

Hi @OlegZhukov thank you for reporting the issue! Since you have already written the code to fix the issue, would you like to submit a PR?

Quanzzzz · 2020-03-26T22:59:39Z

Fixed by #1673 .

…852407a07 Pull request: release <- staging/074e5611-02e2-4b65-a039-92e852407a07

zoewangg added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Oct 3, 2019

zoewangg added bug This issue is a bug. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Oct 3, 2019

This was referenced Feb 26, 2020

Fixed overflow bugs in backoff strategy #1668

Closed

Fix overflow bugs in backoff strategy. #1673

Merged

Quanzzzz closed this as completed Mar 26, 2020

Quanzzzz reopened this Mar 26, 2020

Quanzzzz closed this as completed Mar 26, 2020

aws-sdk-java-automation added a commit that referenced this issue Jun 8, 2021

Merge pull request #1453 from aws/staging/074e5611-02e2-4b65-a039-92e…

d12f7bf

…852407a07 Pull request: release <- staging/074e5611-02e2-4b65-a039-92e852407a07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

OlegZhukov commented Oct 3, 2019 •

edited

Loading

zoewangg commented Oct 3, 2019 •

edited

Loading

Uh oh!

Quanzzzz commented Mar 26, 2020

Uh oh!

Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

Comments

OlegZhukov commented Oct 3, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed fix

zoewangg commented Oct 3, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Quanzzzz commented Mar 26, 2020

Uh oh!

OlegZhukov commented Oct 3, 2019 •

edited

Loading

zoewangg commented Oct 3, 2019 •

edited

Loading