Skip to content

Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
OlegZhukov opened this issue Oct 3, 2019 · 2 comments
Closed

Multiple issues in BackoffStrategy.calculateExponentialDelay(..) #1453

OlegZhukov opened this issue Oct 3, 2019 · 2 comments
Labels
bug This issue is a bug.

Comments

@OlegZhukov
Copy link

OlegZhukov commented Oct 3, 2019

There are several scenarios where the current (default) implementation of BackoffStrategy.calculateExponentialDelay(..) breaks. Specifically:

  1. Silent overflows may happen for 1L << retriesAttempted resulting in incorrect value, often negative, starting with 63 retry attempts (which is a lot, but not unthinkable and actually happened in our logic).
  2. Even if 1L << retriesAttempted part doesn't overflow, it then gets multiplied by baseDelay.toMillis() which, again, may cause an overflow with unpredictable result (small or large in absolute value, negative or positive, practically anything).
  3. baseDelay.toMillis() as well as maxBackoffTime.toMillis() may fail if the duration is too large for converting to milliseconds. There is no guard against that at the retry policy creation time.
  4. Even if the whole operation doesn't overflow as long it then gets converted to int which renders incorrect result (negative or positive, generally anything).

Apart from incorrect backoff delay this will menifest in unhandled exceptions duringDuration.toMillis() conversion for large durations and also during Random.nextInt(..) with non-positive bound later in jitter-enabled implementations.

Proposed fix

package software.amazon.awssdk.core.retry.backoff;

import com.google.common.annotations.VisibleForTesting;
import lombok.Builder;
import software.amazon.awssdk.core.retry.RetryPolicyContext;

import java.time.Duration;
import java.util.Random;

import static org.apache.commons.lang3.ObjectUtils.min;
import static software.amazon.awssdk.utils.Validate.isNotNegative;

public final class ProperEqualJitterBackoffStrategy implements BackoffStrategy {

    private static final Duration BASE_DELAY_CEILING = Duration.ofMillis(Integer.MAX_VALUE); // Around 24 days
    private static final Duration MAX_BACKOFF_CEILING = Duration.ofMillis(Integer.MAX_VALUE); // Around 24 days
    private static final int LOG2_OF_MAX_DURATION_DIVIDED_BY_MAX_BACKOFF_CEILING = (int) Math.floor(
            Math.log(Long.MAX_VALUE / BASE_DELAY_CEILING.getSeconds()) / Math.log(2));

    private final Duration baseDelay;
    private final Duration maxBackoffTime;
    private final Random random;

    @Builder
    private ProperEqualJitterBackoffStrategy(final Duration baseDelay, final Duration maxBackoffTime) {
        this(baseDelay, maxBackoffTime, new Random());
    }

    @VisibleForTesting
    ProperEqualJitterBackoffStrategy(final Duration baseDelay, final Duration maxBackoffTime, final Random random) {
        this.baseDelay = min(isNotNegative(baseDelay, "baseDelay"), BASE_DELAY_CEILING);
        this.maxBackoffTime = min(isNotNegative(maxBackoffTime, "maxBackoffTime"), MAX_BACKOFF_CEILING);
        this.random = random;
    }

    @Override
    public Duration computeDelayBeforeNextRetry(final RetryPolicyContext context) {
        final int ceil = exponentialDelayMillis(context.retriesAttempted(), baseDelay, maxBackoffTime);
        return Duration.ofMillis((ceil / 2) + random.nextInt((ceil / 2) + 1));
    }

    private static int exponentialDelayMillis(final int retriesAttempted,
            final Duration baseDelay, final Duration maxBackoffTime) {
        final int cappedRetries = Math.min(retriesAttempted, LOG2_OF_MAX_DURATION_DIVIDED_BY_MAX_BACKOFF_CEILING);
        return (int) min(baseDelay.multipliedBy(1L << cappedRetries), maxBackoffTime).toMillis();
    }
}

Unit test:

package software.amazon.awssdk.core.retry.backoff;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import software.amazon.awssdk.core.retry.RetryPolicyContext;

import java.time.Duration;
import java.util.Random;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)
class ProperEqualJitterBackoffStrategyTest {

    private static final Duration NEGATIVE_ONE_SECOND = Duration.ofSeconds(-1);
    private static final Duration MAX_DURATION = Duration.ofSeconds(Long.MAX_VALUE);
    private static final Duration FIVE_DAYS = Duration.ofDays(5);
    private static final Duration ONE_SECOND = Duration.ofSeconds(1);
    private static final Duration ONE_NANOSECOND = Duration.ofNanos(1);
    private static final int NANOS_IN_MILLISECOND = 1_000_000;
    private static final int RANDOM_RESULT = 12345;

    @Mock
    private Random mockRandom;

    @Test
    void GIVEN_exponentialDelayAboveCeiling_THEN_returnJitteredCeiling() {
        test(FIVE_DAYS, MAX_DURATION, 3, Integer.MAX_VALUE);
    }

    @Test
    void GIVEN_maxBaseDelay_THEN_returnJitteredCeiling() {
        test(MAX_DURATION, MAX_DURATION, 1, Integer.MAX_VALUE);
    }

    @Test
    void GIVEN_oneSecondMaxBackoff_THEN_returnJitteredOneSecond() {
        test(MAX_DURATION, ONE_SECOND, 1, (int) ONE_SECOND.toMillis());
    }

    @Test
    void GIVEN_exponentialDelayBelowCeiling_THEN_returnJitteredExponentialDelay() {
        test(ONE_SECOND, MAX_DURATION, 10, (1 << 10) * (int) ONE_SECOND.toMillis());
    }

    @Test
    void GIVEN_oneNanosecondMaxBackoff_THEN_returnJitteredZero() {
        test(MAX_DURATION, ONE_NANOSECOND, 10, 0);
    }

    @Test
    void GIVEN_oneNanosecondBaseDelay_THEN_returnCorrectJitteredExponentialDelay() {
        test(ONE_NANOSECOND, MAX_DURATION, 40, (int) ((1L << 40) / NANOS_IN_MILLISECOND));
    }

    @Test
    void GIVEN_tooManyRetries_THEN_returnCorrectResultForCappedRetries() {
        test(ONE_NANOSECOND, MAX_DURATION, 100, (int) ((1L << 41) / NANOS_IN_MILLISECOND));
    }

    @Test
    void GIVEN_tooManyRetriesWithMaxBaseDelay_THEN_returnJitteredCeiling() {
        test(MAX_DURATION, MAX_DURATION, 100, Integer.MAX_VALUE);
    }

    @Test
    void GIVEN_oneNanosecondExponentialDelay_THEN_returnJitteredZero() {
        test(ONE_NANOSECOND, MAX_DURATION, 0, 0);
    }

    @Test
    void GIVEN_negativeBaseDelay_THEN_throw() {
        assertThrows(IllegalArgumentException.class, () -> test(NEGATIVE_ONE_SECOND, MAX_DURATION, 1, 0));
    }

    @Test
    void GIVEN_negativeMaxBackoff_THEN_throw() {
        assertThrows(IllegalArgumentException.class, () -> test(ONE_SECOND, NEGATIVE_ONE_SECOND, 1, 0));
    }

    private void test(final Duration baseDelay, final Duration maxBackoffTime,
            final int retriesAttempted, final int expectedCeilingMillis) {
        final BackoffStrategy backoffStrategy = new ProperEqualJitterBackoffStrategy(
                baseDelay, maxBackoffTime, mockRandom);

        when(mockRandom.nextInt(expectedCeilingMillis / 2 + 1)).thenReturn(RANDOM_RESULT);

        assertThat(backoffStrategy.computeDelayBeforeNextRetry(toRetryContext(retriesAttempted)),
                is(Duration.ofMillis(expectedCeilingMillis / 2 + RANDOM_RESULT)));
    }

    private static RetryPolicyContext toRetryContext(final int retriesAttempted) {
        return RetryPolicyContext.builder().retriesAttempted(retriesAttempted).build();
    }
}
@zoewangg zoewangg added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Oct 3, 2019
@zoewangg
Copy link
Contributor

zoewangg commented Oct 3, 2019

Hi @OlegZhukov thank you for reporting the issue! Since you have already written the code to fix the issue, would you like to submit a PR?

@zoewangg zoewangg added bug This issue is a bug. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Oct 3, 2019
@Quanzzzz Quanzzzz reopened this Mar 26, 2020
@Quanzzzz
Copy link
Contributor

Fixed by #1673 .

aws-sdk-java-automation added a commit that referenced this issue Jun 8, 2021
…852407a07

Pull request: release <- staging/074e5611-02e2-4b65-a039-92e852407a07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

No branches or pull requests

3 participants