diff --git a/docs/design/core/metrics/Design.md b/docs/design/core/metrics/Design.md new file mode 100644 index 000000000000..6220f121d8c6 --- /dev/null +++ b/docs/design/core/metrics/Design.md @@ -0,0 +1,280 @@ +## Concepts +### Metric +* A representation of the data collected +* A Metric can be one of the following types: Counter, Gauge, Timer +* A Metric can be associated with a category. Some of the metric categories are Default, HttpClient, Streaming etc. + +### MetricRegistry + +* A MetricRegistry represents an interface to store the collected metric data. It can hold the different types of Metrics + described above. +* MetricRegistry is generic and not tied to a specific category (ApiCall, HttpClient etc.) of metrics. +* Each API call has its own instance of a MetricRegistry. All metrics collected in the ApiCall lifecycle are stored in + that instance. +* A MetricRegistry can store other instances of the same type. This can be used to store metrics for each Attempt in an Api + Call. +* [Interface prototype](prototype/MetricRegistry.java) + +### MetricPublisher + +* A MetricPublisher represents an interface to publish the collected metrics to an external source. +* SDK provides implementations to publish metrics to services like [Amazon + CloudWatch](https://aws.amazon.com/cloudwatch/), [Client Side + Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK + Metrics for Enterprise Support) +* Customers can implement the interface and register the custom implementation to publish metrics to a platform not + supported in the SDK. +* MetricPublishers can differ in the list of metrics they publish, their publishing frequency, + the configuration needed to publish etc. +* Metrics can be explicitly published to the platform by calling the publish() method. This can be useful in scenarios where + the application fails and the customer wants to flush metrics before exiting the application. 
+* [Interface prototype](prototype/MetricPublisher.java) + +### Reporting + +* Reporting is the transfer of the collected metrics to Publishers. +* To report metrics to a publisher, call the registerMetrics(MetricRegistry) method on the MetricPublisher. +* A Publisher is not required to publish the reported metrics immediately after this method is called. + + +## Enabling Metrics + +The metrics feature is disabled by default. Metrics can be enabled at the client level in the following ways. + +### Feature Flags (Metrics Provider) + +* SDK exposes an [interface](prototype/MetricConfigurationProvider.java) to enable the metrics feature and specify + options to configure the metrics behavior. +* SDK provides an implementation of this interface based on system properties. +* Here are the system properties SDK supports: + - **aws.javasdk2x.metrics.enabled** - Metrics feature is enabled if this system property is set + - **aws.javasdk2x.metrics.category** - Comma separated set of MetricCategory values that are enabled for collection +* SDK calls the methods in this interface for each request, i.e., the enabled() method is called for every request to determine + whether the metrics feature is enabled (similarly for other configuration options). + - This allows customers to control metrics behavior in a more flexible manner; for example, using an external database + like DynamoDB to dynamically control metrics collection. This is useful to enable/disable the metrics feature and + control metrics options at runtime without the need to make code changes or re-deploy the application. +* As the interface methods are called for each request, implementations are advised to run expensive tasks + asynchronously in the background, cache the results, and refresh them periodically. 
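
The caching advice in the last bullet can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `ConfigProvider`, `ConfigSnapshot` and `CachingConfigProvider` are hypothetical names, `ConfigProvider` is a simplified stand-in for the prototype `MetricConfigurationProvider` (categories are plain strings), and the `loader` represents whatever slow source (for example a database lookup) an implementation might consult.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the prototype MetricConfigurationProvider interface.
interface ConfigProvider {
    boolean enabled();
    Set<String> metricCategories();
}

// Immutable snapshot of the settings loaded from the slow source.
final class ConfigSnapshot {
    final boolean enabled;
    final Set<String> categories;

    ConfigSnapshot(boolean enabled, Set<String> categories) {
        this.enabled = enabled;
        this.categories = Collections.unmodifiableSet(categories);
    }
}

// Answers every per-request call from a cached snapshot; the snapshot is
// reloaded on a background daemon thread, never on the request path.
final class CachingConfigProvider implements ConfigProvider, AutoCloseable {
    private final Supplier<ConfigSnapshot> loader;
    private final ScheduledExecutorService scheduler;
    // Metrics stay disabled until the first successful load.
    private volatile ConfigSnapshot current = new ConfigSnapshot(false, Collections.emptySet());

    CachingConfigProvider(Supplier<ConfigSnapshot> loader, long refreshPeriod, TimeUnit unit) {
        this.loader = loader;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "metric-config-refresh");
            t.setDaemon(true);
            return t;
        });
        this.scheduler.scheduleWithFixedDelay(this::refresh, 0, refreshPeriod, unit);
    }

    // Reloads the snapshot; on failure the last known snapshot is kept.
    void refresh() {
        try {
            current = loader.get();
        } catch (RuntimeException e) {
            // A failed refresh must not fail requests; keep the previous values.
        }
    }

    @Override
    public boolean enabled() {
        return current.enabled;
    }

    @Override
    public Set<String> metricCategories() {
        return current.categories;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, `enabled()` and `metricCategories()` cost a single volatile read per request, regardless of how expensive the loader is.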
+ +```java +ClientOverrideConfiguration config = ClientOverrideConfiguration + .builder() + // If this is not set, SDK uses the default chain with system property + .metricConfigurationProvider(new SystemSettingsMetricConfigurationProvider()) + .build(); + +// Set the ClientOverrideConfiguration instance on the client builder +CodePipelineAsyncClient asyncClient = + CodePipelineAsyncClient + .builder() + .overrideConfiguration(config) + .build(); +``` + +### Metrics Provider Chain + +* Customers might want to have different ways of enabling the metrics feature. For example: use system properties by + default and, if they are not set, fall back to an implementation based on Amazon DynamoDB. +* To support multiple providers, SDK allows setting a chain of providers (similar to the CredentialsProviderChain used to + resolve credentials). As a provider has multiple configuration options, a single provider is resolved at chain + construction time and it is used throughout the lifecycle of the application to keep the behavior intuitive. +* If no custom chain is provided, SDK will use a default chain which looks for the system properties defined in the above + section. SDK can add more providers to the default chain in the future without breaking customers. + +```java +MetricConfigurationProvider chain = new MetricConfigurationProviderChain( + new SystemSettingsMetricConfigurationProvider(), + // example custom implementation (not provided by the SDK) + DynamoDBMetricConfigurationProvider.builder() + .tableName(TABLE_NAME) + .enabledKey(ENABLE_KEY_NAME) + ... 
+ .build() + ); + +ClientOverrideConfiguration config = ClientOverrideConfiguration + .builder() + // If this is not set, SDK uses the default chain with system property + .metricConfigurationProvider(chain) + .build(); + +// Set the ClientOverrideConfiguration instance on the client builder +CodePipelineAsyncClient asyncClient = + CodePipelineAsyncClient + .builder() + .overrideConfiguration(config) + .build(); +``` + +### Metric Publishers Configuration + +* If metrics are enabled, SDK by default uses a single publisher that uploads metrics to CloudWatch using the default + credentials and region. +* Customers might want to use a different configuration for the CloudWatch publisher or even use a different publisher to + publish to a different destination. To provide this flexibility, SDK exposes an option to set + [MetricPublisherConfiguration](prototype/MetricPublisherConfiguration.java) which can be used to configure custom + publishers. +* SDK publishes the collected metrics to each of the publishers configured in the MetricPublisherConfiguration. + +```java +ClientOverrideConfiguration config = ClientOverrideConfiguration + .builder() + .metricPublisherConfiguration(MetricPublisherConfiguration + .builder() + .addPublisher( + CloudWatchPublisher.builder() + .credentialsProvider(...) + .region(Region.AP_SOUTH_1) + .publishFrequency(5, TimeUnit.MINUTES) + .build()) + .addPublisher(CsmPublisher.create()) + .build()) + .build(); + +// Set the ClientOverrideConfiguration instance on the client builder +CodePipelineAsyncClient asyncClient = + CodePipelineAsyncClient + .builder() + .overrideConfiguration(config) + .build(); +``` + + +## Modules +New modules are created to support the metrics feature. + +### metrics-spi +* Contains the metrics interfaces and default implementations that don't require other dependencies +* This is a sub-module under `core` +* `sdk-core` has a dependency on `metrics-spi`, so customers will automatically get a dependency on this module. 
+ +### metrics-publishers +* This is a new module that contains implementations of all SDK-supported publishers +* Under this module, a new sub-module is created for each publisher (`cloudwatch-publisher`, `csm-publisher`) +* Customers have to **explicitly add a dependency** on these modules to use the SDK-provided publishers + + +## Sequence Diagram + +Metrics Collection + +
+ +![Metrics Collection](images/MetricCollection.jpg) + +
+ +MetricPublisher + +
+ +![MetricPublisher fig.align="left"](images/MetricPublisher.jpg) + +
+ +1. Client enables the metrics feature through MetricConfigurationProvider and configures publishers through + MetricPublisherConfiguration. +2. For each API call, a new MetricRegistry object is created and stored in the ExecutionAttributes. If metrics are not + enabled, a NoOpMetricRegistry is used. +3. At each metric collection point, the metric is registered in the MetricRegistry object if its category is enabled in + MetricConfigurationProvider. +4. The metrics that are collected once per API call execution are stored in the METRIC_REGISTRY ExecutionAttribute. +5. The metrics that are collected per API call attempt are stored in new MetricRegistry instances which are part of the + ApiCall MetricRegistry. The MetricRegistry instance for the current attempt is also accessible through the + ATTEMPT_METRIC_REGISTRY ExecutionAttribute. +6. At the end of the API call, the MetricRegistry object is reported to the MetricPublishers by calling the registerMetrics(MetricRegistry) + method. This is done in an ExecutionInterceptor. +7. Steps 2 to 6 are repeated for each API call. +8. MetricPublisher calls the publish() method to report metrics to external sources. The frequency of publish() calls + is specific to each Publisher implementation. +9. Client has access to all registered publishers and can call the publish() method explicitly if desired. + + +CloudWatch MetricPublisher + +
+ +![CloudWatch MetricPublisher](images/CWMetricPublisher.jpg) + +
+ +## Implementation Details +Few important implementation details are discussed in this section. + +SDK modules can be organized as shown in this image. + +
+ +![Module Hierarchy](images/MetricsModulesHierarchy.png) + +
+ +* Core modules - Modules in the core directory which have access to ExecutionContext and ExecutionAttributes +* Downstream modules - Modules where execution occurs after core modules. For example, http-clients is a downstream module + as the request is transferred from core to the http client for further execution. +* Upstream modules - Modules that live in layers above core. Examples are High Level libraries (HLL) or Applications + that use SDK. Execution goes from Upstream modules to core modules. + +### Core Modules +* SDK will use ExecutionAttributes to pass the MetricConfigurationProvider information throughout the core module where + core request-response metrics are collected. +* Instead of checking whether metrics are enabled at each metric collection point, SDK will use an instance of + NoOpMetricRegistry (if metrics are disabled) or DefaultMetricRegistry (if metrics are enabled). +* The NoOpMetricRegistry class does not collect or store any metric data. Instead of creating a new NoOpMetricRegistry + instance for each request, the same instance is used for every request to avoid additional object creation. +* The DefaultMetricRegistry class will only collect metrics if they belong to the MetricCategory list provided in the + MetricConfigurationProvider. To support this, DefaultMetricRegistry is decorated by another class that filters out metric + categories that are not set in MetricConfigurationProvider. + +### Downstream Modules +* The MetricRegistry object and other required metric configuration details will be passed to the classes in downstream + modules. +* For example, HttpExecuteRequest for the sync http client, AsyncExecuteRequest for the async http client. +* Downstream modules record the metric data directly into the given MetricRegistry object. +* As the same MetricRegistry object is used for core and downstream modules, both sets of metrics will be reported to the Publisher + together. 
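
The registry selection described under Core Modules (a shared no-op instance when metrics are disabled, a category-filtering decorator around the default registry when they are enabled) might look roughly like the sketch below. All type names here (`Category`, `Registry`, `NoOpRegistry`, `DefaultRegistry`, `CategoryFilteringRegistry`, `Registries`) are simplified, hypothetical stand-ins and not the prototype interfaces; the sketch is single-threaded for brevity.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for MetricCategory.
enum Category { DEFAULT, HTTP_CLIENT, STREAMING }

// Hypothetical, simplified stand-in for MetricRegistry.
interface Registry {
    void record(Category category, String name, Object value);
    Map<String, Object> metrics();
}

// Used when metrics are disabled: one shared instance, every call returns immediately.
final class NoOpRegistry implements Registry {
    static final NoOpRegistry INSTANCE = new NoOpRegistry();

    private NoOpRegistry() {
    }

    @Override
    public void record(Category category, String name, Object value) {
    }

    @Override
    public Map<String, Object> metrics() {
        return Collections.emptyMap();
    }
}

// Stores whatever it is given; not thread-safe in this sketch.
final class DefaultRegistry implements Registry {
    private final Map<String, Object> values = new LinkedHashMap<>();

    @Override
    public void record(Category category, String name, Object value) {
        values.put(name, value);
    }

    @Override
    public Map<String, Object> metrics() {
        return Collections.unmodifiableMap(values);
    }
}

// Decorator that forwards a metric only if its category is enabled.
final class CategoryFilteringRegistry implements Registry {
    private final Registry delegate;
    private final Set<Category> enabled = EnumSet.noneOf(Category.class);

    CategoryFilteringRegistry(Registry delegate, Set<Category> enabledCategories) {
        this.delegate = delegate;
        this.enabled.addAll(enabledCategories);
    }

    @Override
    public void record(Category category, String name, Object value) {
        if (enabled.contains(category)) {
            delegate.record(category, name, value);
        }
    }

    @Override
    public Map<String, Object> metrics() {
        return delegate.metrics();
    }
}

// Chooses the registry once per request, so collection points need no enabled check.
final class Registries {
    private Registries() {
    }

    static Registry forRequest(boolean metricsEnabled, Set<Category> categories) {
        if (!metricsEnabled) {
            return NoOpRegistry.INSTANCE;
        }
        return new CategoryFilteringRegistry(new DefaultRegistry(), categories);
    }
}
```

Collection points then call `record(...)` unconditionally; whether anything is stored is decided by the registry chosen at the start of the request.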
+ +### Upstream Modules +* As the MetricRegistry object is created after the execution is passed from Upstream modules, these modules won't be able + to modify/add to the core metrics. +* If upstream modules want to report additional metrics using the registered publishers, they would need to create + MetricRegistry instances and explicitly call the methods on the Publishers. +* It would be useful to get the low-level API metrics in these modules, so SDK will expose APIs to get an immutable + version of the MetricRegistry object so that upstream classes can use that information in their metric calculation. + +### Reporting +* Collected metrics are reported to the configured publishers at the end of each Api Call by calling the + `registerMetrics(MetricRegistry)` method on MetricPublisher. +* The MetricRegistry argument in the registerMetrics method will have data on the entire Api Call including retries. +* This reporting is done in `MetricsExecutionInterceptor` via the `afterExecution()` and `onExecutionFailure()` methods. +* `MetricsExecutionInterceptor` will always be the last configured ExecutionInterceptor in the interceptor chain + + +## Performance +One of the main tenets for metrics is "Enabling default metrics should have minimal impact on the application +performance". The following design choices are made to ensure that enabling metrics does not affect performance +significantly. +* When collecting metrics, a NoOpRegistry is used if metrics are disabled. All methods in this registry are no-ops and + return immediately. This also has the additional benefit of avoiding a metricsEnabled check at each metric collection + point. +* Metric publisher implementations can involve network calls and impact latency if done in a blocking way. So all SDK + publisher implementations will process the metrics asynchronously and will not block the actual request. 
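
The second point, publishing without blocking the request, can be illustrated with the sketch below. It assumes a hypothetical `BufferingPublisher` that holds reported records (plain strings here, rather than `MetricRegistry` objects) in a bounded buffer and hands batches to a caller-supplied `sink` on a background thread. The bounded buffer reflects the prototype Javadoc's suggestion to cap stored metrics and drop the excess; error logging is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical publisher sketch: registering is non-blocking, publishing runs off the request thread.
final class BufferingPublisher implements AutoCloseable {
    private final BlockingQueue<String> buffer;
    private final Consumer<List<String>> sink;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "metric-publisher");
        t.setDaemon(true);
        return t;
    });

    BufferingPublisher(int capacity, Consumer<List<String>> sink) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
    }

    // Called on the request path. Never blocks: returns false and drops the
    // record when the buffer is full, so memory use stays bounded.
    boolean registerMetrics(String record) {
        return buffer.offer(record);
    }

    // Drains the buffer and sends the batch on the background thread.
    // Failures in the sink are suppressed so that metrics never fail the application.
    CompletableFuture<Void> publish() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        if (batch.isEmpty()) {
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.runAsync(() -> sink.accept(batch), executor)
                                .exceptionally(t -> null);
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```

A real publisher would also trigger `publish()` on its own schedule (for example after a time period or when the buffer fills); that part is left out of the sketch.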
+ + +## Testing + +To ensure performance is not impacted by metrics, tests should be written for various scenarios and a baseline for +overhead should be established. These tests should be run regularly to catch regressions. + +### Test Cases + +SDK will be tested under load for each of these test cases using the load testing framework we already have. Each +test case should be run with the metrics feature disabled and enabled, and the two sets of results compared. + +1. Enable each metrics publisher (CloudWatch, CSM) individually. +2. Enable all metrics publishers. +3. Individually enable each metric category to find the overhead for each MetricCategory. + + + diff --git a/docs/design/core/metrics/MetricsList.md b/docs/design/core/metrics/MetricsList.md new file mode 100644 index 000000000000..f93c912be42a --- /dev/null +++ b/docs/design/core/metrics/MetricsList.md @@ -0,0 +1,130 @@ +Here is the detailed list of metrics that SDK can collect. Each metric belongs to a category. If a category is enabled, +then all metrics belonging to that category will be collected by the SDK. + +## Category + +1) Default - All metrics under this category will be collected when the metrics are enabled +2) HttpClient - Additional information collected for the http client. The metrics collected for each http client can vary +3) All - All metrics collected by the SDK come under this category. This can be useful for debugging purposes. + +Note: When the metrics feature is enabled, only the `Default` category metrics are collected. Other categories should be +explicitly enabled. 
+ +## Information collected at application level (Category: Default) + +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| RequestCount | Counter | Total number of requests (successful and failed) made from your code to AWS services +| SuccessRequestCount | Counter | Total number of requests from your code to AWS services that resulted in a successful response +| FailedRequestCount | Counter | Total number of requests from your code to AWS services that resulted in a failure. This can be expanded later to categorize the failures into buckets (like ClientErrorCount, ServiceErrorCount, ConnectionErrorCount etc) + +## Information collected for each request (ApiCall) (Category: Default) + +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| Service | ConstantGauge | Service ID of the AWS service that the API request is made against +| Api | ConstantGauge | The name of the AWS API the request is made to +| StreamingRequest | ConstantGauge | True if the request has streaming payload +| StreamingResponse | ConstantGauge | True if the response has streaming payload +| ApiCallStartTime | Timer | The start time of the request +| ApiCallEndTime | Timer | The end time of the request +| ApiCallLatency | Timer | The total time taken to finish a request (inclusive of all retries), ApiCallEndTime - ApiCallStartTime +| MarshallingLatency | Timer | The time taken to marshall the request +| ApiCallAttemptCount | Counter | Total number of attempts that were made by the service client to fulfill this request before succeeding or failing. (Value is 1 if there are no retries) + +Each ApiCall can have multiple attempts before the call succeeds or fails. The following metrics are collected for each ApiCall Attempt. 
+ +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| ApiCallAttemptStartTime | Timer | The start time of each Api call attempt +| SigningLatency | Timer | The time taken to sign the request in an Api Call Attempt +| HttpRequestRoundTripLatency | Timer | The time taken by the underlying http client to start the Api call attempt and return the response +| UnmarshallingLatency | Timer | The time taken to unmarshall the response (same metric for both successful and failed requests) +| ApiCallAttemptEndTime | Timer | The end time of a Api call attempt +| ApiCallAttemptLatency | Timer | The total time taken for an Api call attempt (exclusive of retries), ApiCallAttemptEndTime - ApiCallAttemptStartTime +| AwsRequestId | ConstantGauge | The request Id for the request. Represented by `x-amz-request-id` header in response +| ExtendedRequestId | ConstantGauge | The extended request Id for the request. Represented by `x-amz-id-2` header in response +| HttpStatusCode | ConstantGauge | The http status code returned in the response. Null if there is no response +| AwsException | ConstantGauge | The Aws exception code returned by the service. This is included for each Api call attempt if the call results in a failure and caused by service +| SdkException | ConstantGauge | The error name for any failure that is due to something other than an Aws exception. 
This is included for each API call attempt if the call results in a failure and is caused by something other than service + +For each attempt, the following http client metrics are collected: + +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| HttpClientName | ConstantGauge | Name of the underlying http client (Apache, Netty, UrlConnection) +| MaxConnections | Gauge | Maximum number of connections allowed in the connection pool +| AvailableConnections | Gauge | The number of idle connections in the connection pool that are ready to serve a request +| LeasedConnections | Gauge | The number of connections in the connection pool that are busy serving requests +| PendingRequests | Gauge | The number of requests awaiting a free connection from the pool + +## Additional Information collected for each http client (Category: HttpClient) + +### ApacheHttpClient +HttpClientName - Apache + +No additional metrics available for apache client currently + +### UrlConnectionHttpClient +HttpClientName - UrlConnection + +No additional metrics available for url connection client currently + +### NettyNioAsyncHttpClient +HttpClientName - Netty + +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| FailedConnectionClose | Counter | Number of times a connection close has failed +| FailedPoolAcquire | Counter | Number of times a request failed to acquire a connection + +For Http2 requests, + +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| ConnectionId | ConstantGauge | The identifier for a connection +| MaxStreamCount | Gauge | Maximum number of streams allowed on the connection +| CurrentStreamCount | Gauge | Number of active streams on the connection + + +## Information collected for event stream requests (Category: Default) + +| Metric Name | Meter | Description | +| ------------------ | ----------- | ---------------- | +| 
RequestEventsReceivedCount | Counter | Number of events received from the client +| RequestEventsSentCount | Counter | Number of events sent to the service +| ResponseEventsReceivedCount | Counter | Number of events received from the service +| ResponseEventsDeliveredCount | Counter | Number of events delivered to the client +| RequestSubscriptionCreated | Counter | Number of request subscriptions created to deliver events from client to service (For event stream requests like startStreamTranscription API in Transcribe Streaming service) +| RequestSubscriptionCompleted | Counter | Number of request subscriptions completed +| RequestSubscriptionCanceled | Counter | Number of request subscriptions canceled +| ResponseSubscriptionCreated | Counter | Number of response subscriptions created to deliver events from service to client +| ResponseSubscriptionCompleted | Counter | Number of response subscriptions completed +| ResponseSubscriptionCanceled | Counter | Number of response subscriptions canceled + + +## FAQ +1) When is the end time calculated for async requests? + The end time is calculated when the future is completed (either successfully or exceptionally) as opposed to the time when future is returned from API + +2) What errors are considered as throttling errors? 
+ The request is considered throttled if one of the following conditions is met: + 1) The http status code is equal to: `429` or `503` + 2) The error code is equal to one of the following values: + SlowDown + SlowDownException + Throttling + ThrottlingException + Throttled + ThrottledException + ServiceUnavailable + ServiceUnavailableException + ServiceUnavailableError + ProvisionedThroughputExceededException + TooManyRequests + TooManyRequestsException + DescribeAttachmentLimitExceeded + + +## References +1) [V1 Metrics Description](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/metrics/package-summary.html) diff --git a/docs/design/core/metrics/README.md b/docs/design/core/metrics/README.md new file mode 100644 index 000000000000..3f94783c8040 --- /dev/null +++ b/docs/design/core/metrics/README.md @@ -0,0 +1,69 @@ +**Design:** New Feature, **Status:** +[In Development](../../../README.md) + +# Project Tenets (unless you know better ones) + +1. Metrics can be used to provide insights about application behavior to enhance performance and debug operational + issues. +2. Enabling default metrics should have minimal impact on the application performance. +3. Customers can publish the collected metrics to their choice of platform. +4. Metrics are divided into different categories for granular control. +5. Customers can control the cost by having the ability to enable/disable the metrics collection by category. +6. Metrics collected by SDK are namespaced to avoid collision with other application metrics. + + +# Project Introduction + +This project adds a feature to the AWS SDK for Java that can collect and report client side SDK metrics in your +application. Metrics help developers and ops engineers detect and diagnose issues in their applications. The metrics +can also be used to gather insights into the application over time and tune the application for optimal performance. + + +# Project Details + +1. 
Metrics are disabled by default and should be enabled explicitly by customers. Enabling metrics will introduce a small + overhead. +2. Metrics can be enabled quickly during large scale events without the need for code changes or deployments. +3. Customers may publish metrics using their existing credentials. +4. Metrics are stored and accessed by AWS only with explicit permissions from the customer. +5. New Metrics can be added and published by the SDK into existing categories. + + +# Metrics Meters +Meters define the way a metric is measured. Here is the list of meters: + +**Counter :** Number of times a metric is reported. These kinds of metrics can be incremented or decremented. +For example: number of requests made since the start of the application + +**Timer :** Records the time between the start of an event and the end of an event. An example is the time taken (latency) to +complete a request. + +**Gauge :** A value recorded at a point in time. An example is the number of connections in the client pool. + +**Constant Gauge :** There are metrics that have a static value which doesn't change after it is set. Some examples are +service name, API name, status code, request id. To support this, a constant implementation of gauge is used. + +Reference: Some Meter names are taken from the open source +[spectator](http://netflix.github.io/spectator/en/latest/intro/counter/) project (Apache 2.0 license). + +# Naming + +1. Metric names should be in CamelCase format. +2. Only letters and digits are allowed in metric names. + +## Collected Metrics + +The full list of metrics collected by the SDK is documented [here](MetricsList.md) along with their definitions. + + +# Metric Publishers + +Metric Publishers are the implementations that are used to publish metrics to different platforms. +SDK provides default publishers for the following platforms for convenience. +Customers can implement custom publishers to publish metrics to platforms not supported by SDK. 
+ +## Supported platforms +1) CloudWatch + +2) CSM - Client Side Monitoring (also known as [AWS SDK Metrics for Enterprise +Support](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html)) diff --git a/docs/design/core/metrics/images/CWMetricPublisher.jpg b/docs/design/core/metrics/images/CWMetricPublisher.jpg new file mode 100644 index 000000000000..f18bfecf1db7 Binary files /dev/null and b/docs/design/core/metrics/images/CWMetricPublisher.jpg differ diff --git a/docs/design/core/metrics/images/MetricCollection.jpg b/docs/design/core/metrics/images/MetricCollection.jpg new file mode 100644 index 000000000000..46b72012ff89 Binary files /dev/null and b/docs/design/core/metrics/images/MetricCollection.jpg differ diff --git a/docs/design/core/metrics/images/MetricPublisher.jpg b/docs/design/core/metrics/images/MetricPublisher.jpg new file mode 100644 index 000000000000..2b605d1b5388 Binary files /dev/null and b/docs/design/core/metrics/images/MetricPublisher.jpg differ diff --git a/docs/design/core/metrics/images/MetricsModulesHierarchy.png b/docs/design/core/metrics/images/MetricsModulesHierarchy.png new file mode 100644 index 000000000000..cb5afcf7a90e Binary files /dev/null and b/docs/design/core/metrics/images/MetricsModulesHierarchy.png differ diff --git a/docs/design/core/metrics/prototype/MetricConfigurationProvider.java b/docs/design/core/metrics/prototype/MetricConfigurationProvider.java new file mode 100644 index 000000000000..26da559af419 --- /dev/null +++ b/docs/design/core/metrics/prototype/MetricConfigurationProvider.java @@ -0,0 +1,45 @@ +/* + * Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"). + * You may not use this file except in compliance with the License. + * A copy of the License is located at + * + * http://aws.amazon.com/apache2.0 + * + * or in the "license" file accompanying this file. 
This file is distributed + * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either + * express or implied. See the License for the specific language governing + * permissions and limitations under the License. + */ + +package software.amazon.awssdk.metrics.provider; + +import java.util.Set; +import software.amazon.awssdk.annotations.SdkPublicApi; +import software.amazon.awssdk.metrics.MetricCategory; + +/** + * Interface to configure the options in metrics feature. + * + * This interface acts as a feature flag for metrics. The methods in the interface are called for each request. + * This gives flexibility for metrics feature to be enabled/disabled at runtime and configuration changes + * can be picked up at runtime without need for deploying the application (depending on the implementation). + * + * @see SystemSettingsMetricConfigurationProvider + */ +@SdkPublicApi +public interface MetricConfigurationProvider { + + /** + * @return true if the metrics feature is enabled. + * false if the feature is disabled. + */ + boolean enabled(); + + /** + * Return the set of {@link MetricCategory} that are enabled for metrics collection. + * Only metrics belonging to these categories will be collected. + */ + Set metricCategories(); +} diff --git a/docs/design/core/metrics/prototype/MetricPublisher.java b/docs/design/core/metrics/prototype/MetricPublisher.java new file mode 100644 index 000000000000..cb6f3c66ce25 --- /dev/null +++ b/docs/design/core/metrics/prototype/MetricPublisher.java @@ -0,0 +1,68 @@ +/* + * Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"). + * You may not use this file except in compliance with the License. + * A copy of the License is located at + * + * http://aws.amazon.com/apache2.0 + * + * or in the "license" file accompanying this file. 
This file is distributed + * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either + * express or implied. See the License for the specific language governing + * permissions and limitations under the License. + */ + +package software.amazon.awssdk.metrics.publisher; + +import java.util.concurrent.CompletableFuture; +import software.amazon.awssdk.annotations.SdkPublicApi; +import software.amazon.awssdk.metrics.registry.MetricRegistry; + +/** + * Interface to report and publish the collected SDK metrics to external sources. + * + * Publisher implementations create and maintain resources (like clients, thread pool etc) that are used for publishing. + * They should be closed in the close() method to avoid resource leakage. + * + *

+ * As metrics are not part of the business logic, failures caused by metrics features should not fail the application. + * So SDK publisher implementations suppress all errors during the metrics publishing and log them. + *

+ * + *

+ * In certain situations (high throttling errors, metrics are reported faster than publishing etc), storing all the metrics + * might take up lot of memory and can crash the application. In these cases, it is recommended to have a max limit on + * number of metrics stored or memory used for metrics and drop the metrics when the limit is breached. + *

+ */ +@SdkPublicApi +public interface MetricPublisher extends AutoCloseable { + + /** + * Registers the metric information supplied in MetricsRegistry. The reported metrics can be transformed and + * stored in a format the publisher uses to publish the metrics. + * + * This method is called at the end of each request execution to report all the metrics collected + * for that request (including retry attempt metrics) + */ + void registerMetrics(MetricRegistry metricsRegistry); + + /** + * Publish all metrics stored in the publisher. If all available metrics cannot be published in a single call, + * multiple calls will be made to publish the metrics. + * + * It is recommended to publish the metrics in a non-blocking way. As it is common to publish metrics to an external + * source which involves network calls, the method is intended to be implemented in a non-blocking way and thus + * returns a {@link CompletableFuture}. + * + * Depending on the implementation, the metrics are published to the external source periodically like: + * a) after a certain time period + * b) after n metrics are registered + * c) after the buffer is full + * + * Implementations can also call publish method for every reported metric. But this can be expensive and + * is not recommended. + */ + CompletableFuture publish(); +} diff --git a/docs/design/core/metrics/prototype/MetricPublisherConfiguration.java b/docs/design/core/metrics/prototype/MetricPublisherConfiguration.java new file mode 100644 index 000000000000..d4f88b4b15da --- /dev/null +++ b/docs/design/core/metrics/prototype/MetricPublisherConfiguration.java @@ -0,0 +1,92 @@ +/* + * Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"). + * You may not use this file except in compliance with the License. + * A copy of the License is located at + * + * http://aws.amazon.com/apache2.0 + * + * or in the "license" file accompanying this file. 
This file is distributed + * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either + * express or implied. See the License for the specific language governing + * permissions and limitations under the License. + */ + +package software.amazon.awssdk.metrics.publisher; + +import java.util.Collections; +import java.util.List; +import software.amazon.awssdk.utils.builder.CopyableBuilder; +import software.amazon.awssdk.utils.builder.ToCopyableBuilder; + +/** + * Configure the options to publish the metrics. + *

+ * By default, the SDK creates and uses only the CloudWatch publisher with default options (default credential chain + * and region chain). + * To use the CloudWatch publisher with custom options, or to use any other publisher, create a + * {@link MetricPublisherConfiguration} object and set it in the ClientOverrideConfiguration on the client. + *

+ * + *

+ * The SDK exposes the CloudWatch and CSM publisher implementations, so instances of these classes with + * different configurations can be set in this class. + *

+ */ +public final class MetricPublisherConfiguration implements + ToCopyableBuilder<MetricPublisherConfiguration.Builder, MetricPublisherConfiguration> { + + private final List<MetricPublisher> publishers; + + public MetricPublisherConfiguration(Builder builder) { + this.publishers = Collections.unmodifiableList(new java.util.ArrayList<>(builder.publishers)); // immutable defensive copy + } + + /** + * @return the list of #MetricPublisher to be used for publishing the metrics + */ + public List<MetricPublisher> publishers() { + return publishers; + } + + /** + * @return a {@link Builder} object to construct a MetricPublisherConfiguration instance. + */ + public static Builder builder() { + return new Builder(); + } + + @Override + public Builder toBuilder() { + return new Builder().publishers(publishers); // carry the current publishers into the new builder + } + + public static final class Builder implements CopyableBuilder<Builder, MetricPublisherConfiguration> { + + private List<MetricPublisher> publishers = new java.util.ArrayList<>(); // mutable; Collections.emptyList() would reject add() + + private Builder() { + } + + /** + * Sets the list of publishers used for publishing the metrics. + */ + public Builder publishers(List<MetricPublisher> publishers) { + this.publishers = new java.util.ArrayList<>(publishers); // replaces any previously configured publishers + return this; + } + + /** + * Add a publisher to the list of publishers used for publishing the metrics. + */ + public Builder addPublisher(MetricPublisher publisher) { + this.publishers.add(publisher); + return this; + } + + public MetricPublisherConfiguration build() { + return new MetricPublisherConfiguration(this); + } + } +} diff --git a/docs/design/core/metrics/prototype/MetricRegistry.java b/docs/design/core/metrics/prototype/MetricRegistry.java new file mode 100644 index 000000000000..939d35d5da63 --- /dev/null +++ b/docs/design/core/metrics/prototype/MetricRegistry.java @@ -0,0 +1,127 @@ +/* + * Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"). + * You may not use this file except in compliance with the License. + * A copy of the License is located at + * + * http://aws.amazon.com/apache2.0 + * + * or in the "license" file accompanying this file.
This file is distributed + * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either + * express or implied. See the License for the specific language governing + * permissions and limitations under the License. + */ + +package software.amazon.awssdk.metrics.registry; + +import java.util.List; +import java.util.Map; +import java.util.Optional; +import software.amazon.awssdk.annotations.SdkPublicApi; +import software.amazon.awssdk.metrics.meter.Counter; +import software.amazon.awssdk.metrics.meter.Gauge; +import software.amazon.awssdk.metrics.meter.Metric; +import software.amazon.awssdk.metrics.meter.Timer; + +/** + * Registry to store the collected metrics data. The interface can be used to store metrics for ApiCall and ApiCallAttempt. + * For an ApiCall, there can be multiple attempts, so a MetricRegistry can store other MetricRegistry instances. + */ +@SdkPublicApi +public interface MetricRegistry { + + /** + * Return the ApiCall level metrics registered in this metric registry as a map of metric name to metric instance. + * Only metrics that can be recorded once for the entire request lifecycle are recorded here. + * + * The method does not return the Api Call Attempt metrics. For metrics recorded separately for each attempt, + * see {@link #apiCallAttemptMetrics()}. + */ + Map<String, Metric> getMetrics(); + + + /** + * Return an ordered list of {@link MetricRegistry} instances recorded for each Api Call Attempt in the request execution. + * The metrics for each Api Call Attempt are recorded as a separate {@link MetricRegistry} instance in the returned list. + * + * For example, + * if the Api finishes (success or failure) in the first attempt, the returned list size will be 1. + * + * If the Api finishes after 4 attempts (1 initial attempt + 3 retries), the returned list size will be 4. In this case, + * the entry at index 0 has the metrics for the initial attempt, + * the entry at index 1 has the metrics for the second attempt (1st retry) and so on.
+ * + * @return an ordered list of {@link MetricRegistry} instances, one for each Api Call Attempt in the request execution + */ + List<MetricRegistry> apiCallAttemptMetrics(); + + /** + * Create and return a new instance of {@link MetricRegistry} for the current ApiCall Attempt. + * Records the registry instance within the class. The instance for the current attempt can be accessed by calling + * the {@link #apiCallAttemptMetrics()} method and getting the last element in the output list. + * + * If the Api Call finishes in the first attempt, this method is only called once. + * If the Api Call finishes after n retry attempts, this method is called n + 1 times + * (1 time for the initial attempt, n times for n retries). + * + * @return an instance of {@link MetricRegistry} to record metrics for an ApiCall Attempt + */ + MetricRegistry registerApiCallAttemptMetrics(); + + /** + * Given a {@link Metric}, registers it under the given name. + * If a metric with the given name is already present, the method throws {@link IllegalArgumentException}. + * + * @param name the name of the metric + * @param metric the metric + * @return the given metric + */ + Metric register(String name, Metric metric); + + /** + * Returns an optional representing the metric registered with the given name. If no metric is registered + * with the given name, an empty optional will be returned. + * + * @param name the name of the metric + * @return an optional representing the metric registered with the given name. + */ + Optional<Metric> metric(String name); + + /** + * Removes the metric with the given name. + * + * @param name the name of the metric + * @return True if the metric was removed. False if the metric doesn't exist or cannot be removed + */ + boolean remove(String name); + + /** + * Return the {@link Counter} registered under this name. + * If there is none registered already, create and register a new {@link Counter}.
+ * + * @param name name of the metric + * @return a new or pre-existing {@link Counter} + */ + Counter counter(String name); + + /** + * Return the {@link Timer} registered under this name. + * If there is none registered already, create and register a new {@link Timer}. + * + * @param name name of the metric + * @return a new or pre-existing {@link Timer} + */ + Timer timer(String name); + + /** + * Return the {@link Gauge} registered under this name and update its value with #value. + * If there is none registered already, create and register a new {@link Gauge} with the given initial #value. + * + * @param name name of the metric + * @param value initial value of the gauge + * @param <T> type of the value + * @return a new or pre-existing {@link Gauge} with updated value + */ + <T> Gauge<T> gauge(String name, T value); +}
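The MetricRegistry contract above (one registry per API call, one nested registry per attempt, strict `register` versus get-or-create `counter`) can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and is not part of the prototype: `SketchRegistry` and `SketchCounter` are stand-ins that mirror only a subset of the interface, the sketch is not thread-safe, and the behavior when a name is already bound to a different metric type is an assumption the interface does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the SDK's Counter metric; not the real type.
final class SketchCounter {
    private final AtomicLong count = new AtomicLong();

    void increment() {
        count.incrementAndGet();
    }

    long count() {
        return count.get();
    }
}

// Hypothetical in-memory registry mirroring part of the MetricRegistry contract.
// Not thread-safe; intended only to show the call/attempt nesting.
final class SketchRegistry {
    private final Map<String, Object> metrics = new LinkedHashMap<>();
    private final List<SketchRegistry> attempts = new ArrayList<>();

    // Strict registration: a duplicate name is rejected, as the Javadoc describes.
    Object register(String name, Object metric) {
        if (metrics.containsKey(name)) {
            throw new IllegalArgumentException("Metric already registered: " + name);
        }
        metrics.put(name, metric);
        return metric;
    }

    // Get-or-create: returns the existing counter or registers a new one.
    SketchCounter counter(String name) {
        Object existing = metrics.get(name);
        if (existing instanceof SketchCounter) {
            return (SketchCounter) existing;
        }
        if (existing != null) {
            // Assumption: a name bound to another metric type is treated as an error.
            throw new IllegalArgumentException(name + " is not a counter");
        }
        SketchCounter created = new SketchCounter();
        metrics.put(name, created);
        return created;
    }

    // Creates the registry for the current attempt and records it in order.
    SketchRegistry registerApiCallAttemptMetrics() {
        SketchRegistry attempt = new SketchRegistry();
        attempts.add(attempt);
        return attempt;
    }

    // Attempt registries in execution order; the last element is the current attempt.
    List<SketchRegistry> apiCallAttemptMetrics() {
        return Collections.unmodifiableList(attempts);
    }

    // Call-level metrics only; attempt metrics live in the nested registries.
    Map<String, Object> getMetrics() {
        return Collections.unmodifiableMap(metrics);
    }
}
```

In this sketch, a call that needs one retry would invoke `registerApiCallAttemptMetrics()` twice on the call-level registry, so `apiCallAttemptMetrics()` has size 2 while `getMetrics()` on the call-level registry contains only metrics recorded once per call.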