Commit 776d70c

Add ReadMe with tenets, metrics list to collect, design doc. Added prototype interfaces
1 parent 77a1776 commit 776d70c

9 files changed, 730 additions and 0 deletions

docs/design/core/metrics/Design.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
## Concepts
### Metric
* A representation of collected data.
* A Metric can be one of the following types: Counter, Gauge, Timer.
* A Metric can have tags. A tag represents the category the metric belongs to (such as Default, HttpClient, or Streaming).

### MetricRegistry

* A MetricRegistry represents an interface for storing the collected metric data. It can hold the different types of Metrics described above.
* MetricRegistry is generic and not tied to a specific category of metrics (ApiCall, HttpClient, etc.).
* Each API call has its own instance of the MetricRegistry. All metrics collected in the ApiCall lifecycle are stored in that instance.
* A MetricRegistry can store other instances of the same type. This can be used to store metrics for each attempt in an API call.
* [Interface prototype](prototype/MetricRegistry.java)

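To make the nesting idea concrete, here is a minimal sketch of a registry that holds the three metric types and one child registry per attempt. It is not the prototype interface: the class name `SimpleMetricRegistry`, its method names, and the metric names used with it are hypothetical, and tags are left out for brevity.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the actual contract is defined in prototype/MetricRegistry.java.
final class SimpleMetricRegistry {
    private final Map<String, Long> counters = new LinkedHashMap<>();
    private final Map<String, Double> gauges = new LinkedHashMap<>();
    private final Map<String, Duration> timers = new LinkedHashMap<>();
    private final List<SimpleMetricRegistry> attempts = new ArrayList<>();

    // Counter: adds delta to the named counter, starting from zero.
    void incrementCounter(String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }

    // Gauge: stores the most recent value observed for the name.
    void setGauge(String name, double value) {
        gauges.put(name, value);
    }

    // Timer: stores the most recent duration recorded for the name.
    void recordTimer(String name, Duration elapsed) {
        timers.put(name, elapsed);
    }

    // Creates a child registry that holds the metrics of a single attempt.
    SimpleMetricRegistry registerAttempt() {
        SimpleMetricRegistry attempt = new SimpleMetricRegistry();
        attempts.add(attempt);
        return attempt;
    }

    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }

    Double gauge(String name) {
        return gauges.get(name);
    }

    Duration timer(String name) {
        return timers.get(name);
    }

    // Read-only view of the per-attempt registries, in creation order.
    List<SimpleMetricRegistry> attempts() {
        return Collections.unmodifiableList(attempts);
    }
}
```

In this sketch, call-level metrics go into the parent registry, while each retry gets its own child from `registerAttempt()`, so a publisher can walk `attempts()` to see per-attempt data.
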
### MetricPublisher

* A MetricPublisher represents an interface for publishing the collected metrics to an external source.
* The SDK provides implementations that publish metrics to services like [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) and [Client Side Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK Metrics for Enterprise Support).
* Customers can implement the interface and register the custom implementation to publish metrics to a platform not supported by the SDK.
* MetricPublishers can differ in the list of metrics they publish, their publishing frequency, the configuration they need, and so on.
* Metrics can be explicitly published to the platform by calling the publish() method. This can be useful when the application fails and the customer wants to flush metrics before exiting the application.
* [Interface prototype](prototype/MetricPublisher.java)

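As an illustration of the split between receiving metrics and publishing them, the sketch below buffers whatever is registered and only hands it to a destination when `publish()` is called. The interface `SketchMetricPublisher` is a stand-in, not the prototype: it takes a plain `Map<String, Long>` where the real method takes a MetricRegistry, and `BufferingPublisher` and its sink are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Stand-in for the prototype interface; the real registerMetrics takes a MetricRegistry.
interface SketchMetricPublisher {
    void registerMetrics(Map<String, Long> metrics);

    void publish();
}

// Buffers reported metrics and sends them to the sink only when publish() is called.
final class BufferingPublisher implements SketchMetricPublisher {
    private final List<Map<String, Long>> buffer = new ArrayList<>();
    private final Consumer<List<Map<String, Long>>> sink;

    BufferingPublisher(Consumer<List<Map<String, Long>>> sink) {
        this.sink = sink;
    }

    @Override
    public synchronized void registerMetrics(Map<String, Long> metrics) {
        // Copy defensively so later changes by the caller do not affect the buffer.
        buffer.add(Collections.unmodifiableMap(new HashMap<>(metrics)));
    }

    @Override
    public synchronized void publish() {
        if (buffer.isEmpty()) {
            return; // nothing to send
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Number of reported metric sets that have not been published yet.
    synchronized int pending() {
        return buffer.size();
    }
}
```

A scheduled task could call `publish()` at the publisher's own frequency, and an application shutdown hook could call it once more to flush anything still pending.
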
### Reporting

* Reporting is the transfer of collected metrics to Publishers.
* To report metrics to a publisher, call the registerMetrics(MetricRegistry) method on the MetricPublisher.
* A Publisher is not required to publish the reported metrics immediately after this method is called.


## Enabling Metrics

The metrics feature is disabled by default. Metrics can be enabled at the client level in the following ways.

### Feature Flags (Metrics Provider)

* The SDK exposes an [interface](prototype/MetricConfigurationProvider.java) to enable the metrics feature and specify options that configure its behavior.
* The SDK provides an implementation of this interface based on system properties.
* The SDK supports the following system properties:
- **aws.javasdk2x.metrics.enabled** - The metrics feature is enabled if this system property is set
- **aws.javasdk2x.metrics.category** - Comma-separated set of MetricCategory values that are enabled for collection
* The SDK calls the methods in this interface for each request, i.e., the enabled() method is called for every request to determine whether the metrics feature is enabled (and similarly for other configuration options).
* This lets customers provide MetricConfigurationProvider implementations that use external sources like DynamoDB to control the metrics feature. This is useful for enabling or disabling the metrics feature and controlling metrics options at runtime without making code changes or redeploying the application.
* As the interface methods are called for each request, implementations should run expensive tasks asynchronously in the background, cache the results, and refresh them periodically.

```
ClientOverrideConfiguration config = ClientOverrideConfiguration
    .builder()
    // If this is not set, the SDK uses the default chain, which reads the system properties
    .metricConfigurationProvider(new SystemSettingsMetricConfigurationProvider())
    .build();

// Set the ClientOverrideConfiguration instance on the client builder
CodePipelineAsyncClient asyncClient =
    CodePipelineAsyncClient
        .builder()
        .overrideConfiguration(config)
        .build();
```

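The advice to cache and refresh in the background can be sketched as follows. This is only an illustration under assumptions: `SketchConfigurationProvider` is a one-method stand-in for the prototype MetricConfigurationProvider, `CachingConfigurationProvider` is a hypothetical name, and the slow external lookup (a DynamoDB read, for instance) is represented by a `BooleanSupplier`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Stand-in for the prototype MetricConfigurationProvider (only the enabled flag is modeled).
interface SketchConfigurationProvider {
    boolean enabled();
}

// Serves enabled() from a cached value and refreshes that value on a background thread.
final class CachingConfigurationProvider implements SketchConfigurationProvider, AutoCloseable {
    private final BooleanSupplier slowSource;
    private final ScheduledExecutorService scheduler;
    private volatile boolean cachedEnabled;

    CachingConfigurationProvider(BooleanSupplier slowSource, long period, TimeUnit unit) {
        this.slowSource = slowSource;
        this.cachedEnabled = slowSource.getAsBoolean(); // one blocking load at construction
        this.scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "metrics-config-refresh");
            thread.setDaemon(true); // do not keep the JVM alive for refreshes
            return thread;
        });
        this.scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
    }

    // Reloads the flag from the slow source; on failure the last known value is kept.
    void refresh() {
        try {
            cachedEnabled = slowSource.getAsBoolean();
        } catch (RuntimeException e) {
            // Swallowed on purpose so the periodic task keeps running.
        }
    }

    @Override
    public boolean enabled() {
        return cachedEnabled; // cheap volatile read on the request path
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The request path never touches the external source, so per-request calls to `enabled()` stay cheap; the trade-off is that a configuration change takes effect only after the next refresh.
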
### Metrics Provider Chain

* Customers might want different ways of enabling the metrics feature. For example: use system properties by default and, if they are not set, fall back to an implementation based on Amazon DynamoDB.
* To support multiple providers, the SDK allows setting a chain of providers (similar to the CredentialsProviderChain used to resolve credentials). As a provider has multiple configuration options, a single provider is resolved at chain construction time and used throughout the lifecycle of the application to keep the behavior intuitive.
* If no custom chain is provided, the SDK will use a default chain which looks for the system properties defined in the section above. The SDK can add more providers to the default chain in the future without breaking customers.

```
MetricConfigurationProvider chain = new MetricConfigurationProviderChain(
    new SystemSettingsMetricConfigurationProvider(),
    // example custom implementation (not provided by the SDK)
    DynamoDBMetricConfigurationProvider.builder()
                                       .tableName(TABLE_NAME)
                                       .enabledKey(ENABLE_KEY_NAME)
                                       ...
                                       .build()
);

ClientOverrideConfiguration config = ClientOverrideConfiguration
    .builder()
    // If this is not set, the SDK uses the default chain, which reads the system properties
    .metricConfigurationProvider(chain)
    .build();

// Set the ClientOverrideConfiguration instance on the client builder
CodePipelineAsyncClient asyncClient =
    CodePipelineAsyncClient
        .builder()
        .overrideConfiguration(config)
        .build();
```

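One possible reading of "resolved at chain construction time" is sketched below. The design does not say how a chain decides which provider to pick, so the `isConfigured()` method is purely an assumption of this sketch, as are the names `SketchProvider`, `FixedProvider`, and `SketchProviderChain`. Only the system property name comes from the design.

```java
// Hypothetical stand-in for MetricConfigurationProvider.
interface SketchProvider {
    // Assumption of this sketch: a provider can tell whether it has settings to offer.
    boolean isConfigured();

    boolean enabled();
}

// Metrics are enabled when the system property named in the design is set.
final class SystemPropertyProvider implements SketchProvider {
    static final String ENABLED_PROPERTY = "aws.javasdk2x.metrics.enabled";

    @Override
    public boolean isConfigured() {
        return System.getProperty(ENABLED_PROPERTY) != null;
    }

    @Override
    public boolean enabled() {
        return isConfigured();
    }
}

// Always configured; returns a fixed answer. Useful as a fallback or in tests.
final class FixedProvider implements SketchProvider {
    private final boolean enabled;

    FixedProvider(boolean enabled) {
        this.enabled = enabled;
    }

    @Override
    public boolean isConfigured() {
        return true;
    }

    @Override
    public boolean enabled() {
        return enabled;
    }
}

// Picks the first configured provider once, in the constructor, and keeps it.
final class SketchProviderChain implements SketchProvider {
    private static final SketchProvider DISABLED = new FixedProvider(false);
    private final SketchProvider resolved;

    SketchProviderChain(SketchProvider... providers) {
        SketchProvider first = DISABLED; // metrics stay off if nothing is configured
        for (SketchProvider provider : providers) {
            if (provider.isConfigured()) {
                first = provider;
                break;
            }
        }
        this.resolved = first;
    }

    SketchProvider resolved() {
        return resolved;
    }

    @Override
    public boolean isConfigured() {
        return resolved != DISABLED;
    }

    @Override
    public boolean enabled() {
        return resolved.enabled(); // always delegates to the provider chosen at construction
    }
}
```

Because the choice is made once, a system property set after the chain is built does not switch providers mid-run, which matches the stated goal of keeping the behavior predictable.
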
### Metric Publishers Configuration

* If metrics are enabled, the SDK by default uses a single publisher that uploads metrics to CloudWatch using the default credentials and region.
* Customers might want to use a different configuration for the CloudWatch publisher, or even use a different publisher to publish to a different destination. To provide this flexibility, the SDK exposes an option to set a [MetricPublisherConfiguration](prototype/MetricPublisherConfiguration.java), which can be used to configure custom publishers.
* The SDK publishes the collected metrics to each of the publishers configured in the MetricPublisherConfiguration.

```
ClientOverrideConfiguration config = ClientOverrideConfiguration
    .builder()
    .metricPublisherConfiguration(MetricPublisherConfiguration
        .builder()
        .addPublisher(
            CloudWatchPublisher.builder()
                .credentialsProvider(...)
                .region(Region.AP_SOUTH_1)
                .publishFrequency(5, TimeUnit.MINUTES)
                .build(),
            CsmPublisher.create())
        .build())
    .build();

// Set the ClientOverrideConfiguration instance on the client builder
CodePipelineAsyncClient asyncClient =
    CodePipelineAsyncClient
        .builder()
        .overrideConfiguration(config)
        .build();
```


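Delivering one set of collected metrics to every configured publisher might look like the sketch below. Publishers are modeled as plain `Consumer` callbacks over a `Map<String, Long>`, and `PublisherFanOut` is a hypothetical name. Continuing after one publisher fails is a choice made for this sketch; the design does not specify failure handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hands the same metrics to every configured publisher, in configuration order.
final class PublisherFanOut {
    private final List<Consumer<Map<String, Long>>> publishers;

    PublisherFanOut(List<Consumer<Map<String, Long>>> publishers) {
        this.publishers = new ArrayList<>(publishers);
    }

    // Returns how many publishers accepted the metrics without throwing.
    int report(Map<String, Long> metrics) {
        int delivered = 0;
        for (Consumer<Map<String, Long>> publisher : publishers) {
            try {
                publisher.accept(metrics);
                delivered++;
            } catch (RuntimeException e) {
                // Assumption: a failing publisher should not block the remaining ones.
            }
        }
        return delivered;
    }
}
```
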
## Modules
New modules are created to support the metrics feature.

### metrics-spi
* Contains the metrics interfaces and default implementations that don't require other dependencies
* This is a submodule under `core`
* `sdk-core` has a dependency on `metrics-spi`, so customers automatically get a dependency on this module.

### metrics-publishers
* This is a new module that contains implementations of all SDK-supported publishers
* Under this module, a new submodule is created for each publisher (`cloudwatch-publisher`, `csm-publisher`)
* Customers have to **explicitly add a dependency** on these modules to use the SDK-provided publishers


## Sequence Diagram

![Normal API Call flow](images/MetricsSequenceDiagram.png)

1. The client enables the metrics feature through MetricConfigurationProvider and configures publishers through MetricPublisherConfiguration.
2. For each API call, a new MetricRegistry object is created and stored in the ExecutionAttributes. If metrics are not enabled, a NoOpMetricRegistry is used.
3. At each metric collection point, the metric is registered in the MetricRegistry object if its category is enabled in MetricConfigurationProvider.
4. The metrics that are collected once per API call execution are stored in the METRIC_REGISTRY ExecutionAttribute.
5. The metrics that are collected per API call attempt are stored in new MetricRegistry instances which are part of the ApiCall MetricRegistry.
   The MetricRegistry instance for the current attempt can also be accessed through the ATTEMPT_METRIC_REGISTRY ExecutionAttribute.
6. At the end of the API call, the MetricRegistry object is reported to the MetricPublishers by calling the registerMetrics(MetricRegistry) method. This is done in an ExecutionInterceptor.
7. Steps 2 to 6 are repeated for each API call.
8. The MetricPublisher calls the publish() method to report metrics to external sources. The frequency of publish() calls is specific to each Publisher implementation.
9. The client has access to all registered publishers and can call the publish() method explicitly if desired.


## Implementation Details
A few important implementation details are discussed in this section.

SDK modules can be organized as shown in this image.
![Module Hierarchy](images/MetricsModulesHierarchy.png)

* Core modules - Modules in the core directory which have access to ExecutionContext and ExecutionAttributes
* Downstream modules - Modules where execution occurs after core modules. For example, http-clients is a downstream module, as the request is transferred from core to the HTTP client for further execution.
* Upstream modules - Modules that live in layers above core. Examples are high-level libraries (HLL) or applications that use the SDK. Execution goes from upstream modules to core modules.

### Core Modules
* The SDK will use ExecutionAttributes to pass the MetricConfigurationProvider information throughout the core module, where core request-response metrics are collected.
* Instead of checking whether metrics are enabled at each metric collection point, the SDK will use an instance of NoOpMetricRegistry (if metrics are disabled) or DefaultMetricRegistry (if metrics are enabled).
* The NoOpMetricRegistry class does not collect or store any metric data. Instead of creating a new NoOpMetricRegistry instance for each request, the same instance is used for every request to avoid additional object creation.
* The DefaultMetricRegistry class will only collect metrics if they belong to the MetricCategory list provided in the MetricConfigurationProvider. To support this, DefaultMetricRegistry is decorated by another class that filters out metric categories not set in MetricConfigurationProvider.

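The shared no-op instance and the category-filtering decorator described above could be sketched roughly like this. All type names here (`SketchCategory`, `SketchRegistry`, `NoOpRegistry`, `MapRegistry`, `CategoryFilteringRegistry`, `Registries`) are hypothetical and the registry is reduced to a single counter method; the actual classes are NoOpMetricRegistry and DefaultMetricRegistry with the prototype interface.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Example categories, taken from the tag examples in the Concepts section.
enum SketchCategory { DEFAULT, HTTP_CLIENT, STREAMING }

// Reduced registry contract for this sketch: counters only.
interface SketchRegistry {
    void count(SketchCategory category, String name, long delta);
}

// Single shared instance that discards everything; an enum guarantees one instance.
enum NoOpRegistry implements SketchRegistry {
    INSTANCE;

    @Override
    public void count(SketchCategory category, String name, long delta) {
        // Intentionally empty: nothing is collected when metrics are disabled.
    }
}

// Plays the role of the default registry: stores every counter it is given.
final class MapRegistry implements SketchRegistry {
    private final Map<String, Long> counters = new HashMap<>();

    @Override
    public void count(SketchCategory category, String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }

    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }
}

// Decorator that forwards a metric only when its category is enabled.
final class CategoryFilteringRegistry implements SketchRegistry {
    private final SketchRegistry delegate;
    private final Set<SketchCategory> enabled;

    CategoryFilteringRegistry(SketchRegistry delegate, Set<SketchCategory> enabled) {
        this.delegate = delegate;
        EnumSet<SketchCategory> copy = EnumSet.noneOf(SketchCategory.class);
        copy.addAll(enabled);
        this.enabled = copy;
    }

    @Override
    public void count(SketchCategory category, String name, long delta) {
        if (enabled.contains(category)) {
            delegate.count(category, name, delta);
        }
    }
}

final class Registries {
    private Registries() {
    }

    // Chooses the registry once per request, so collection points need no enabled check.
    static SketchRegistry forRequest(boolean metricsEnabled, Set<SketchCategory> categories) {
        if (!metricsEnabled) {
            return NoOpRegistry.INSTANCE;
        }
        return new CategoryFilteringRegistry(new MapRegistry(), categories);
    }
}
```

With this shape, a collection point simply calls `count(...)` on whatever registry it was handed; whether the call is dropped, filtered, or stored is decided by the object chosen at the start of the request.
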
### Downstream Modules
* The MetricRegistry object and other required metric configuration details will be passed to the classes in downstream modules.
* For example, HttpExecuteRequest for the sync HTTP client and AsyncExecuteRequest for the async HTTP client.
* Downstream modules record the metric data directly into the given MetricRegistry object.
* As the same MetricRegistry object is used for core and downstream modules, both sets of metrics will be reported to the Publisher together.

### Upstream Modules
* As the MetricRegistry object is created after execution passes from upstream modules into core, these modules won't be able to modify or add to the core metrics.
* If upstream modules want to report additional metrics using the registered publishers, they need to create MetricRegistry instances and explicitly call the methods on the Publishers.
* It would be useful to get the low-level API metrics in these modules, so the SDK will expose APIs to get an immutable version of the MetricRegistry object so that upstream classes can use that information in their metric calculations.


## Testing

One of the main tenets for metrics is "Enabling default metrics should have minimal impact on the application performance."
To ensure this, performance tests should be written and a baseline for overhead should be created.
These tests should be run regularly to catch regressions.

### Test Cases

The SDK will be tested under load for each of these test cases using the load testing framework we already have.
Each test case should be run with the metrics feature disabled and enabled, and the results compared.

1. Enable each metrics publisher (CloudWatch, CSM) individually.
2. Enable all metrics publishers.
3. Individually enable each metric category to find the overhead for each MetricCategory.
