Commit 1a18dc9

natalie-white-aws, mjanardhan, GavinZZ, and mergify[bot] authored
refactor(glue-alpha): Refactored glue-alpha L2 CDK construct RFC 0497 (#32521)
### Issue # (if applicable)

Implementation of [RFC 0497](https://github.com/aws/aws-cdk-rfcs/blob/main/text/0497-glue-l2-construct.md)

### Reason for this change

Refactored glue-alpha construct to enforce validations by contract and interfaces, improve developer experience, and adhere to best practices.

[Related PR with merge conflicts and history](mjanardhan#12)

### Description of changes

Refactored from a single Job class to a pattern of inheritance that removes the need for synth-time validations and sets best practice defaults. Allows for overriding language and Glue versions where applicable, and other job-type specific parameters.

The existing Job and Job Executable monoliths have been decomposed into Job Type and Language specific classes that implement and extend an abstract Job parent class. Developers will be able to see mandatory and optional parameters that apply just to their selected job type and language, rather than having to reference documentation and examples or find out during synth or deploy time that they've selected the wrong configuration.

BREAKING CHANGE: Developers must refactor their existing Job instantiation method calls to choose the right job type and language, and use the new constants static values to define the associated Job configuration settings. See the RFC and/or new README for examples.

### Description of how you validated changes

Increased unit test coverage to > 90%, consulted with Glue service team on best practices and sane defaults, updated integration tests.
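The decomposition described under "Description of changes" can be pictured with a small, self-contained sketch. It does not use the real `@aws-cdk/aws-glue-alpha` API: `SketchJob`, `SketchPySparkEtlJob`, `SketchPythonShellJob`, their props interfaces, and all default values are hypothetical names and assumptions made for illustration. Only the command strings `glueetl` and `pythonshell` are taken from the `JobType` values in this commit. The idea is that each job-type class accepts only the props that apply to it, so a wrong combination is caught by the compiler rather than at synth time.

```typescript
// Hypothetical sketch: these names are illustrative, not the library's API.
enum SketchGlueVersion { V3_0 = '3.0', V4_0 = '4.0' }
enum SketchWorkerType { G_1X = 'G.1X', G_2X = 'G.2X' }

// Props shared by every job type.
interface SketchJobProps {
  readonly script: string;
}

// Spark ETL jobs may override the Glue version and the worker settings.
interface SketchPySparkEtlJobProps extends SketchJobProps {
  readonly glueVersion?: SketchGlueVersion;
  readonly workerType?: SketchWorkerType;
  readonly numberOfWorkers?: number;
}

// Python shell jobs have no workers, so those props are simply absent.
interface SketchPythonShellJobProps extends SketchJobProps {
  readonly maxCapacity?: 0.0625 | 1;
}

type SketchSettings = Record<string, string | number>;

// Abstract parent: keeps what is common, leaves the specifics to subclasses.
abstract class SketchJob {
  readonly script: string;

  protected constructor(props: SketchJobProps) {
    this.script = props.script;
  }

  abstract commandName(): string;
  abstract settings(): SketchSettings;
}

class SketchPySparkEtlJob extends SketchJob {
  private readonly etlProps: SketchPySparkEtlJobProps;

  constructor(props: SketchPySparkEtlJobProps) {
    super(props);
    this.etlProps = props;
  }

  commandName(): string {
    return 'glueetl';
  }

  settings(): SketchSettings {
    // All three defaults below are assumptions made for this sketch.
    return {
      glueVersion: this.etlProps.glueVersion ?? SketchGlueVersion.V4_0,
      workerType: this.etlProps.workerType ?? SketchWorkerType.G_1X,
      numberOfWorkers: this.etlProps.numberOfWorkers ?? 10,
    };
  }
}

class SketchPythonShellJob extends SketchJob {
  private readonly shellProps: SketchPythonShellJobProps;

  constructor(props: SketchPythonShellJobProps) {
    super(props);
    this.shellProps = props;
  }

  commandName(): string {
    return 'pythonshell';
  }

  settings(): SketchSettings {
    // Assumed default for this sketch.
    return { maxCapacity: this.shellProps.maxCapacity ?? 0.0625 };
  }
}
```

In this sketch, passing `numberOfWorkers` in the object literal given to `SketchPythonShellJob` is rejected by the compiler, because that prop is not part of `SketchPythonShellJobProps`.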
### Checklist

- [X] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

---------

Co-authored-by: Janardhan (Janny) Molumuri <[email protected]>
Co-authored-by: GZ <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
1 parent a928748 commit 1a18dc9

137 files changed (+16005 -5162 lines changed)

packages/@aws-cdk/aws-glue-alpha/README.md

Lines changed: 422 additions & 146 deletions
Large diffs are not rendered by default.

packages/@aws-cdk/aws-glue-alpha/awslint.json

Lines changed: 5 additions & 5 deletions
@@ -51,15 +51,15 @@
     "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable.tableArn",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable.tableName",
-    "props-default-doc:@aws-cdk/aws-glue-alpha.PythonRayExecutableProps.runtime",
-    "props-default-doc:@aws-cdk/aws-glue-alpha.PythonShellExecutableProps.runtime",
-    "props-default-doc:@aws-cdk/aws-glue-alpha.PythonSparkJobExecutableProps.runtime",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.S3TableProps",
-    "props-default-doc:@aws-cdk/aws-glue-alpha.ScalaJobExecutableProps.runtime",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes.tableArn",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes.tableName",
     "docs-public-apis:@aws-cdk/aws-glue-alpha.TableBaseProps",
-    "docs-public-apis:@aws-cdk/aws-glue-alpha.TableProps"
+    "docs-public-apis:@aws-cdk/aws-glue-alpha.TableProps",
+    "docs-public-apis:@aws-cdk/aws-glue-alpha.PredicateLogical",
+    "no-unused-type:@aws-cdk/aws-glue-alpha.ExecutionClass",
+    "no-unused-type:@aws-cdk/aws-glue-alpha.JobLanguage",
+    "no-unused-type:@aws-cdk/aws-glue-alpha.JobType"
   ]
 }
packages/@aws-cdk/aws-glue-alpha/lib/constants.ts

Lines changed: 313 additions & 0 deletions
@@ -0,0 +1,313 @@
+/**
+ * The type of predefined worker that is allocated when a job runs.
+ *
+ * `WorkerType` is an enum, so only the members listed below can be used;
+ * it has no `WorkerType.of()` factory for arbitrary values.
+ */
+export enum WorkerType {
+  /**
+   * Standard Worker Type
+   * 4 vCPU, 16 GB of memory and a 50 GB disk, and 2 executors per worker.
+   */
+  STANDARD = 'Standard',
+
+  /**
+   * G.1X Worker Type
+   * 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs.
+   */
+  G_1X = 'G.1X',
+
+  /**
+   * G.2X Worker Type
+   * 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs.
+   */
+  G_2X = 'G.2X',
+
+  /**
+   * G.4X Worker Type
+   * 4 DPU (16 vCPU, 64 GB of memory, 256 GB disk), and provides 1 executor per worker.
+   * We recommend this worker type for jobs whose workloads contain your most demanding transforms,
+   * aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later jobs.
+   */
+  G_4X = 'G.4X',
+
+  /**
+   * G.8X Worker Type
+   * 8 DPU (32 vCPU, 128 GB of memory, 512 GB disk), and provides 1 executor per worker. We recommend this worker
+   * type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries.
+   * This worker type is available only for AWS Glue version 3.0 or later jobs.
+   */
+  G_8X = 'G.8X',
+
+  /**
+   * G.025X Worker Type
+   * 0.25 DPU (2 vCPU, 4 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for low volume streaming jobs.
+   */
+  G_025X = 'G.025X',
+
+  /**
+   * Z.2X Worker Type
+   */
+  Z_2X = 'Z.2X',
+}
+
+/**
+ * The number of workers of a defined workerType that are allocated when a job runs.
+ *
+ * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html
+ */
+
+/**
+ * Job states emitted by Glue to CloudWatch Events.
+ *
+ * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types for more information.
+ */
+export enum JobState {
+  /**
+   * State indicating job run succeeded
+   */
+  SUCCEEDED = 'SUCCEEDED',
+
+  /**
+   * State indicating job run failed
+   */
+  FAILED = 'FAILED',
+
+  /**
+   * State indicating job run timed out
+   */
+  TIMEOUT = 'TIMEOUT',
+
+  /**
+   * State indicating job is starting
+   */
+  STARTING = 'STARTING',
+
+  /**
+   * State indicating job is running
+   */
+  RUNNING = 'RUNNING',
+
+  /**
+   * State indicating job is stopping
+   */
+  STOPPING = 'STOPPING',
+
+  /**
+   * State indicating job stopped
+   */
+  STOPPED = 'STOPPED',
+}
+
+/**
+ * The Glue CloudWatch metric type.
+ *
+ * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html
+ */
+export enum MetricType {
+  /**
+   * A value at a point in time.
+   */
+  GAUGE = 'gauge',
+
+  /**
+   * An aggregate number.
+   */
+  COUNT = 'count',
+}
+
+/**
+ * Whether the job runs with the standard or the flexible execution class.
+ *
+ * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job
+ * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html
+ */
+export enum ExecutionClass {
+  /**
+   * The flexible execution class is appropriate for time-insensitive jobs whose start
+   * and completion times may vary.
+   */
+  FLEX = 'FLEX',
+
+  /**
+   * The standard execution class is ideal for time-sensitive workloads that require fast job
+   * startup and dedicated resources.
+   */
+  STANDARD = 'STANDARD',
+}
+
+/**
+ * AWS Glue version determines the versions of Apache Spark and Python that are available to the job.
+ *
+ * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html
+ */
+export enum GlueVersion {
+  /**
+   * Glue version using Spark 2.2.1 and Python 2.7
+   */
+  V0_9 = '0.9',
+
+  /**
+   * Glue version using Spark 2.4.3, Python 2.7 and Python 3.6
+   */
+  V1_0 = '1.0',
+
+  /**
+   * Glue version using Spark 2.4.3 and Python 3.7
+   */
+  V2_0 = '2.0',
+
+  /**
+   * Glue version using Spark 3.1.1 and Python 3.7
+   */
+  V3_0 = '3.0',
+
+  /**
+   * Glue version using Spark 3.3.0 and Python 3.10
+   */
+  V4_0 = '4.0',
+
+  /**
+   * Glue version using Spark 3.5 and Python 3.11
+   */
+  V5_0 = '5.0',
+
+}
+
+/**
+ * Runtime language of the Glue job
+ */
+export enum JobLanguage {
+  /**
+   * Scala
+   */
+  SCALA = 'scala',
+
+  /**
+   * Python
+   */
+  PYTHON = 'python',
+}
+
+/**
+ * Python version
+ */
+export enum PythonVersion {
+  /**
+   * Python 2 (the exact version depends on GlueVersion and JobCommand used)
+   */
+  TWO = '2',
+
+  /**
+   * Python 3 (the exact version depends on GlueVersion and JobCommand used)
+   */
+  THREE = '3',
+
+  /**
+   * Python 3.9 (the exact version depends on GlueVersion and JobCommand used)
+   */
+  THREE_NINE = '3.9',
+
+}
+
+/**
+ * AWS Glue runtime determines the runtime engine of the job.
+ *
+ */
+export enum Runtime {
+  /**
+   * Runtime for Glue for Ray 2.4.
+   */
+  RAY_TWO_FOUR = 'Ray2.4',
+}
+
+/**
+ * The job type.
+ *
+ * `JobType` is an enum, so only the members listed below can be used;
+ * it has no `JobType.of()` factory for arbitrary values.
+ */
+export enum JobType {
+  /**
+   * Command for running a Glue Spark job.
+   */
+  ETL = 'glueetl',
+
+  /**
+   * Command for running a Glue Spark streaming job.
+   */
+  STREAMING = 'gluestreaming',
+
+  /**
+   * Command for running a Glue Python shell job.
+   */
+  PYTHON_SHELL = 'pythonshell',
+
+  /**
+   * Command for running a Glue Ray job.
+   */
+  RAY = 'glueray',
+
+}
+
+/**
+ * The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory.
+ */
+export enum MaxCapacity {
+
+  /**
+   * DPU value of 1/16th
+   */
+  DPU_1_16TH = 0.0625,
+
+  /**
+   * DPU value of 1
+   */
+  DPU_1 = 1,
+}
+
+/**
+ * Represents the logical operator for combining multiple conditions in the Glue Trigger API.
+ */
+export enum PredicateLogical {
+  /**
+   * All conditions must be true for the predicate to be true.
+   */
+  AND = 'AND',
+
+  /**
+   * At least one condition must be true for the predicate to be true.
+   */
+  ANY = 'ANY',
+}
+
+/**
+ * Represents the logical operator for evaluating a single condition in the Glue Trigger API.
+ */
+export enum ConditionLogicalOperator {
+  /** The condition is true if the values are equal. */
+  EQUALS = 'EQUALS',
+}
+
+/**
+ * Represents the state of a crawler for a condition in the Glue Trigger API.
+ */
+export enum CrawlerState {
+  /** The crawler is currently running. */
+  RUNNING = 'RUNNING',
+
+  /** The crawler is in the process of being cancelled. */
+  CANCELLING = 'CANCELLING',
+
+  /** The crawler has been cancelled. */
+  CANCELLED = 'CANCELLED',
+
+  /** The crawler has completed its operation successfully. */
+  SUCCEEDED = 'SUCCEEDED',
+
+  /** The crawler has failed to complete its operation. */
+  FAILED = 'FAILED',
+
+  /** The crawler encountered an error during its operation. */
+  ERROR = 'ERROR',
+}
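
The `WorkerType` doc comments in this file state a DPU figure for each G-series worker. A minimal sketch of how those figures combine with a worker count follows. It redeclares a subset of the enum locally so that it runs on its own, and `DPU_PER_WORKER` and `totalDpus` are hypothetical helpers that are not part of this commit. `STANDARD` and `Z_2X` are left out because their comments give no DPU value.

```typescript
// Local copy of the G-series members so the sketch is self-contained.
enum WorkerType {
  G_1X = 'G.1X',
  G_2X = 'G.2X',
  G_4X = 'G.4X',
  G_8X = 'G.8X',
  G_025X = 'G.025X',
}

// DPUs per worker, copied from the doc comments in the diff above.
const DPU_PER_WORKER: Record<WorkerType, number> = {
  [WorkerType.G_1X]: 1,
  [WorkerType.G_2X]: 2,
  [WorkerType.G_4X]: 4,
  [WorkerType.G_8X]: 8,
  [WorkerType.G_025X]: 0.25,
};

// Hypothetical helper: total DPUs allocated for a given worker count.
function totalDpus(workerType: WorkerType, numberOfWorkers: number): number {
  if (!Number.isInteger(numberOfWorkers) || numberOfWorkers < 1) {
    throw new RangeError('numberOfWorkers must be a positive integer');
  }
  return DPU_PER_WORKER[workerType] * numberOfWorkers;
}

console.log(totalDpus(WorkerType.G_2X, 10)); // 10 workers at 2 DPU each
```

Typing the table as `Record<WorkerType, number>` means the compiler flags any enum member that is added later without a matching DPU entry.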

packages/@aws-cdk/aws-glue-alpha/lib/index.ts

Lines changed: 13 additions & 2 deletions
@@ -6,11 +6,22 @@ export * from './data-format';
 export * from './data-quality-ruleset';
 export * from './database';
 export * from './external-table';
-export * from './job';
-export * from './job-executable';
 export * from './s3-table';
 export * from './schema';
 export * from './security-configuration';
 export * from './storage-parameter';
+export * from './constants';
+export * from './jobs/job';
+export * from './jobs/pyspark-etl-job';
+export * from './jobs/pyspark-flex-etl-job';
+export * from './jobs/pyspark-streaming-job';
+export * from './jobs/python-shell-job';
+export * from './jobs/ray-job';
+export * from './jobs/scala-spark-etl-job';
+export * from './jobs/scala-spark-flex-etl-job';
+export * from './jobs/scala-spark-streaming-job';
+export * from './jobs/spark-ui-utils';
 export * from './table-base';
 export * from './table-deprecated';
+export * from './triggers/workflow';
+export * from './triggers/trigger-options';
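
The trigger modules exported here are not shown in this view, but the enums added for them (`PredicateLogical`, `ConditionLogicalOperator`) document their semantics: `AND` requires every condition to hold, `ANY` requires at least one, and `EQUALS` compares two values. A rough sketch of that evaluation is given below. It redeclares the two enums locally, and `SketchCondition`, `conditionHolds`, and `predicateHolds` are hypothetical names invented here, not the library's trigger implementation.

```typescript
// Local copies of the enums from the commit so the sketch is self-contained.
enum PredicateLogical { AND = 'AND', ANY = 'ANY' }
enum ConditionLogicalOperator { EQUALS = 'EQUALS' }

// Hypothetical shape: an observed state compared against an expected one.
interface SketchCondition {
  readonly operator: ConditionLogicalOperator;
  readonly expected: string;
  readonly actual: string;
}

// EQUALS is the only operator in the enum, so equality is the only check.
function conditionHolds(condition: SketchCondition): boolean {
  return condition.actual === condition.expected;
}

// AND: all conditions must hold. ANY: at least one must hold.
// With an empty list, AND yields true and ANY yields false.
function predicateHolds(
  logical: PredicateLogical,
  conditions: readonly SketchCondition[],
): boolean {
  const results = conditions.map(conditionHolds);
  return logical === PredicateLogical.AND
    ? results.every(Boolean)
    : results.some(Boolean);
}

const conditions: SketchCondition[] = [
  { operator: ConditionLogicalOperator.EQUALS, expected: 'SUCCEEDED', actual: 'SUCCEEDED' },
  { operator: ConditionLogicalOperator.EQUALS, expected: 'SUCCEEDED', actual: 'FAILED' },
];

console.log(predicateHolds(PredicateLogical.AND, conditions)); // one condition fails
console.log(predicateHolds(PredicateLogical.ANY, conditions)); // one condition holds
```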
