Commit c86296e

chore(glue-alpha): fix typos, inconsistencies, docs (#33047)
Fixes some fast-follow items from the recent Glue PR that was merged. There is more work to be done, specifically around the README, but I see this as the minimum set of changes needed to make glue-alpha somewhat consistent with the rest of the modules we offer in CDK, alpha or not.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
1 parent ce2fb92 commit c86296e

20 files changed, +244 -384 lines changed

packages/@aws-cdk/aws-glue-alpha/README.md

+38 -51
@@ -24,19 +24,6 @@ service that makes it easier to discover, prepare, move, and integrate data
 from multiple sources for analytics, machine learning (ML), and application
 development.
 
-Without an L2 construct, developers define Glue data sources, connections,
-jobs, and workflows for their data and ETL solutions via the AWS console,
-the AWS CLI, and Infrastructure as Code tools like CloudFormation and the
-CDK. However, there are several challenges to defining Glue resources at
-scale that an L2 construct can resolve. First, developers must reference
-documentation to determine the valid combinations of job type, Glue version,
-worker type, language versions, and other parameters that are required for specific
-job types. Additionally, developers must already know or look up the
-networking constraints for data source connections, and there is ambiguity
-around how to securely store secrets for JDBC connections. Finally,
-developers want prescriptive guidance via best practice defaults for
-throughput parameters like number of workers and batching.
-
 The Glue L2 construct has convenience methods working backwards from common
 use cases and sets required parameters to defaults that align with recommended
 best practices for each job type. It also provides customers with a balance
@@ -122,25 +109,25 @@ declare const stack: cdk.Stack;
 declare const role: iam.IRole;
 declare const script: glue.Code;
 new glue.PySparkEtlJob(stack, 'PySparkETLJob', {
-jobName: 'PySparkETLJobCustomName',
-description: 'This is a description',
-role,
-script,
-glueVersion: glue.GlueVersion.V3_0,
-continuousLogging: { enabled: false },
-workerType: glue.WorkerType.G_2X,
-maxConcurrentRuns: 100,
-timeout: cdk.Duration.hours(2),
-connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')],
-securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'),
-tags: {
-FirstTagName: 'FirstTagValue',
-SecondTagName: 'SecondTagValue',
-XTagName: 'XTagValue',
-},
-numberOfWorkers: 2,
-maxRetries: 2,
-});
+jobName: 'PySparkETLJobCustomName',
+description: 'This is a description',
+role,
+script,
+glueVersion: glue.GlueVersion.V3_0,
+continuousLogging: { enabled: false },
+workerType: glue.WorkerType.G_2X,
+maxConcurrentRuns: 100,
+timeout: cdk.Duration.hours(2),
+connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')],
+securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'),
+tags: {
+FirstTagName: 'FirstTagValue',
+SecondTagName: 'SecondTagValue',
+XTagName: 'XTagValue',
+},
+numberOfWorkers: 2,
+maxRetries: 2,
+});
 ```
 
 **Streaming Jobs**
@@ -369,11 +356,11 @@ declare const stack: cdk.Stack;
 declare const role: iam.IRole;
 declare const script: glue.Code;
 new glue.PySparkEtlJob(stack, 'PySparkETLJob', {
-role,
-script,
-jobName: 'PySparkETLJob',
-jobRunQueuingEnabled: true
-});
+role,
+script,
+jobName: 'PySparkETLJob',
+jobRunQueuingEnabled: true
+});
 ```
 
 ### Uploading scripts from the CDK app repository to S3
@@ -679,20 +666,20 @@ If you have a table with a large number of partitions that grows over time, cons
 ```ts
 declare const myDatabase: glue.Database;
 new glue.S3Table(this, 'MyTable', {
-database: myDatabase,
-columns: [{
-name: 'col1',
-type: glue.Schema.STRING,
-}],
-partitionKeys: [{
-name: 'year',
-type: glue.Schema.SMALL_INT,
-}, {
-name: 'month',
-type: glue.Schema.SMALL_INT,
-}],
-dataFormat: glue.DataFormat.JSON,
-enablePartitionFiltering: true,
+database: myDatabase,
+columns: [{
+name: 'col1',
+type: glue.Schema.STRING,
+}],
+partitionKeys: [{
+name: 'year',
+type: glue.Schema.SMALL_INT,
+}, {
+name: 'month',
+type: glue.Schema.SMALL_INT,
+}],
+dataFormat: glue.DataFormat.JSON,
+enablePartitionFiltering: true,
 });
 ```

packages/@aws-cdk/aws-glue-alpha/lib/code.ts

-1
@@ -10,7 +10,6 @@ import * as constructs from 'constructs';
 * Represents a Glue Job's Code assets (an asset can be a scripts, a jar, a python file or any other file).
 */
 export abstract class Code {
-
 /**
 * Job code as an S3 object.
 * @param bucket The S3 bucket

packages/@aws-cdk/aws-glue-alpha/lib/connection.ts

-1
@@ -12,7 +12,6 @@ import { CfnConnection } from 'aws-cdk-lib/aws-glue';
 * @see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-connection-connectioninput.html#cfn-glue-connection-connectioninput-connectiontype
 */
 export class ConnectionType {
-
 /**
 * Designates a connection to a database through Java Database Connectivity (JDBC).
 */

packages/@aws-cdk/aws-glue-alpha/lib/constants.ts

-1
@@ -254,7 +254,6 @@ export enum JobType {
 * The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory.
 */
 export enum MaxCapacity {
-
 /**
 * DPU value of 1/16th
 */

packages/@aws-cdk/aws-glue-alpha/lib/database.ts

-1
@@ -56,7 +56,6 @@ export interface DatabaseProps {
 * A Glue database.
 */
 export class Database extends Resource implements IDatabase {
-
 public static fromDatabaseArn(scope: Construct, id: string, databaseArn: string): IDatabase {
 const stack = Stack.of(scope);

packages/@aws-cdk/aws-glue-alpha/lib/jobs/job.ts

+30 -30
@@ -131,7 +131,6 @@ export interface ContinuousLoggingProps {
 * event-driven flow using the job.
 */
 export abstract class JobBase extends cdk.Resource implements IJob {
-
 public abstract readonly jobArn: string;
 public abstract readonly jobName: string;
 public abstract readonly grantPrincipal: iam.IPrincipal;
@@ -264,16 +263,13 @@ export abstract class JobBase extends cdk.Resource implements IJob {
 *
 * @param id construct id.
 * @param jobState the job state.
-* @private
 */
 private metricJobStateRule(id: string, jobState: JobState): events.Rule {
 return this.node.tryFindChild(id) as events.Rule ?? this.onStateChange(id, jobState);
 }
 
 /**
 * Returns the job arn
-* @param scope
-* @param jobName
 */
 protected buildJobArn(scope: constructs.Construct, jobName: string) : string {
 return cdk.Stack.of(scope).formatArn({
@@ -308,13 +304,12 @@ export interface JobImportAttributes {
 * JobProperties will be used to create new Glue Jobs using this L2 Construct.
 */
 export interface JobProperties {
-
 /**
-* Script Code Location (required)
-* Script to run when the Glue job executes. Can be uploaded
-* from the local directory structure using fromAsset
-* or referenced via S3 location using fromBucket
-**/
+* Script Code Location (required)
+* Script to run when the Glue job executes. Can be uploaded
+* from the local directory structure using fromAsset
+* or referenced via S3 location using fromBucket
+*/
 readonly script: Code;
 
 /**
@@ -326,26 +321,29 @@ export interface JobProperties {
 * and be granted sufficient permissions.
 *
 * @see https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html
-**/
+*/
 readonly role: iam.IRole;
 
 /**
 * Name of the Glue job (optional)
 * Developer-specified name of the Glue job
+*
 * @default - a name is automatically generated
-**/
+*/
 readonly jobName?: string;
 
 /**
 * Description (optional)
 * Developer-specified description of the Glue job
+*
 * @default - no value
-**/
+*/
 readonly description?: string;
 
 /**
 * Number of Workers (optional)
 * Number of workers for Glue to use during job execution
+*
 * @default 10
 */
 readonly numberOfWorkers?: number;
@@ -354,8 +352,9 @@ export interface JobProperties {
 * Worker Type (optional)
 * Type of Worker for Glue to use during job execution
 * Enum options: Standard, G_1X, G_2X, G_025X. G_4X, G_8X, Z_2X
-* @default G_1X
-**/
+*
+* @default WorkerType.G_1X
+*/
 readonly workerType?: WorkerType;
 
 /**
@@ -366,7 +365,7 @@ export interface JobProperties {
 * you can specify is controlled by a service limit.
 *
 * @default 1
-**/
+*/
 readonly maxConcurrentRuns?: number;
 
 /**
/**
@@ -377,7 +376,7 @@ export interface JobProperties {
377376
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
378377
* for a list of reserved parameters
379378
* @default - no arguments
380-
**/
379+
*/
381380
readonly defaultArguments?: { [key: string]: string };
382381

383382
/**
@@ -386,53 +385,58 @@ export interface JobProperties {
386385
* Connections are used to connect to other AWS Service or resources within a VPC.
387386
*
388387
* @default [] - no connections are added to the job
389-
**/
388+
*/
390389
readonly connections?: IConnection[];
391390

392391
/**
393392
* Max Retries (optional)
394393
* Maximum number of retry attempts Glue performs if the job fails
394+
*
395395
* @default 0
396-
**/
396+
*/
397397
readonly maxRetries?: number;
398398

399399
/**
400400
* Timeout (optional)
401401
* The maximum time that a job run can consume resources before it is
402402
* terminated and enters TIMEOUT status. Specified in minutes.
403+
*
403404
* @default 2880 (2 days for non-streaming)
404405
*
405-
**/
406+
*/
406407
readonly timeout?: cdk.Duration;
407408

408409
/**
409410
* Security Configuration (optional)
410411
* Defines the encryption options for the Glue job
412+
*
411413
* @default - no security configuration.
412-
**/
414+
*/
413415
readonly securityConfiguration?: ISecurityConfiguration;
414416

415417
/**
416418
* Tags (optional)
417-
* A list of key:value pairs of tags to apply to this Glue job resourcex
419+
* A list of key:value pairs of tags to apply to this Glue job resources
420+
*
418421
* @default {} - no tags
419-
**/
422+
*/
420423
readonly tags?: { [key: string]: string };
421424

422425
/**
423426
* Glue Version
424427
* The version of Glue to use to execute this job
428+
*
425429
* @default 3.0 for ETL
426-
**/
430+
*/
427431
readonly glueVersion?: GlueVersion;
428432

429433
/**
430434
* Enables the collection of metrics for job profiling.
431435
*
432436
* @default - no profiling metrics emitted.
433437
*
434-
* @see `--enable-metrics` at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
435-
**/
438+
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
439+
*/
436440
readonly enableProfilingMetrics? :boolean;
437441

438442
/**
@@ -444,15 +448,13 @@ export interface JobProperties {
 * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
 **/
 readonly continuousLogging?: ContinuousLoggingProps;
-
 }
 
 /**
 * A Glue Job.
 * @resource AWS::Glue::Job
 */
 export abstract class Job extends JobBase {
-
 /**
 * Identifies an existing Glue Job from a subset of attributes that can
 * be referenced from within another Stack or Construct.
@@ -500,7 +502,6 @@ export abstract class Job extends JobBase {
 * @returns String containing the args for the continuous logging command
 */
 public setupContinuousLogging(role: iam.IRole, props: ContinuousLoggingProps | undefined) : any {
-
 // If the developer has explicitly disabled continuous logging return no args
 if (props && !props.enabled) {
 return {};
@@ -536,7 +537,6 @@ export abstract class Job extends JobBase {
 const s3Location = code.bind(this, this.role).s3Location;
 return `s3://${s3Location.bucketName}/${s3Location.objectKey}`;
 }
-
 }
 
 /**
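
The `@default` tags tidied in `JobProperties` above document the fallback values for the optional job settings. As a rough, dependency-free sketch of what those documented defaults amount to, assuming they apply uniformly: the `JobOptions` type, `DOCUMENTED_DEFAULTS`, and `resolveJobOptions` are hypothetical names used only for illustration, not part of `aws-glue-alpha`, and the real constructs may resolve defaults differently per job type. The timeout is modeled as plain minutes rather than `cdk.Duration`, and the Glue version as a string.

```typescript
// Hypothetical model of the optional JobProperties fields whose @default
// tags appear in the diff above. This is NOT the aws-glue-alpha implementation.
type WorkerType = 'Standard' | 'G_1X' | 'G_2X' | 'G_025X' | 'G_4X' | 'G_8X' | 'Z_2X';

interface JobOptions {
  numberOfWorkers?: number;
  workerType?: WorkerType;
  maxConcurrentRuns?: number;
  maxRetries?: number;
  timeoutMinutes?: number; // assumption: minutes as a number, not cdk.Duration
  glueVersion?: string;    // assumption: version as a string, not GlueVersion
}

// Values copied from the @default tags in the JobProperties doc comments.
const DOCUMENTED_DEFAULTS: Required<JobOptions> = {
  numberOfWorkers: 10,
  workerType: 'G_1X',
  maxConcurrentRuns: 1,
  maxRetries: 0,
  timeoutMinutes: 2880, // documented as 2 days for non-streaming jobs
  glueVersion: '3.0',   // documented as the default for ETL jobs
};

// Fill in every option the caller left undefined with its documented default.
function resolveJobOptions(options: JobOptions = {}): Required<JobOptions> {
  return {
    numberOfWorkers: options.numberOfWorkers ?? DOCUMENTED_DEFAULTS.numberOfWorkers,
    workerType: options.workerType ?? DOCUMENTED_DEFAULTS.workerType,
    maxConcurrentRuns: options.maxConcurrentRuns ?? DOCUMENTED_DEFAULTS.maxConcurrentRuns,
    maxRetries: options.maxRetries ?? DOCUMENTED_DEFAULTS.maxRetries,
    timeoutMinutes: options.timeoutMinutes ?? DOCUMENTED_DEFAULTS.timeoutMinutes,
    glueVersion: options.glueVersion ?? DOCUMENTED_DEFAULTS.glueVersion,
  };
}

// Mirrors the README example: only the explicitly set fields are overridden.
const resolved = resolveJobOptions({ workerType: 'G_2X', numberOfWorkers: 2, maxRetries: 2 });
console.log(resolved);
```

The sketch uses `??` rather than `||` so that an explicitly supplied `0` is kept instead of being replaced by a default.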
