Commit 1ca3e00
feat(redshift): column compression encodings and comments can now be customised (#24177)
In accordance with #24165, I'm opening the same pull request as before. I'm not sure whether my previous PR #23597 will automatically be "re-merged" in, but if not, this pull request can be reviewed instead. Will AGAIN close #22506.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
1 parent 1b2014e commit 1ca3e00

File tree

35 files changed (+2085, -1306 lines)


packages/@aws-cdk/aws-redshift/README.md

+31-6
@@ -200,17 +200,32 @@ new Table(this, 'Table', {
 });
 ```
 
-Tables can also be configured with a comment:
+Tables and their respective columns can be configured to contain comments:
 
 ```ts fixture=cluster
 new Table(this, 'Table', {
   tableColumns: [
-    { name: 'col1', dataType: 'varchar(4)' },
-    { name: 'col2', dataType: 'float' }
+    { name: 'col1', dataType: 'varchar(4)', comment: 'This is a column comment' },
+    { name: 'col2', dataType: 'float', comment: 'This is another column comment' }
+  ],
+  cluster: cluster,
+  databaseName: 'databaseName',
+  tableComment: 'This is a table comment',
+});
+```
+
+Table columns can be configured to use a specific compression encoding:
+
+```ts fixture=cluster
+import { ColumnEncoding } from '@aws-cdk/aws-redshift';
+
+new Table(this, 'Table', {
+  tableColumns: [
+    { name: 'col1', dataType: 'varchar(4)', encoding: ColumnEncoding.TEXT32K },
+    { name: 'col2', dataType: 'float', encoding: ColumnEncoding.DELTA32K },
   ],
   cluster: cluster,
   databaseName: 'databaseName',
-  comment: 'This is a comment',
 });
 ```

@@ -369,6 +384,8 @@ cluster.addToParameterGroup('enable_user_activity_logging', 'true');
 In most cases, existing clusters [must be manually rebooted](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-parameter-groups.html) to apply parameter changes. You can automate parameter related reboots by setting the cluster's `rebootForParameterChanges` property to `true`, or by using `Cluster.enableRebootForParameterChanges()`.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as cdk from '@aws-cdk/core';
 declare const vpc: ec2.Vpc;
 
 const cluster = new Cluster(this, 'Cluster', {
@@ -451,14 +468,16 @@ Some Amazon Redshift features require Amazon Redshift to access other AWS servic
 When you create an IAM role and set it as the default for the cluster using console, you don't have to provide the IAM role's Amazon Resource Name (ARN) to perform authentication and authorization.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc;
 
 const defaultRole = new iam.Role(this, 'DefaultRole', {
   assumedBy: new iam.ServicePrincipal('redshift.amazonaws.com'),
 },
 );
 
-new Cluster(stack, 'Redshift', {
+new Cluster(this, 'Redshift', {
   masterUser: {
     masterUsername: 'admin',
   },
@@ -471,14 +490,16 @@ new Cluster(stack, 'Redshift', {
 A default role can also be added to a cluster using the `addDefaultIamRole` method.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc;
 
 const defaultRole = new iam.Role(this, 'DefaultRole', {
   assumedBy: new iam.ServicePrincipal('redshift.amazonaws.com'),
 },
 );
 
-const redshiftCluster = new Cluster(stack, 'Redshift', {
+const redshiftCluster = new Cluster(this, 'Redshift', {
   masterUser: {
     masterUsername: 'admin',
   },
@@ -494,6 +515,8 @@ redshiftCluster.addDefaultIamRole(defaultRole);
 Attaching IAM roles to a Redshift Cluster grants permissions to the Redshift service to perform actions on your behalf.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc
 
 const role = new iam.Role(this, 'Role', {
511534
Additional IAM roles can be attached to a cluster using the `addIamRole` method.
512535

513536
```ts
537+
import * as ec2 from '@aws-cdk/aws-ec2';
538+
import * as iam from '@aws-cdk/aws-iam';
514539
declare const vpc: ec2.Vpc
515540

516541
const role = new iam.Role(this, 'Role', {

packages/@aws-cdk/aws-redshift/lib/private/database-query-provider/table.ts

+27-1
@@ -42,7 +42,7 @@ async function createTable(
   tableAndClusterProps: TableAndClusterProps,
 ): Promise<string> {
   const tableName = tableNamePrefix + tableNameSuffix;
-  const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}`).join();
+  const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}${getEncodingColumnString(column)}`).join();
 
   let statement = `CREATE TABLE ${tableName} (${tableColumnsString})`;
 
@@ -63,6 +63,11 @@ async function createTable(
 
   await executeStatement(statement, tableAndClusterProps);
 
+  for (const column of tableColumns) {
+    if (column.comment) {
+      await executeStatement(`COMMENT ON COLUMN ${tableName}.${column.name} IS '${column.comment}'`, tableAndClusterProps);
+    }
+  }
   if (tableAndClusterProps.tableComment) {
     await executeStatement(`COMMENT ON TABLE ${tableName} IS '${tableAndClusterProps.tableComment}'`, tableAndClusterProps);
   }
@@ -120,6 +125,20 @@ async function updateTable(
     alterationStatements.push(...columnAdditions.map(addition => `ALTER TABLE ${tableName} ${addition}`));
   }
 
+  const columnEncoding = tableColumns.filter(column => {
+    return oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.encoding !== oldColumn.encoding);
+  }).map(column => `ALTER COLUMN ${column.name} ENCODE ${column.encoding || 'AUTO'}`);
+  if (columnEncoding.length > 0) {
+    alterationStatements.push(`ALTER TABLE ${tableName} ${columnEncoding.join(', ')}`);
+  }
+
+  const columnComments = tableColumns.filter(column => {
+    return oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.comment !== oldColumn.comment);
+  }).map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${column.comment ? `'${column.comment}'` : 'NULL'}`);
+  if (columnComments.length > 0) {
+    alterationStatements.push(...columnComments);
+  }
+
   if (useColumnIds) {
     const columnNameUpdates = tableColumns.reduce((updates, column) => {
       const oldColumn = oldTableColumns.find(oldCol => oldCol.id && oldCol.id === column.id);
@@ -190,3 +209,10 @@ async function updateTable(
 function getSortKeyColumnsString(sortKeyColumns: Column[]) {
   return sortKeyColumns.map(column => column.name).join();
 }
+
+function getEncodingColumnString(column: Column): string {
+  if (column.encoding) {
+    return ` ENCODE ${column.encoding}`;
+  }
+  return '';
+}
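Taken together, the provider changes render a table definition into one `CREATE TABLE` string followed by per-column `COMMENT ON` statements. Below is a minimal standalone sketch of that string-building, not the actual module: the `Column` shape is reduced to the fields used in this diff, and `buildCreateTable`/`buildColumnComments` are hypothetical helper names.

```typescript
// Hypothetical standalone sketch of the string-building in createTable above.
// Column is reduced to the fields these snippets use.
interface Column {
  name: string;
  dataType: string;
  encoding?: string;
  comment?: string;
}

// Mirrors getEncodingColumnString: append ` ENCODE <encoding>` only when one is set.
function getEncodingColumnString(column: Column): string {
  return column.encoding ? ` ENCODE ${column.encoding}` : '';
}

// Mirrors the CREATE TABLE statement assembled in createTable.
function buildCreateTable(tableName: string, tableColumns: Column[]): string {
  const tableColumnsString = tableColumns
    .map(column => `${column.name} ${column.dataType}${getEncodingColumnString(column)}`)
    .join();
  return `CREATE TABLE ${tableName} (${tableColumnsString})`;
}

// Mirrors the per-column COMMENT ON COLUMN statements issued after creation.
function buildColumnComments(tableName: string, tableColumns: Column[]): string[] {
  return tableColumns
    .filter(column => column.comment)
    .map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS '${column.comment}'`);
}

const cols: Column[] = [
  { name: 'col1', dataType: 'varchar(4)', encoding: 'TEXT32K', comment: 'This is a column comment' },
  { name: 'col2', dataType: 'float' },
];
console.log(buildCreateTable('MyTable', cols));
// → CREATE TABLE MyTable (col1 varchar(4) ENCODE TEXT32K,col2 float)
console.log(buildColumnComments('MyTable', cols));
// → [ "COMMENT ON COLUMN MyTable.col1 IS 'This is a column comment'" ]
```

Note that `Array.prototype.join()` with no argument separates with a bare comma, which is why the generated column list has no space after the comma.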

packages/@aws-cdk/aws-redshift/lib/table.ts

+123
@@ -90,6 +90,20 @@ export interface Column {
    * @default - column is not a SORTKEY
    */
   readonly sortKey?: boolean;
+
+  /**
+   * The encoding to use for the column.
+   *
+   * @default - Amazon Redshift determines the encoding based on the data type.
+   */
+  readonly encoding?: ColumnEncoding;
+
+  /**
+   * A comment to attach to the column.
+   *
+   * @default - no comment
+   */
+  readonly comment?: string;
 }
 
 /**
@@ -371,3 +385,112 @@ export enum TableSortStyle {
    */
   INTERLEAVED = 'INTERLEAVED',
 }
+
+/**
+ * The compression encoding of a column.
+ *
+ * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html
+ */
+export enum ColumnEncoding {
+  /**
+   * Amazon Redshift assigns an optimal encoding based on the column data.
+   * This is the default.
+   */
+  AUTO = 'AUTO',
+
+  /**
+   * The column is not compressed.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Raw_encoding.html
+   */
+  RAW = 'RAW',
+
+  /**
+   * The column is compressed using the AZ64 algorithm.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/az64-encoding.html
+   */
+  AZ64 = 'AZ64',
+
+  /**
+   * The column is compressed using a separate dictionary for each block column value on disk.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Byte_dictionary_encoding.html
+   */
+  BYTEDICT = 'BYTEDICT',
+
+  /**
+   * The column is compressed based on the difference between values in the column.
+   * This records differences as 1-byte values.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
+   */
+  DELTA = 'DELTA',
+
+  /**
+   * The column is compressed based on the difference between values in the column.
+   * This records differences as 2-byte values.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
+   */
+  DELTA32K = 'DELTA32K',
+
+  /**
+   * The column is compressed using the LZO algorithm.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/lzo-encoding.html
+   */
+  LZO = 'LZO',
+
+  /**
+   * The column is compressed to a smaller storage size than the original data type.
+   * The compressed storage size is 1 byte.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
+   */
+  MOSTLY8 = 'MOSTLY8',
+
+  /**
+   * The column is compressed to a smaller storage size than the original data type.
+   * The compressed storage size is 2 bytes.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
+   */
+  MOSTLY16 = 'MOSTLY16',
+
+  /**
+   * The column is compressed to a smaller storage size than the original data type.
+   * The compressed storage size is 4 bytes.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
+   */
+  MOSTLY32 = 'MOSTLY32',
+
+  /**
+   * The column is compressed by recording the number of occurrences of each value in the column.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Runlength_encoding.html
+   */
+  RUNLENGTH = 'RUNLENGTH',
+
+  /**
+   * The column is compressed by recording the first 245 unique words and then using a 1-byte index to represent each word.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
+   */
+  TEXT255 = 'TEXT255',
+
+  /**
+   * The column is compressed by recording the first 32K unique words and then using a 2-byte index to represent each word.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
+   */
+  TEXT32K = 'TEXT32K',
+
+  /**
+   * The column is compressed using the ZSTD algorithm.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/zstd-encoding.html
+   */
+  ZSTD = 'ZSTD',
+}
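The update path pairs with this enum: when a deployed table's columns change, the custom-resource provider diffs the old and new column sets and emits `ALTER TABLE ... ENCODE` and `COMMENT ON COLUMN` statements, as shown in the database-query-provider diff earlier in this commit. Below is a minimal standalone sketch of that diffing logic, assuming a simplified `Column` shape and a hypothetical helper name `buildAlterationStatements`.

```typescript
// Hypothetical standalone sketch of the updateTable diffing in this commit:
// detect encoding/comment changes between old and new columns.
interface Column {
  name: string;
  encoding?: string;
  comment?: string;
}

function buildAlterationStatements(tableName: string, oldTableColumns: Column[], tableColumns: Column[]): string[] {
  const alterationStatements: string[] = [];

  // Columns whose encoding changed: a single ALTER TABLE carrying one
  // ALTER COLUMN ... ENCODE clause per column; an unset encoding falls back to AUTO.
  const columnEncoding = tableColumns.filter(column =>
    oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.encoding !== oldColumn.encoding),
  ).map(column => `ALTER COLUMN ${column.name} ENCODE ${column.encoding || 'AUTO'}`);
  if (columnEncoding.length > 0) {
    alterationStatements.push(`ALTER TABLE ${tableName} ${columnEncoding.join(', ')}`);
  }

  // Columns whose comment changed: one COMMENT ON COLUMN per column;
  // a removed comment becomes IS NULL.
  const columnComments = tableColumns.filter(column =>
    oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.comment !== oldColumn.comment),
  ).map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${column.comment ? `'${column.comment}'` : 'NULL'}`);
  alterationStatements.push(...columnComments);

  return alterationStatements;
}

const statements = buildAlterationStatements(
  'MyTable',
  [{ name: 'col1', encoding: 'RAW', comment: 'old comment' }],
  [{ name: 'col1', encoding: 'ZSTD' }],
);
console.log(statements);
// → [ 'ALTER TABLE MyTable ALTER COLUMN col1 ENCODE ZSTD',
//     'COMMENT ON COLUMN MyTable.col1 IS NULL' ]
```

Because only columns present in both the old and new sets are considered (`some` over `oldTableColumns`), newly added columns get their encoding and comment via the separate column-addition path rather than via these alterations.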
