
Commit 7f0ee9b

Update S3 implementation (#86)
# Description

- Access style is honoured/implemented
- Secrets are no longer read directly but are mounted
- TLS for S3 access is not yet implemented (though a non-`None` entry will affect the endpoint returned from `operator-rs`)

Fixes #85.
1 parent 1d0792b commit 7f0ee9b

File tree

16 files changed (+376 −137 lines)


CHANGELOG.md

Lines changed: 9 additions & 1 deletion
@@ -4,12 +4,20 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+### Changed
+
+- BREAKING: Use current S3 connection/bucket structs ([#86])
+
+[#86]: https://github.com/stackabletech/spark-k8s-operator/pull/86
+
 ## [0.2.0] - 2022-06-21
 
 ### Added
 
 - Added new fields to govern image pull policy ([#75])
-- New `nodeSelector` fields for both the driver and the excutors ([#76])
+- New `nodeSelector` fields for both the driver and the executors ([#76])
 - Mirror driver pod status to the corresponding spark application ([#77])
 
 [#75]: https://github.com/stackabletech/spark-k8s-operator/pull/75

Cargo.lock

Lines changed: 44 additions & 9 deletions
Some generated files are not rendered by default.

deploy/crd/sparkapplication.crd.yaml

Lines changed: 36 additions & 3 deletions
@@ -299,18 +299,51 @@ spec:
 inline:
   description: S3 connection definition as CRD.
   properties:
+    accessStyle:
+      description: "Which access style to use. Defaults to virtual hosted-style as most of the data products out there. Have a look at the official documentation on <https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html>"
+      enum:
+        - Path
+        - VirtualHosted
+      nullable: true
+      type: string
+    credentials:
+      description: "If the S3 uses authentication you have to specify you S3 credentials. In the most cases a SecretClass providing `accessKey` and `secretKey` is sufficient."
+      nullable: true
+      properties:
+        scope:
+          description: "[Scope](https://docs.stackable.tech/secret-operator/scope.html) of the [SecretClass](https://docs.stackable.tech/secret-operator/secretclass.html)"
+          nullable: true
+          properties:
+            node:
+              default: false
+              type: boolean
+            pod:
+              default: false
+              type: boolean
+            services:
+              default: []
+              items:
+                type: string
+              type: array
+          type: object
+        secretClass:
+          description: "[SecretClass](https://docs.stackable.tech/secret-operator/secretclass.html) containing the LDAP bind credentials"
+          type: string
+      required:
+        - secretClass
+      type: object
     host:
+      description: Hostname of the S3 server without any protocol or port
       nullable: true
       type: string
     port:
+      description: Port the S3 server listens on. If not specified the products will determine the port to use.
      format: uint16
       minimum: 0.0
       nullable: true
       type: integer
-    secretClass:
-      nullable: true
-      type: string
     tls:
+      description: If you want to use TLS when talking to S3 you can enable TLS encrypted communication with this setting.
       nullable: true
       properties:
         verification:
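Taken together, the new schema yields an inline connection definition roughly like the sketch below (illustrative only; host, port and the SecretClass name are placeholders, and the enclosing `s3bucket`/`connection` nesting is shown in the usage.adoc examples further down). TLS is omitted here because, per the commit description, TLS for S3 access is not yet implemented.

inline:
  host: test-minio                     # hostname without protocol or port
  port: 9000
  accessStyle: Path                    # or VirtualHosted (the default)
  credentials:
    secretClass: s3-credentials-class  # SecretClass providing the S3 credentials
    scope:                             # optional SecretClass scope; defaults shown
      node: false
      pod: false
      services: []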

deploy/helm/spark-k8s-operator/crds/crds.yaml

Lines changed: 36 additions & 3 deletions
This hunk (@@ -301,18 +301,51 @@ spec:) is identical in content to the one in deploy/crd/sparkapplication.crd.yaml above, applied to the Helm chart copy of the CRD: it adds the `accessStyle` and `credentials` properties, documents `host`, `port` and `tls`, and removes the top-level `secretClass` property.

deploy/manifests/crds.yaml

Lines changed: 36 additions & 3 deletions
This hunk (@@ -302,18 +302,51 @@ spec:) applies the same change to the bundled manifests copy of the CRD.

docs/modules/ROOT/examples/example-sparkapp-s3-private.yaml

Lines changed: 3 additions & 2 deletions
@@ -16,10 +16,11 @@ spec:
     inline:
       host: test-minio
       port: 9000
-      secretClass: minio-credentials # <4>
+      accessStyle: Path
+      credentials: # <4>
+        secretClass: s3-credentials-class
   sparkConf: # <5>
     spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" # <6>
-    spark.hadoop.fs.s3a.path.style.access: "true"
     spark.driver.extraClassPath: "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
     spark.executor.extraClassPath: "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
   volumes:
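The `s3-credentials-class` referenced above is a SecretClass managed by the Stackable secret-operator and is not part of this commit. A rough sketch of such a class with a backing Secret is shown below; the backend, resource names and key names are assumptions based on the secret-operator documentation linked from the CRD descriptions, not on files in this repository.

apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}                 # look up matching Secrets in the application's namespace
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class
stringData:
  accessKey: minio-access-key   # key names taken from the CRD description (assumed)
  secretKey: minio-secret-key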

docs/modules/ROOT/pages/usage.adoc

Lines changed: 14 additions & 10 deletions
@@ -2,7 +2,7 @@
 
 == Create an Apache Spark job
 
-If you followed the installation instructions, you should now have a Stackable Operator for Apache Spark up and running and you are ready to create your first Apache Spark kubernetes cluster.
+If you followed the installation instructions, you should now have a Stackable Operator for Apache Spark up and running, and you are ready to create your first Apache Spark kubernetes cluster.
 
 The example below creates a job running on Apache Spark 3.2.1, using the spark-on-kubernetes paradigm described in the spark documentation. The application file is itself part of the spark distribution and `local` refers to the path on the driver/executors; there are no external dependencies.
 
@@ -64,11 +64,11 @@ include::example$example-sparkapp-external-dependencies.yaml[]
 include::example$example-sparkapp-image.yaml[]
 ----
 
-<1> Job image: this contains the job artifact that will retrieved from the volume mount backed by the PVC
+<1> Job image: this contains the job artifact that will be retrieved from the volume mount backed by the PVC
 <2> Job python artifact (local)
 <3> Job argument (external)
 <4> List of python job requirements: these will be installed in the pods via `pip`
-<5> Spark dependencies: the credentials provider (the user knows what is relevant here) plus dependencies needed to access external resources (in this case, in s3)
+<5> Spark dependencies: the credentials provider (the user knows what is relevant here) plus dependencies needed to access external resources (in this case, in an S3 store)
 <6> the name of the volume mount backed by a `PersistentVolumeClaim` that must be pre-existing
 <7> the path on the volume mount: this is referenced in the `sparkConf` section where the extra class path is defined for the driver and executors
 
@@ -81,7 +81,7 @@ include::example$example-sparkapp-pvc.yaml[]
 
 <1> Job artifact located on S3.
 <2> Job main class
-<3> Spark dependencies: the credentials provider (the user knows what is relevant here) plus dependencies needed to access external resources (in this case, in s3, accessed without credentials)
+<3> Spark dependencies: the credentials provider (the user knows what is relevant here) plus dependencies needed to access external resources (in this case, in an S3 store, accessed without credentials)
 <4> the name of the volume mount backed by a `PersistentVolumeClaim` that must be pre-existing
 <5> the path on the volume mount: this is referenced in the `sparkConf` section where the extra class path is defined for the driver and executors
 
@@ -92,12 +92,12 @@ include::example$example-sparkapp-pvc.yaml[]
 include::example$example-sparkapp-s3-private.yaml[]
 ----
 
-<1> Job python artifact (located in S3)
+<1> Job python artifact (located in an S3 store)
 <2> Artifact class
 <3> S3 section, specifying the existing secret and S3 end-point (in this case, MinIO)
-<4> Credentials secret
+<4> Credentials referencing a secretClass (not shown in is example)
 <5> Spark dependencies: the credentials provider (the user knows what is relevant here) plus dependencies needed to access external resources...
-<6> ...in this case, in s3, accessed with the credentials defined in the secret
+<6> ...in this case, in an S3 store, accessed with the credentials defined in the secret
 <7> the name of the volume mount backed by a `PersistentVolumeClaim` that must be pre-existing
 <8> the path on the volume mount: this is referenced in the `sparkConf` section where the extra class path is defined for the driver and executors
 
@@ -121,7 +121,7 @@ include::example$example-sparkapp-configmap.yaml[]
 
 == S3 bucket specification
 
-You can specify S3 connection details directly inside the `SparkApplication` specification or by refering to an external `S3Bucket` custom resource.
+You can specify S3 connection details directly inside the `SparkApplication` specification or by referring to an external `S3Bucket` custom resource.
 
 To specify S3 connection details directly as part of the `SparkApplication` resource you add an inline bucket configuration as shown below.
 
@@ -134,7 +134,9 @@ s3bucket: # <1>
   inline:
     host: test-minio # <3>
     port: 9000 # <4>
-    secretClass: minio-credentials # <5>
+    accessStyle: Path
+    credentials:
+      secretClass: s3-credentials-class # <5>
 ----
 <1> Entry point for the bucket configuration.
 <2> Bucket name.
@@ -166,7 +168,9 @@ spec:
   inline:
     host: test-minio
     port: 9000
-    secretClass: minio-credentials
+    accessStyle: Path
+    credentials:
+      secretClass: minio-credentials-class
 ----
 
 This has the advantage that bucket configuration can be shared across `SparkApplication`s and reduces the cost of updating these details.
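For completeness, a `SparkApplication` consuming such a shared `S3Bucket` resource would reference it by name instead of repeating the inline block. The exact reference syntax is not part of this diff; the sketch below assumes the `reference` variant provided by `operator-rs`, with a placeholder resource name.

spec:
  s3bucket:
    reference: my-bucket-resource   # name of the S3Bucket resource defined above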

rust/crd/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ repository = "https://github.com/stackabletech/spark-k8s-operator"
 version = "0.3.0-nightly"
 
 [dependencies]
-stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag="0.19.0" }
+stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag="0.21.0" }
 
 semver = "1.0"
 serde = { version = "1.0", features = ["derive"] }

rust/crd/src/constants.rs

Lines changed: 1 addition & 2 deletions
@@ -17,7 +17,6 @@ pub const CONTAINER_NAME_DRIVER: &str = "spark-driver";
 pub const CONTAINER_IMAGE_NAME_EXECUTOR: &str = "dummy-overwritten-by-command-line";
 pub const CONTAINER_NAME_EXECUTOR: &str = "spark-executor";
 
-pub const ENV_AWS_ACCESS_KEY_ID: &str = "AWS_ACCESS_KEY_ID";
-pub const ENV_AWS_SECRET_ACCESS_KEY: &str = "AWS_SECRET_ACCESS_KEY";
 pub const ACCESS_KEY_ID: &str = "accessKeyId";
 pub const SECRET_ACCESS_KEY: &str = "secretAccessKey";
+pub const S3_SECRET_DIR_NAME: &str = "/stackable/secrets";
