Commit 068aac7

Andrew Kenworthy (adwk67) and Razvan-Daniel Mihai (razvan) committed
Documentation (#41)
## Description

Documentation of work to date:

- kind cluster set-up for local testing (service account, roles, PV, where to find Docker images and how to load them into a local cluster)
- explanation of custom resource fields (and how they map to spark-submit arguments)
- a brief description of the examples
- updated changelog
- what has to be provided in the Docker images if the user supplies their own
- next steps, TODOs etc.

Fixes #40.

Co-authored-by: Razvan-Daniel Mihai <[email protected]>
Co-authored-by: Andrew Kenworthy <[email protected]>
1 parent 3a6bec9 · commit 068aac7

18 files changed (+648, -50 lines)

docs/antora.yml

Lines changed: 3 additions & 2 deletions
@@ -1,5 +1,6 @@
-name: spark
+---
+name: spark-k8s
 version: master
-title: Stackable Operator for Apache Spark
+title: Stackable Operator for Apache Spark on Kubernetes
 nav:
 - modules/ROOT/nav.adoc
Lines changed: 19 additions & 0 deletions (new file)

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  version: "1.0"
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.2.1-hadoop3.2-stackable0.4.0 # <1>
  mode: cluster
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: /stackable/spark/examples/jars/spark-examples_2.12-3.2.1.jar # <2>
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
  executor:
    cores: 1
    instances: 3
    memory: "512m"
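This example can be exercised end-to-end with plain kubectl. A minimal sketch, assuming the manifest above is saved as spark-pi.yaml (the file name is an assumption) and the operator is already installed:

----
# Submit the SparkApplication custom resource
kubectl apply -f spark-pi.yaml

# Inspect the resource and the driver/executor pods it spawns
kubectl get sparkapplications
kubectl get pods
----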
Lines changed: 43 additions & 0 deletions (new file)

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ksv # <1>
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 2Gi
  hostPath:
    path: /some-host-location
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-ksv # <2>
spec:
  volumeName: pv-ksv # <1>
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: aws-deps
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: job-deps # <3>
          persistentVolumeClaim:
            claimName: pvc-ksv # <2>
      containers:
        - name: aws-deps
          volumeMounts:
            - name: job-deps # <4>
              mountPath: /stackable/spark/dependencies
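A hedged usage sketch for this manifest: apply it, then wait for the aws-deps Job to finish before referencing the populated volume from a SparkApplication. The manifest file name below is an assumption, and note that this excerpt does not show the container image or command that actually downloads the dependencies:

----
kubectl apply -f pv-pvc-job.yaml    # assumed file name
kubectl wait --for=condition=complete job/aws-deps --timeout=300s
----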
Lines changed: 38 additions & 0 deletions (new file)

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-sparkapp-external-dependencies
  namespace: default
spec:
  version: "1.0"
  sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.2.1-hadoop3.2-python39-stackable0.1.0
  mode: cluster
  mainApplicationFile: s3a://stackable-spark-k8s-jars/jobs/ny_tlc_report.py # <1>
  args:
    - "--input 's3a://nyc-tlc/trip data/yellow_tripdata_2021-07.csv'" # <2>
  deps:
    requirements:
      - tabulate==0.8.9 # <3>
  sparkConf: # <4>
    "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
    "spark.driver.extraClassPath": "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
    "spark.executor.extraClassPath": "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
  volumes:
    - name: job-deps # <5>
      persistentVolumeClaim:
        claimName: pvc-ksv
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies # <6>
  executor:
    cores: 1
    instances: 3
    memory: "512m"
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies # <6>
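Per the commit description, these custom resource fields map to spark-submit arguments. As a rough hand-written illustration only (not output generated by the operator; the exact flags it emits may differ), the manifest above corresponds approximately to:

----
spark-submit \
  --deploy-mode cluster \
  --conf spark.executor.instances=3 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
  --conf spark.driver.extraClassPath=/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar \
  --conf spark.executor.extraClassPath=/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar \
  s3a://stackable-spark-k8s-jars/jobs/ny_tlc_report.py \
  --input 's3a://nyc-tlc/trip data/yellow_tripdata_2021-07.csv'
----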
Lines changed: 39 additions & 0 deletions (new file)

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-sparkapp-image
  namespace: default
spec:
  version: "1.0"
  image: docker.stackable.tech/stackable/ny-tlc-report:0.1.0 # <1>
  sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.2.1-hadoop3.2-python39-stackable0.1.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/ny_tlc_report.py # <2>
  args:
    - "--input 's3a://nyc-tlc/trip data/yellow_tripdata_2021-07.csv'" # <3>
  deps:
    requirements:
      - tabulate==0.8.9 # <4>
  sparkConf: # <5>
    "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
    "spark.driver.extraClassPath": "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
    "spark.executor.extraClassPath": "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
  volumes:
    - name: job-deps # <6>
      persistentVolumeClaim:
        claimName: pvc-ksv
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies # <7>
  executor:
    cores: 1
    instances: 3
    memory: "512m"
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies # <7>
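The commit description also covers what user-provided images must contain. A hypothetical Dockerfile for an image such as ny-tlc-report:0.1.0 might look like the following; the base image and the copied file are assumptions inferred from the sparkImage and mainApplicationFile fields above, not contents of this commit:

----
# Assumed base: the matching Stackable PySpark image
FROM docker.stackable.tech/stackable/pyspark-k8s:3.2.1-hadoop3.2-python39-stackable0.1.0

# Place the job script where mainApplicationFile (local:///stackable/spark/jobs/...) expects it
COPY ny_tlc_report.py /stackable/spark/jobs/ny_tlc_report.py
----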
Lines changed: 36 additions & 0 deletions (new file)

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-sparkapp-pvc
  namespace: default
spec:
  version: "1.0"
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.2.1-hadoop3.2-stackable0.4.0
  mode: cluster
  mainApplicationFile: s3a://stackable-spark-k8s-jars/jobs/ny-tlc-report-1.0-SNAPSHOT.jar # <1>
  mainClass: org.example.App # <2>
  args:
    - "'s3a://nyc-tlc/trip data/yellow_tripdata_2021-07.csv'"
  sparkConf: # <3>
    "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
    "spark.driver.extraClassPath": "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
    "spark.executor.extraClassPath": "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
  volumes:
    - name: job-deps # <4>
      persistentVolumeClaim:
        claimName: pvc-ksv
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies # <5>
  executor:
    cores: 1
    instances: 3
    memory: "512m"
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies # <5>
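Once an application like this has run, its output is typically found in the driver pod's log. A sketch, assuming Spark's default Kubernetes naming in which the driver pod name ends in -driver:

----
kubectl get pods | grep -- -driver
kubectl logs <driver-pod-name>    # placeholder pod name
----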
Lines changed: 37 additions & 0 deletions (new file)

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-sparkapp-s3-private
spec:
  version: "1.0"
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.2.1-hadoop3.2-stackable0.4.0
  mode: cluster
  mainApplicationFile: s3a://my-bucket/spark-examples_2.12-3.2.1.jar # <1>
  mainClass: org.apache.spark.examples.SparkPi # <2>
  s3: # <3>
    credentialsSecret: minio-credentials # <4>
    endpoint: http://test-minio:9000/
  sparkConf: # <5>
    spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" # <6>
    spark.hadoop.fs.s3a.path.style.access: "true"
    spark.driver.extraClassPath: "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
    spark.executor.extraClassPath: "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
  volumes:
    - name: spark-pi-deps # <7>
      persistentVolumeClaim:
        claimName: spark-pi-pvc
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    volumeMounts:
      - name: spark-pi-deps
        mountPath: /dependencies # <8>
  executor:
    cores: 1
    instances: 3
    memory: "512m"
    volumeMounts:
      - name: spark-pi-deps
        mountPath: /dependencies # <8>
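The credentialsSecret field above references a Secret named minio-credentials that must exist beforehand. A minimal sketch of such a Secret; the key names accessKeyId and secretAccessKey are an assumption, not confirmed by this commit:

----
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
stringData:
  accessKeyId: minioUser          # assumed key name, placeholder value
  secretAccessKey: minioPassword  # assumed key name, placeholder value
----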
Binary file added (206 KB)

docs/modules/ROOT/nav.adoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,4 @@
11
* xref:installation.adoc[]
22
* xref:usage.adoc[]
3+
* xref:job_dependencies.adoc[]
4+
* xref:rbac.adoc[]
Lines changed: 19 additions & 2 deletions

@@ -1,11 +1,28 @@
 
 === product-config
 
-*Default value*: `/etc/stackable/spark-operator/config-spec/properties.yaml`
+*Default value*: `/etc/stackable/spark-k8s-operator/config-spec/properties.yaml`
 
 *Required*: false
 
 *Multiple values:* false
 
-This file contains property definitions for the Apache Spark configuration.
+[source]
+----
+stackable-spark-k8s-operator run --product-config /foo/bar/properties.yaml
+----
 
+=== watch-namespace
+
+*Default value*: All namespaces
+
+*Required*: false
+
+*Multiple values:* false
+
+The operator will **only** watch for resources in the provided namespace `test`:
+
+[source]
+----
+stackable-spark-k8s-operator run --watch-namespace test
+----
Lines changed: 13 additions & 0 deletions (new file)

= Configuration

== Command Line Parameters

This operator accepts the following command line parameters:

include::commandline_args.adoc[]

== Environment variables

This operator accepts the following environment variables:

include::env_var_args.adoc[]
Lines changed: 56 additions & 0 deletions (new file)

=== PRODUCT_CONFIG

*Default value*: `/etc/stackable/spark-k8s-operator/config-spec/properties.yaml`

*Required*: false

*Multiple values:* false

[source]
----
export PRODUCT_CONFIG=/foo/bar/properties.yaml
stackable-spark-k8s-operator run
----

or via docker:

----
docker run \
  --name spark-k8s-operator \
  --network host \
  --env KUBECONFIG=/home/stackable/.kube/config \
  --env PRODUCT_CONFIG=/my/product/config.yaml \
  --mount type=bind,source="$HOME/.kube/config",target="/home/stackable/.kube/config" \
  docker.stackable.tech/stackable/spark-k8s-operator:latest
----

=== WATCH_NAMESPACE

*Default value*: All namespaces

*Required*: false

*Multiple values:* false

The operator will **only** watch for resources in the provided namespace `test`:

[source]
----
export WATCH_NAMESPACE=test
stackable-spark-k8s-operator run
----

or via docker:

[source]
----
docker run \
  --name spark-k8s-operator \
  --network host \
  --env KUBECONFIG=/home/stackable/.kube/config \
  --env WATCH_NAMESPACE=test \
  --mount type=bind,source="$HOME/.kube/config",target="/home/stackable/.kube/config" \
  docker.stackable.tech/stackable/spark-k8s-operator:latest
----

docs/modules/ROOT/pages/index.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ WARNING: This operator only works with images from the https://repo.stackable.te
 
 == Supported Versions
 
-The Stackable Operator for Apache Spark currently supports the following versions of Spark:
+The Stackable Operator for Apache Spark on Kubernetes currently supports the following versions of Spark:
 
 include::partial$supported-versions.adoc[]