
Commit 38698ad

Resource limits (#147)
# Description

Define resource limits for the job, driver and executor pods.

Fixes #128
1 parent e184d22 commit 38698ad
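In practice the breaking change looks like this in a `SparkApplication` manifest. A hedged before/after sketch assembled from the example files touched by this commit (values are illustrative, not operator defaults):

# Before: Spark-style settings on each role
driver:
  cores: 1
  coreLimit: "1200m"
  memory: "512m"

# After: the shared resources struct, also available on the new `job` role
driver:
  resources:
    cpu:
      min: "1"
      max: "1500m"
    memory:
      limit: "1Gi"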

30 files changed: +1077 -1077 lines

CHANGELOG.md

Lines changed: 2 additions & 0 deletions

@@ -7,8 +7,10 @@ All notable changes to this project will be documented in this file.
 ### Changed
 
 - Bumped image to `3.3.0-stackable0.2.0` in tests and docs ([#145])
+- BREAKING: use resource limit struct instead of passing spark configuration arguments ([#147])
 
 [#145]: https://github.com/stackabletech/spark-k8s-operator/pull/145
+[#147]: https://github.com/stackabletech/spark-k8s-operator/pull/147
 
 ## [0.5.0] - 2022-09-06

Cargo.lock

Lines changed: 6 additions & 6 deletions
Some generated files are not rendered by default.

deploy/crd/sparkapplication.crd.yaml

Lines changed: 223 additions & 19 deletions
Large diffs are not rendered by default.

deploy/helm/spark-k8s-operator/crds/crds.yaml

Lines changed: 223 additions & 19 deletions
Large diffs are not rendered by default.

deploy/manifests/crds.yaml

Lines changed: 223 additions & 19 deletions
Large diffs are not rendered by default.
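Since the CRD diffs are not rendered, here is a rough, hypothetical sketch of the schema fragment the new per-role `resources` struct would contribute. The property names are inferred from the example manifests in this commit, and quantities are modelled as nullable strings per the usual generated-CRD convention; nothing here is copied from the actual diff:

# hypothetical openAPIV3Schema fragment (assumed shape, see caveat above)
resources:
  type: object
  properties:
    cpu:
      type: object
      properties:
        min:
          type: string  # e.g. "50m" or "1"
          nullable: true
        max:
          type: string  # e.g. "1500m" or "4"
          nullable: true
    memory:
      type: object
      properties:
        limit:
          type: string  # e.g. "1Gi"
          nullable: true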

docs/modules/ROOT/examples/example-encapsulated.yaml

Lines changed: 0 additions & 6 deletions
@@ -9,11 +9,5 @@ spec:
   mode: cluster
   mainClass: org.apache.spark.examples.SparkPi
   mainApplicationFile: /stackable/spark/examples/jars/spark-examples_2.12-3.3.0.jar # <2>
-  driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"

docs/modules/ROOT/examples/example-sparkapp-configmap.yaml

Lines changed: 0 additions & 5 deletions
@@ -19,16 +19,11 @@ spec:
   sparkConf:
     "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
   driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
     volumeMounts:
       - name: cm-job-arguments # <6>
         mountPath: /arguments # <7>
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
     volumeMounts:
       - name: cm-job-arguments # <6>
         mountPath: /arguments # <7>

docs/modules/ROOT/examples/example-sparkapp-external-dependencies.yaml

Lines changed: 0 additions & 5 deletions
@@ -23,16 +23,11 @@ spec:
       persistentVolumeClaim:
         claimName: pvc-ksv
   driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
     volumeMounts:
       - name: job-deps
         mountPath: /dependencies # <6>
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
     volumeMounts:
       - name: job-deps
         mountPath: /dependencies # <6>

docs/modules/ROOT/examples/example-sparkapp-image.yaml

Lines changed: 19 additions & 5 deletions
@@ -17,11 +17,25 @@ spec:
     - tabulate==0.8.9 # <4>
   sparkConf: # <5>
     "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
+  job:
+    resources:
+      cpu:
+        min: "1"
+        max: "1"
+      memory:
+        limit: "1Gi"
   driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
+    resources:
+      cpu:
+        min: "1"
+        max: "1500m"
+      memory:
+        limit: "1Gi"
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
+    resources:
+      cpu:
+        min: "1"
+        max: "4"
+      memory:
+        limit: "2Gi"

docs/modules/ROOT/examples/example-sparkapp-pvc.yaml

Lines changed: 0 additions & 5 deletions
@@ -21,16 +21,11 @@ spec:
       persistentVolumeClaim:
         claimName: pvc-ksv
   driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
     volumeMounts:
       - name: job-deps
         mountPath: /dependencies # <5>
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
     volumeMounts:
       - name: job-deps
         mountPath: /dependencies # <5>

docs/modules/ROOT/examples/example-sparkapp-s3-private.yaml

Lines changed: 0 additions & 6 deletions
@@ -23,11 +23,5 @@ spec:
     spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" # <6>
     spark.driver.extraClassPath: "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
     spark.executor.extraClassPath: "/dependencies/jars/hadoop-aws-3.2.0.jar:/dependencies/jars/aws-java-sdk-bundle-1.11.375.jar"
-  driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"

docs/modules/ROOT/pages/usage.adoc

Lines changed: 42 additions & 12 deletions
@@ -144,6 +144,42 @@ spec:
 
 This has the advantage that bucket configuration can be shared across `SparkApplication`s and reduces the cost of updating these details.
 
+== Resource Requests
+
+// The "nightly" version is needed because the "include" directive searches for
+// files in the "stable" version by default.
+// TODO: remove the "nightly" version after the next platform release (current: 22.09)
+include::nightly@home:concepts:stackable_resource_requests.adoc[]
+
+If no resources are configured explicitly, the operator uses the following defaults:
+
+[source,yaml]
+----
+job:
+  resources:
+    cpu:
+      min: '50m'
+      max: "100m"
+    memory:
+      limit: '1Gi'
+driver:
+  resources:
+    cpu:
+      min: '1'
+      max: "2"
+    memory:
+      limit: '2Gi'
+executor:
+  resources:
+    cpu:
+      min: '1'
+      max: "4"
+    memory:
+      limit: '4Gi'
+----
+WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements.
+For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods].
+
 == CRD argument coverage
 
 Below are listed the CRD fields that can be defined by the user:

@@ -214,14 +250,11 @@ Below are listed the CRD fields that can be defined by the user:
 |`spec.volumes.persistentVolumeClaim.claimName`
 |The persistent volume claim backing the volume
 
-|`spec.driver.cores`
-|Number of cores used by the driver (only in cluster mode)
+|`spec.job.resources`
+|Resources specification for the initiating Job
 
-|`spec.driver.coreLimit`
-|Total cores for all executors
-
-|`spec.driver.memory`
-|Specified memory for the driver
+|`spec.driver.resources`
+|Resources specification for the driver Pod
 
 |`spec.driver.volumeMounts`
 |A list of mounted volumes for the driver

@@ -235,15 +268,12 @@ Below are listed the CRD fields that can be defined by the user:
 |`spec.driver.nodeSelector`
 |A dictionary of labels to use for node selection when scheduling the driver N.B. this assumes there are no implicit node dependencies (e.g. `PVC`, `VolumeMount`) defined elsewhere.
 
-|`spec.executor.cores`
-|Number of cores for each executor
+|`spec.executor.resources`
+|Resources specification for the executor Pods
 
 |`spec.executor.instances`
 |Number of executor instances launched for this job
 
-|`spec.executor.memory`
-|Memory specified for executor
-
 |`spec.executor.volumeMounts`
 |A list of mounted volumes for each executor
docs/modules/getting_started/examples/code/getting_started.sh

Lines changed: 0 additions & 6 deletions
@@ -56,14 +56,8 @@ spec:
   sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.3.0-stackable0.2.0
   mode: cluster
   mainApplicationFile: local:///stackable/spark/examples/src/main/python/pi.py
-  driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
 EOF
 # end::install-sparkapp[]

examples/ny-tlc-report-external-dependencies.yaml

Lines changed: 0 additions & 5 deletions
@@ -33,16 +33,11 @@ spec:
       persistentVolumeClaim:
         claimName: pvc-ksv
   driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
     volumeMounts:
       - name: job-deps
         mountPath: /dependencies
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
     volumeMounts:
       - name: job-deps
         mountPath: /dependencies

examples/ny-tlc-report-image.yaml

Lines changed: 0 additions & 6 deletions
@@ -27,11 +27,5 @@ spec:
     accessStyle: Path
   sparkConf:
     spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
-  driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"

examples/ny-tlc-report.yaml

Lines changed: 0 additions & 5 deletions
@@ -34,16 +34,11 @@ spec:
   sparkConf:
     spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
   driver:
-    cores: 1
-    coreLimit: "1200m"
-    memory: "512m"
     volumeMounts:
       - name: cm-job-arguments
         mountPath: /arguments
   executor:
-    cores: 1
     instances: 3
-    memory: "512m"
     volumeMounts:
       - name: cm-job-arguments
         mountPath: /arguments
