
Commit 4813cfa

adwk67 and sbernauer authored

docs: Document resource parsing with examples (#297)

* document resource parsing with examples
* updated changelog
* minor changes
* more minor changes
* removed unused spark conf argument
* write cpu/min to cpu/request and exclude redundant conf setting
* updated docs to reflect last changes
* Update docs/modules/spark-k8s/pages/usage-guide/resources.adoc
  Co-authored-by: Sebastian Bernauer <[email protected]>
* typo
* corrected text re. cores settings

---------

Co-authored-by: Sebastian Bernauer <[email protected]>
1 parent a3b8580 commit 4813cfa

5 files changed: +171 -70 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,7 @@ All notable changes to this project will be documented in this file.
 - [BREAKING] use product image selection instead of version ([#275]).
 - [BREAKING] refactored application roles to use `CommonConfiguration` structures from the operator framework ([#277]).
 - Let secret-operator handle certificate conversion ([#286]).
+- Extended resource-usage documentation ([#297]).
 
 ### Fixed
 
@@ -39,6 +40,7 @@ All notable changes to this project will be documented in this file.
 [#286]: https://github.com/stackabletech/spark-k8s-operator/pull/286
 [#288]: https://github.com/stackabletech/spark-k8s-operator/pull/288
 [#291]: https://github.com/stackabletech/spark-k8s-operator/pull/291
+[#297]: https://github.com/stackabletech/spark-k8s-operator/pull/297
 
 ## [23.7.0] - 2023-07-14
 

docs/modules/spark-k8s/pages/usage-guide/resources.adoc

Lines changed: 148 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 include::home:concepts:stackable_resource_requests.adoc[]
 
-If no resources are configured explicitly, the operator uses the following defaults for `SparkApplication`s:
+If no resources are configured explicitly, the operator uses the following defaults for `SparkApplication` resources:
 
 [source,yaml]
 ----
@@ -29,7 +29,7 @@ executor:
         min: '250m'
         max: "1"
       memory:
-        limit: '4Gi'
+        limit: '1Gi'
 ----
 
 For `SparkHistoryServer`s the following defaults are used:
@@ -50,4 +50,149 @@ For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs
 
 Spark allocates a default amount of non-heap memory based on the type of job (JVM or non-JVM). This is taken into account when defining memory settings based exclusively on the resource limits, so that the "declared" value is the actual total value (i.e. including memory overhead). This may result in minor deviations from the stated resource value due to rounding differences.
 
-NOTE: It is possible to define Spark resources either directly by setting configuration properties listed under `sparkConf`, or by using resource limits. If both are used, then `sparkConf` properties take precedence. It is recommended for the sake of clarity to use *_either_* one *_or_* the other.
+NOTE: It is possible to define Spark resources either directly by setting configuration properties listed under `sparkConf`, or by using resource limits. If both are used, then `sparkConf` properties take precedence. It is recommended for the sake of clarity to use *_either_* one *_or_* the other. See below for examples.
+
+== Resource examples
+
+To illustrate resource configuration, consider the case where resources are defined using CRD fields, which the operator parses internally and passes to Spark as `spark.conf` settings.
+
+=== CPU
+
+CPU request and limit values are rounded up to whole cores, resulting in the following:
+
+
+|===
+|CRD |Spark conf
+
+|1800m
+|2
+
+|100m
+|1
+
+|1.5
+|2
+
+|2
+|2
+|===
+
+Spark allows CPU limits to be set for the driver and executor using standard Spark settings (`spark.{driver|executor}.cores`) as well as Kubernetes-specific ones (`spark.kubernetes.{driver|executor}.{request|limit}.cores`). Since `spark.kubernetes.{driver|executor}.request.cores` takes precedence over `spark.{driver|executor}.cores`, the operator does not set `spark.{driver|executor}.cores` when building the spark-submit configuration.
+
+=== Memory
+
+Unlike CPU values, memory values are not rounded. The values for `spark.{driver|executor}.memory` (the amount of memory used by the driver process, where the SparkContext is initialized, and by each executor process, respectively) are passed to Spark in such a way that the overhead Spark adds is already accounted for. This overhead is calculated using a factor of 0.1 (JVM jobs) or 0.4 (non-JVM jobs), with a minimum of 384MB. Once Spark applies the overhead, the effective value is the one defined by the user. This keeps the values transparent: what is defined in the CRD is what is actually provisioned for the process.
+
+An alternative is to define the `spark.conf` settings explicitly and let Spark apply the overheads to those values.
+
+=== Example
+
+A `SparkApplication` defines the following resources:
+
+[source,yaml]
+----
+...
+job:
+  config:
+    resources:
+      cpu:
+        min: 250m # <1>
+        max: 500m # <2>
+      memory:
+        limit: 512Mi # <3>
+driver:
+  config:
+    resources:
+      cpu:
+        min: 200m # <4>
+        max: 1200m # <5>
+      memory:
+        limit: 1024Mi # <6>
+executor:
+  config:
+    resources:
+      cpu:
+        min: 250m # <7>
+        max: 1000m # <8>
+      memory:
+        limit: 1024Mi # <9>
+...
+----
+
+This will result in the following Pod definitions:
+
+For the job:
+
+[source,yaml]
+----
+spec:
+  containers:
+    - name: spark-submit
+      resources:
+        limits:
+          cpu: 500m # <2>
+          memory: 512Mi # <3>
+        requests:
+          cpu: 250m # <1>
+          memory: 512Mi # <3>
+----
+
+For the driver:
+
+[source,yaml]
+----
+spec:
+  containers:
+    - name: spark
+      resources:
+        limits:
+          cpu: "2" # <5>
+          memory: 1Gi # <6>
+        requests:
+          cpu: "1" # <4>
+          memory: 1Gi # <6>
+----
+
+For each executor:
+
+[source,yaml]
+----
+spec:
+  containers:
+    - name: spark
+      resources:
+        limits:
+          cpu: "1" # <8>
+          memory: 1Gi # <9>
+        requests:
+          cpu: "1" # <7>
+          memory: 1Gi # <9>
+----
+
+<1> CPU request (unchanged as this is the Job pod)
+<2> CPU limit (unchanged as this is the Job pod)
+<3> Memory is assigned to both request and limit values
+<4> CPU request, rounded up from `200m` to `1`
+<5> CPU limit, rounded up from `1200m` to `2`
+<6> Memory after reduction and re-addition of Spark overhead (so the declared value matches what is provisioned)
+<7> CPU request, rounded up from `250m` to `1`
+<8> CPU limit, unchanged after rounding: `1000m` to `1`
+<9> Memory after reduction and re-addition of Spark overhead (so the declared value matches what is provisioned)
+
+The `spark.conf` values derived from the above can be inspected in the job Pod definition:
+
+[source]
+----
+...
+--conf "spark.driver.cores=1"
+--conf "spark.driver.memory=640m"
+--conf "spark.executor.cores=1"
+--conf "spark.executor.memory=640m"
+--conf "spark.kubernetes.driver.limit.cores=2"
+--conf "spark.kubernetes.driver.request.cores=1"
+--conf "spark.kubernetes.executor.limit.cores=1"
+--conf "spark.kubernetes.executor.request.cores=1"
+--conf "spark.kubernetes.memoryOverheadFactor=0.0"
+...
+----
+
+These correspond to the resources listed above for the job, driver and executor Pods, with the exception of `spark.{driver|executor}.memory`, where the Spark internal minimum overhead of 384MB has been deducted from 1024MB (1024MB - 384MB = 640MB).

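The CPU rounding rule documented in this file (for example `1800m` becoming `2` and `1.5` becoming `2`) can be illustrated with a short Rust sketch. `round_up_cores` is a hypothetical helper, not the operator's `cores_from_quantity`. It assumes only the two quantity forms used in the CPU table: millicores with an `m` suffix, and plain or decimal core counts. Other Kubernetes quantity suffixes are rejected rather than parsed.

```rust
/// Hypothetical helper (not the operator's `cores_from_quantity`): converts a
/// CPU quantity such as "1800m", "1.5" or "2" into a whole number of cores,
/// rounding up. Only the "m" suffix and plain/decimal numbers are handled.
fn round_up_cores(quantity: &str) -> Result<u64, String> {
    let q = quantity.trim();
    if let Some(millis) = q.strip_suffix('m') {
        // Millicores: 1000m == 1 core. Integer ceiling division, no floating point.
        let millis = millis
            .parse::<u64>()
            .map_err(|e| format!("invalid millicore value {q:?}: {e}"))?;
        Ok(millis / 1000 + u64::from(millis % 1000 != 0))
    } else {
        // Plain or decimal core count, e.g. "2" or "1.5".
        let cores = q
            .parse::<f64>()
            .map_err(|e| format!("invalid core value {q:?}: {e}"))?;
        if !cores.is_finite() || cores < 0.0 {
            return Err(format!("invalid core value {q:?}"));
        }
        Ok(cores.ceil() as u64)
    }
}

fn main() {
    // CRD values from the CPU table, then the driver/executor values from the example.
    for q in ["1800m", "100m", "1.5", "2", "200m", "1200m", "250m", "1000m"] {
        println!("{q} -> {}", round_up_cores(q).unwrap());
    }
}
```

Ceiling division on integer millicores avoids any floating-point error for the common `m` form; the decimal branch relies on `f64::ceil`, which is exact for values such as `1.5` and `2`.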
rust/crd/src/lib.rs

Lines changed: 20 additions & 64 deletions
@@ -815,18 +815,21 @@ fn resources_to_driver_props(
     props: &mut BTreeMap<String, String>,
 ) -> Result<(), Error> {
     if let Resources {
-        cpu: CpuLimits { max: Some(max), .. },
+        cpu: CpuLimits {
+            min: Some(min),
+            max: Some(max),
+        },
         ..
     } = &driver_config.resources
     {
-        let cores = cores_from_quantity(max.0.clone())?;
+        let min_cores = cores_from_quantity(min.0.clone())?;
+        let max_cores = cores_from_quantity(max.0.clone())?;
         // will have default value from resources to apply if nothing set specifically
-        props.insert("spark.driver.cores".to_string(), cores.clone());
         props.insert(
             "spark.kubernetes.driver.request.cores".to_string(),
-            cores.clone(),
+            min_cores,
         );
-        props.insert("spark.kubernetes.driver.limit.cores".to_string(), cores);
+        props.insert("spark.kubernetes.driver.limit.cores".to_string(), max_cores);
     }
 
     if let Resources {
@@ -838,22 +841,6 @@ fn resources_to_driver_props(
     {
         let memory = subtract_spark_memory_overhead(for_java, limit)?;
         props.insert("spark.driver.memory".to_string(), memory);
-
-        let limit_mb = format!(
-            "{}m",
-            MemoryQuantity::try_from(limit)
-                .context(FailedToConvertJavaHeapSnafu {
-                    unit: BinaryMultiple::Mebi.to_java_memory_unit(),
-                })?
-                .scale_to(BinaryMultiple::Mebi)
-                .floor()
-                .value as u32
-        );
-        props.insert(
-            "spark.kubernetes.driver.request.memory".to_string(),
-            limit_mb.clone(),
-        );
-        props.insert("spark.kubernetes.driver.limit.memory".to_string(), limit_mb);
     }
 
     Ok(())
@@ -867,18 +854,24 @@ fn resources_to_executor_props(
     props: &mut BTreeMap<String, String>,
 ) -> Result<(), Error> {
     if let Resources {
-        cpu: CpuLimits { max: Some(max), .. },
+        cpu: CpuLimits {
+            min: Some(min),
+            max: Some(max),
+        },
         ..
     } = &executor_config.resources
     {
-        let cores = cores_from_quantity(max.0.clone())?;
+        let min_cores = cores_from_quantity(min.0.clone())?;
+        let max_cores = cores_from_quantity(max.0.clone())?;
         // will have default value from resources to apply if nothing set specifically
-        props.insert("spark.executor.cores".to_string(), cores.clone());
         props.insert(
             "spark.kubernetes.executor.request.cores".to_string(),
-            cores.clone(),
+            min_cores,
+        );
+        props.insert(
+            "spark.kubernetes.executor.limit.cores".to_string(),
+            max_cores,
         );
-        props.insert("spark.kubernetes.executor.limit.cores".to_string(), cores);
     }
 
     if let Resources {
@@ -890,25 +883,6 @@ fn resources_to_executor_props(
     {
         let memory = subtract_spark_memory_overhead(for_java, limit)?;
         props.insert("spark.executor.memory".to_string(), memory);
-
-        let limit_mb = format!(
-            "{}m",
-            MemoryQuantity::try_from(limit)
-                .context(FailedToConvertJavaHeapSnafu {
-                    unit: BinaryMultiple::Mebi.to_java_memory_unit(),
-                })?
-                .scale_to(BinaryMultiple::Mebi)
-                .floor()
-                .value as u32
-        );
-        props.insert(
-            "spark.kubernetes.executor.request.memory".to_string(),
-            limit_mb.clone(),
-        );
-        props.insert(
-            "spark.kubernetes.executor.limit.memory".to_string(),
-            limit_mb,
-        );
     }
 
     Ok(())
@@ -1070,24 +1044,15 @@ mod tests {
         resources_to_driver_props(true, &driver_config, &mut props).expect("blubb");
 
         let expected: BTreeMap<String, String> = vec![
-            ("spark.driver.cores".to_string(), "1".to_string()),
             ("spark.driver.memory".to_string(), "128m".to_string()),
             (
                 "spark.kubernetes.driver.limit.cores".to_string(),
                 "1".to_string(),
             ),
-            (
-                "spark.kubernetes.driver.limit.memory".to_string(),
-                "128m".to_string(),
-            ),
             (
                 "spark.kubernetes.driver.request.cores".to_string(),
                 "1".to_string(),
             ),
-            (
-                "spark.kubernetes.driver.request.memory".to_string(),
-                "128m".to_string(),
-            ),
         ]
         .into_iter()
         .collect();
@@ -1122,19 +1087,10 @@ mod tests {
         resources_to_executor_props(true, &executor_config, &mut props).expect("blubb");
 
         let expected: BTreeMap<String, String> = vec![
-            ("spark.executor.cores".to_string(), "2".to_string()),
             ("spark.executor.memory".to_string(), "128m".to_string()), // 128 and not 512 because memory overhead is subtracted
-            (
-                "spark.kubernetes.executor.limit.memory".to_string(),
-                "512m".to_string(),
-            ),
             (
                 "spark.kubernetes.executor.request.cores".to_string(),
-                "2".to_string(),
-            ),
-            (
-                "spark.kubernetes.executor.request.memory".to_string(),
-                "512m".to_string(),
+                "1".to_string(),
             ),
             (
                 "spark.kubernetes.executor.limit.cores".to_string(),

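The memory arithmetic described in the documentation can be checked against the numbers used in this commit: a 1024Mi limit becomes `spark.driver.memory=640m` in the documentation example, and the unit test above expects a 512Mi executor limit to become `128m`. The Rust sketch below reproduces both values under the assumption, taken from the documentation, that Spark's overhead is the larger of factor × heap (0.1 for JVM jobs, 0.4 for non-JVM jobs) and 384 MiB. `heap_for_limit_mib` is hypothetical and is not the operator's `subtract_spark_memory_overhead`; returning `None` for limits at or below 384 MiB is this sketch's own choice.

```rust
/// Minimum memory overhead (MiB) that Spark adds, as stated in the documentation.
const MIN_OVERHEAD_MIB: u64 = 384;

/// Hypothetical helper (not the operator's `subtract_spark_memory_overhead`):
/// returns the heap size in MiB to pass as `spark.{driver|executor}.memory`
/// so that heap + overhead equals the declared limit. The overhead is assumed
/// to be max(factor * heap, 384 MiB), factor 0.1 (JVM) or 0.4 (non-JVM).
/// Returns `None` if the limit leaves no room for the minimum overhead
/// (an assumption of this sketch).
fn heap_for_limit_mib(limit_mib: u64, for_java: bool) -> Option<u64> {
    if limit_mib <= MIN_OVERHEAD_MIB {
        return None;
    }
    // Solve heap * (1 + factor) = limit with integer arithmetic, rounding down:
    // 1.1 = 11/10 for JVM jobs, 1.4 = 14/10 for non-JVM jobs.
    let proportional = if for_java {
        limit_mib * 10 / 11
    } else {
        limit_mib * 10 / 14
    };
    // If the proportional overhead would be smaller than 384 MiB, the minimum
    // applies instead and the heap becomes limit - 384.
    Some(proportional.min(limit_mib - MIN_OVERHEAD_MIB))
}

fn main() {
    // 1024Mi driver/executor limit from the documentation example.
    println!("{:?}", heap_for_limit_mib(1024, true));
    // 512Mi executor limit from the unit test in this diff.
    println!("{:?}", heap_for_limit_mib(512, true));
}
```

Integer fractions (10/11, 10/14) are used instead of floating-point factors so the rounding is predictable; in both branches the heap is rounded down, which keeps heap plus overhead at or below the declared limit.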
tests/templates/kuttl/resources/10-assert.yaml.j2

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ spec:
           cpu: "2"
           memory: 1Gi
         requests:
-          cpu: "2"
+          cpu: "1"
           memory: 1Gi
 ---
 apiVersion: v1

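The assertion above now expects a CPU request of `"1"` alongside a limit of `"2"`, reflecting that this commit writes the CPU `min` to the request and the `max` to the limit instead of using `max` for both. A hypothetical Rust sketch of that mapping follows; `CpuRange` and `cpu_props` are illustrative stand-ins rather than the operator's `CpuLimits` type or `resources_to_driver_props`, and the values are assumed to be already rounded up to whole cores.

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for a role's CPU settings, already rounded to whole cores.
struct CpuRange {
    min_cores: u64,
    max_cores: u64,
}

/// Hypothetical sketch: writes the Kubernetes-specific request/limit properties
/// for a role ("driver" or "executor"). `spark.<role>.cores` is deliberately
/// not written, mirroring the change in this commit.
fn cpu_props(role: &str, cpu: &CpuRange, props: &mut BTreeMap<String, String>) {
    // min -> request
    props.insert(
        format!("spark.kubernetes.{role}.request.cores"),
        cpu.min_cores.to_string(),
    );
    // max -> limit
    props.insert(
        format!("spark.kubernetes.{role}.limit.cores"),
        cpu.max_cores.to_string(),
    );
}

fn main() {
    let mut props = BTreeMap::new();
    // Driver from the documentation example: 200m -> 1, 1200m -> 2.
    cpu_props("driver", &CpuRange { min_cores: 1, max_cores: 2 }, &mut props);
    for (k, v) in &props {
        println!("{k}={v}");
    }
}
```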
tests/templates/kuttl/resources/12-deploy-spark-app.yaml.j2

Lines changed: 0 additions & 2 deletions
@@ -25,12 +25,10 @@ spec:
     spark.kubernetes.executor.podNamePrefix: "resources-sparkconf"
     spark.kubernetes.driver.request.cores: "1"
     spark.kubernetes.driver.limit.cores: "1"
-    spark.driver.cores: "1"
     spark.driver.memory: "1g"
     spark.driver.memoryOverheadFactor: "0.4"
     spark.kubernetes.executor.request.cores: "1"
     spark.kubernetes.executor.limit.cores: "2"
-    spark.executor.cores: "2"
     spark.executor.memory: "2g"
     spark.executor.memoryOverheadFactor: "0.4"
     spark.executor.instances: "1"

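This test application sets CPU properties directly under `sparkConf`, and the documentation states that such `sparkConf` properties take precedence over values derived from resource limits. The following is a minimal Rust sketch of that precedence rule. It assumes the resource-derived properties are collected first and the user's `sparkConf` entries are then applied on top of them. `merge_spark_conf` is a hypothetical name, not the operator's implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: resource-derived properties are overridden by any
/// key that the user sets explicitly under `sparkConf`.
fn merge_spark_conf(
    from_resources: BTreeMap<String, String>,
    spark_conf: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged = from_resources;
    // `extend` replaces the value of keys that already exist,
    // so user-supplied entries win.
    merged.extend(spark_conf.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}

fn main() {
    let from_resources = BTreeMap::from([
        ("spark.kubernetes.executor.limit.cores".to_string(), "1".to_string()),
        ("spark.kubernetes.executor.request.cores".to_string(), "1".to_string()),
    ]);
    // Value taken from the kuttl test's sparkConf above.
    let spark_conf = BTreeMap::from([(
        "spark.kubernetes.executor.limit.cores".to_string(),
        "2".to_string(),
    )]);
    let merged = merge_spark_conf(from_resources, &spark_conf);
    for (k, v) in &merged {
        println!("{k}={v}");
    }
}
```

`BTreeMap::extend` overwrites the value of any key that already exists, so keys present in both maps end up with the user's value, while keys derived only from resources are kept.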