
Commit 45c1e7a

feat: custom log directory (#479)
* Use a generic ResolvedLogDir instead of the concrete S3LogDir
* Add a test for customLogDirectory
* Document the property customLogDirectory
* Use "log directory" instead of "S3 log directory" when the resolved log directory structure is used
* Update CRD docs
* Update docs/modules/spark-k8s/pages/index.adoc
* Update CHANGELOG.md

Co-authored-by: Razvan-Daniel Mihai <[email protected]>
1 parent 2d68f63 commit 45c1e7a

27 files changed: +619 -124 lines changed
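At its core, the commit swaps the S3-specific log directory type for a generic resolved log directory that both the history server and applications consume. Below is a rough, self-contained Rust sketch of the shape the diffs imply (the names ResolvedLogDir, S3LogDir and tls_enabled() appear in rust/crd/src/history.rs; the Custom variant name, the fields and main() are illustrative assumptions, not the operator's actual code):

// Illustrative sketch only: a log directory that is either an S3 bucket
// or an opaque custom URI (for example HDFS). Field names are stand-ins.
#[derive(Debug)]
struct S3LogDir {
    prefix: String,
    uses_tls: bool, // stand-in for the resolved S3 connection's TLS settings
}

#[derive(Debug)]
enum ResolvedLogDir {
    S3(S3LogDir),
    Custom(String),
}

impl ResolvedLogDir {
    // A truststore only needs to be built when the log directory is reached over TLS.
    fn tls_enabled(&self) -> bool {
        match self {
            ResolvedLogDir::S3(s3) => s3.uses_tls,
            ResolvedLogDir::Custom(_) => false,
        }
    }
}

fn main() {
    let s3 = ResolvedLogDir::S3(S3LogDir {
        prefix: "eventlogs/".to_string(),
        uses_tls: true,
    });
    let hdfs = ResolvedLogDir::Custom("hdfs://simple-hdfs/eventlogs/".to_string());
    println!("{:?} -> tls {}", s3, s3.tls_enabled());
    println!("{:?} -> tls {}", hdfs, hdfs.tls_enabled());
}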

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ All notable changes to this project will be documented in this file.
 ### Added
 
 - Make spark-env.sh configurable via `configOverrides` ([#473]).
+- The Spark history server can now service logs from HDFS compatible systems ([#479]).
 
 ### Changed
 
@@ -33,6 +34,7 @@ All notable changes to this project will be documented in this file.
 [#460]: https://github.com/stackabletech/spark-k8s-operator/pull/460
 [#472]: https://github.com/stackabletech/spark-k8s-operator/pull/472
 [#473]: https://github.com/stackabletech/spark-k8s-operator/pull/473
+[#479]: https://github.com/stackabletech/spark-k8s-operator/pull/479
 
 ## [24.7.0] - 2024-07-24

deploy/helm/spark-k8s-operator/crds/crds.yaml

Lines changed: 14 additions & 2 deletions
@@ -615,13 +615,19 @@ spec:
             x-kubernetes-preserve-unknown-fields: true
           type: object
         logFileDirectory:
-          description: The log file directory definition used by the Spark history server. Currently only S3 buckets are supported.
+          description: The log file directory definition used by the Spark history server.
           nullable: true
           oneOf:
           - required:
             - s3
+          - required:
+            - customLogDirectory
           properties:
+            customLogDirectory:
+              description: A custom log directory
+              type: string
             s3:
+              description: An S3 bucket storing the log events
               properties:
                 bucket:
                   oneOf:
@@ -1065,12 +1071,18 @@ spec:
             type: string
           type: object
         logFileDirectory:
-          description: The log file directory definition used by the Spark history server. Currently only S3 buckets are supported.
+          description: The log file directory definition used by the Spark history server.
           oneOf:
           - required:
             - s3
+          - required:
+            - customLogDirectory
           properties:
+            customLogDirectory:
+              description: A custom log directory
+              type: string
             s3:
+              description: An S3 bucket storing the log events
               properties:
                 bucket:
                   oneOf:
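This oneOf mirrors the externally tagged Rust enum LogFileDirectorySpec shown further down in rust/crd/src/history.rs: a manifest sets either an s3 object or a plain customLogDirectory string. The following trimmed sketch shows that mapping, assuming serde (with the derive feature) and serde_json as dependencies; the real S3LogFileDirectorySpec also carries the bucket definition, derives JsonSchema and Serialize, and the operator deserializes YAML via serde_yaml's singleton_map_recursive rather than JSON:

// Trimmed sketch: the CRD's oneOf (either `s3` or `customLogDirectory`) corresponds
// to an externally tagged, camelCase-renamed enum.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
enum LogFileDirectorySpec {
    S3(S3LogFileDirectorySpec),  // accepted as `s3: { ... }`
    CustomLogDirectory(String),  // accepted as `customLogDirectory: "<uri>"`
}

#[derive(Debug, Deserialize)]
struct S3LogFileDirectorySpec {
    prefix: String, // the real spec also contains the bucket definition
}

fn main() {
    let custom: LogFileDirectorySpec =
        serde_json::from_str(r#"{ "customLogDirectory": "hdfs://simple-hdfs/eventlogs/" }"#)
            .unwrap();
    let s3: LogFileDirectorySpec =
        serde_json::from_str(r#"{ "s3": { "prefix": "eventlogs/" } }"#).unwrap();
    println!("{custom:?}");
    println!("{s3:?}");
}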

docs/modules/spark-k8s/pages/index.adoc

Lines changed: 2 additions & 2 deletions
@@ -37,8 +37,8 @@ The SparkApplication resource is the main point of interaction with the operator
 An exhaustive list of options is given in the {crd}[SparkApplication CRD reference {external-link-icon}^].
 
 The xref:usage-guide/history-server.adoc[SparkHistoryServer] has a single `node` role.
-It is used to deploy a https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact[Spark history server] that displays application logs from S3 buckets.
-Of course, your applications need to write their logs to the same buckets.
+It is used to deploy a https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact[Spark history server] that displays application logs.
+Of course, your applications need to write their logs to the same location.
 
 === Kubernetes resources
 

docs/modules/spark-k8s/pages/usage-guide/history-server.adoc

Lines changed: 85 additions & 3 deletions
@@ -17,9 +17,7 @@ For more details on how the Stackable Data Platform manages S3 resources see the
 include::example$example-history-server.yaml[]
 ----
 
-<1> The location of the event logs.
-Must be an S3 bucket.
-Future implementations might add support for other shared filesystems such as HDFS.
+<1> The location of the event logs, see <<log-dir-variants>> for other options.
 <2> Directory within the S3 bucket where the log files are located.
 This directory is required and must exist before setting up the history server.
 <3> The S3 bucket definition, here provided in-line.
@@ -56,7 +54,91 @@ include::example$example-history-app.yaml[]
 <5> Bucket to store logs. This must match the bucket used by the history server.
 <6> Credentials used to write event logs. These can, of course, differ from the credentials used to process data.
 
+[#log-dir-variants]
+== Supported file systems for storing log events
 
+=== S3
+
+As already shown in the example above, the event logs can be stored in an S3 bucket:
+
+[source,yaml]
+----
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkHistoryServer
+spec:
+  logFileDirectory:
+    s3:
+      prefix: eventlogs/
+      bucket:
+        ...
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkApplication
+spec:
+  logFileDirectory:
+    s3:
+      prefix: eventlogs/
+      bucket:
+        ...
+----
+
+=== Custom log directory
+
+If there is no structure provided for the desired file system, it can nevertheless be set with the property `customLogDirectory`.
+Additional configuration overrides may be necessary in this case.
+
+For instance, to store the Spark event logs in HDFS, the following configuration could be used:
+
+[source,yaml]
+----
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkHistoryServer
+spec:
+  logFileDirectory:
+    customLogDirectory: hdfs://simple-hdfs/eventlogs/ # <1>
+  nodes:
+    envOverrides:
+      HADOOP_CONF_DIR: /stackable/hdfs-config # <2>
+    podOverrides:
+      spec:
+        containers:
+          - name: spark-history
+            volumeMounts:
+              - name: hdfs-config
+                mountPath: /stackable/hdfs-config
+        volumes:
+          - name: hdfs-config
+            configMap:
+              name: hdfs # <3>
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkApplication
+spec:
+  logFileDirectory:
+    customLogDirectory: hdfs://simple-hdfs/eventlogs/ # <4>
+  sparkConf:
+    spark.driver.extraClassPath: /stackable/hdfs-config # <5>
+  driver:
+    config:
+      volumeMounts:
+        - name: hdfs-config
+          mountPath: /stackable/hdfs-config
+  volumes:
+    - name: hdfs-config
+      configMap:
+        name: hdfs
+----
+
+<1> A custom log directory that is used for the Spark option `spark.history.fs.logDirectory`.
+The required dependencies must be on the class path.
+This is the case for HDFS.
+<2> The Spark History Server looks for the Hadoop configuration in the directory defined by the environment variable `HADOOP_CONF_DIR`.
+<3> The ConfigMap containing the Hadoop configuration files `core-site.xml` and `hdfs-site.xml`.
+<4> A custom log directory that is used for the Spark option `spark.eventLog.dir`.
+Additionally, the Spark option `spark.eventLog.enabled` is set to `true`.
+<5> The Spark driver looks for the Hadoop configuration on the class path.
 
 == History Web UI
 
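The callouts above name the Spark options behind the two settings: the history server reads from spark.history.fs.logDirectory, while applications write events according to spark.eventLog.enabled and spark.eventLog.dir. A minimal sketch of that mapping follows; the helper functions are hypothetical, and only the option keys are taken from the documentation above:

use std::collections::BTreeMap;

// Hypothetical helpers, for illustration only: derive the documented Spark options
// from a log directory URI such as "hdfs://simple-hdfs/eventlogs/".
fn history_server_options(log_dir: &str) -> BTreeMap<String, String> {
    BTreeMap::from([(
        "spark.history.fs.logDirectory".to_string(),
        log_dir.to_string(),
    )])
}

fn application_options(log_dir: &str) -> BTreeMap<String, String> {
    BTreeMap::from([
        ("spark.eventLog.enabled".to_string(), "true".to_string()),
        ("spark.eventLog.dir".to_string(), log_dir.to_string()),
    ])
}

fn main() {
    let log_dir = "hdfs://simple-hdfs/eventlogs/";
    println!("history server: {:?}", history_server_options(log_dir));
    println!("application:    {:?}", application_options(log_dir));
}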

rust/crd/src/history.rs

Lines changed: 11 additions & 8 deletions
@@ -1,5 +1,4 @@
-use crate::s3logdir::S3LogDir;
-use crate::tlscerts;
+use crate::logdir::ResolvedLogDir;
 use crate::{affinity::history_affinity, constants::*};
 
 use product_config::{types::PropertyNameKind, ProductConfigManager};
@@ -78,7 +77,6 @@ pub struct SparkHistoryServerSpec {
     pub vector_aggregator_config_map_name: Option<String>,
 
     /// The log file directory definition used by the Spark history server.
-    /// Currently only S3 buckets are supported.
     pub log_file_directory: LogFileDirectorySpec,
 
     /// A map of key/value strings that will be passed directly to Spark when deploying the history server.
@@ -235,7 +233,7 @@ impl SparkHistoryServer {
 
     pub fn merged_env(
         &self,
-        s3logdir: &S3LogDir,
+        logdir: &ResolvedLogDir,
         role_group_env_overrides: HashMap<String, String>,
     ) -> Vec<EnvVar> {
         // Maps env var name to env var object. This allows env_overrides to work
@@ -271,7 +269,7 @@ impl SparkHistoryServer {
         ];
 
         // if TLS is enabled build truststore
-        if tlscerts::tls_secret_name(&s3logdir.bucket.connection).is_some() {
+        if logdir.tls_enabled() {
             history_opts.extend(vec![
                 format!("-Djavax.net.ssl.trustStore={STACKABLE_TRUST_STORE}/truststore.p12"),
                 format!("-Djavax.net.ssl.trustStorePassword={STACKABLE_TLS_STORE_PASSWORD}"),
@@ -327,8 +325,11 @@ impl SparkHistoryServer {
 #[derive(Clone, Debug, Deserialize, JsonSchema, Serialize, Display)]
 #[serde(rename_all = "camelCase")]
 pub enum LogFileDirectorySpec {
+    /// An S3 bucket storing the log events
     #[strum(serialize = "s3")]
     S3(S3LogFileDirectorySpec),
+    /// A custom log directory
+    CustomLogDirectory(String),
 }
 
 #[derive(Clone, Debug, Deserialize, JsonSchema, Serialize)]
@@ -456,6 +457,8 @@ impl Configuration for HistoryConfigFragment {
 
 #[cfg(test)]
 mod test {
+    use crate::logdir::S3LogDir;
+
     use super::*;
     use indoc::indoc;
     use stackable_operator::commons::{
@@ -495,7 +498,7 @@ mod test {
         let history: SparkHistoryServer =
             serde_yaml::with::singleton_map_recursive::deserialize(deserializer).unwrap();
 
-        let s3_log_dir: S3LogDir = S3LogDir {
+        let log_dir = ResolvedLogDir::S3(S3LogDir {
             bucket: ResolvedS3Bucket {
                 bucket_name: "my-bucket".to_string(),
                 connection: ResolvedS3Connection {
@@ -507,10 +510,10 @@ mod test {
                },
            },
            prefix: "prefix".to_string(),
-        };
+        });
 
         let merged_env = history.merged_env(
-            &s3_log_dir,
+            &log_dir,
            history
                .spec
                .nodes