documentation

adwk67 · adwk67 · commit 935d9bf35e37 · 2023-01-13T17:34:57.000+01:00
diff --git a/docs/modules/ROOT/examples/example-history-server.yaml b/docs/modules/ROOT/examples/example-history-server.yaml
@@ -0,0 +1,29 @@
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkHistoryServer
+metadata:
+  name: spark-history
+spec:
+  image:
+    productVersion: 3.3.0
+    stackableVersion: 0.3.0
+  logFileDirectory:  # <1>
+    s3:
+      prefix: eventlogs/  # <2>
+      bucket:  # <3>
+        inline:
+          bucketName: spark-logs
+          connection:
+            inline:
+              host: test-minio
+              port: 9000
+              accessStyle: Path
+              credentials:
+                secretClass: s3-credentials-class
+  sparkConf:  # <4>
+  nodes:  # <5>
+    roleGroups:
+      cleaner:
+        replicas: 1
+        config:
+          cleaner: true
diff --git a/docs/modules/ROOT/images/history-server-ui.png b/docs/modules/ROOT/images/history-server-ui.png
diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc
@@ -2,3 +2,4 @@
 * xref:usage.adoc[]
 * xref:job_dependencies.adoc[]
 * xref:rbac.adoc[]
+* xref:history_server.adoc[]
diff --git a/docs/modules/ROOT/pages/history_server.adoc b/docs/modules/ROOT/pages/history_server.adoc
@@ -0,0 +1,25 @@
+= Spark History Server
+
+== Overview
+
+The Stackable Spark-on-Kubernetes operator runs Apache Spark workloads in a Kubernetes cluster: driver and executor pods are created for the duration of a job and then terminated. One or more Spark History Server instances can be deployed independently of `SparkApplication` jobs and used as an endpoint for Spark logging, so that job information can still be viewed once the job pods are no longer available.
+
+== Example
+
+[source,yaml]
+----
+include::example$example-history-server.yaml[]
+----
+
+<1> The history server reads event logs from a file directory, which currently has to be a bucket in an S3 object store (see the `s3` field).
+<2> The log destination requires a prefix so that different bucket folders can be detected correctly.
+<3> The `S3BucketDef` definition, here provided inline.
+<4> History server configuration settings can be provided here as a map. For possible properties see: https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options
+<5> The history server implements a single role called `nodes`.
+
+== Accessing the job history
+
+The history server exposes a user console on port 18080. After setting up port-forwarding for port 18080, this UI can be opened in a browser to show running and completed jobs:
+
+image::history-server-ui.png[History Server Console]
+
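
Note on the example above: the history server only shows jobs whose event logs end up under the configured bucket and prefix. As a minimal sketch, a Spark job could be pointed at the same location with the two standard Spark properties `spark.eventLog.enabled` and `spark.eventLog.dir`. The fragment below assumes that the job's spec accepts a `sparkConf` map and that the bucket is reached through the `s3a://` scheme. Neither assumption is confirmed by this commit, and the S3 endpoint and credential settings a real job would also need are omitted.

```yaml
# Hypothetical fragment of a SparkApplication spec (not part of this commit).
# Only the two event-log properties are standard Spark settings; the placement
# under spec.sparkConf and the s3a:// scheme are assumptions.
spec:
  sparkConf:
    spark.eventLog.enabled: "true"
    # bucketName (spark-logs) plus prefix (eventlogs/) from example-history-server.yaml
    spark.eventLog.dir: "s3a://spark-logs/eventlogs/"
```

The directory value is built from `bucketName` and `prefix` in the history server definition, so changing either there would require the same change in the job configuration.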