Where:
- `spec.version`: the current version is "1.0"
- `spec.sparkImage`: the Docker image that will be used by the job, driver and executor pods. This can be provided by the user.
- `spec.mode`: only `cluster` is currently supported
- `spec.mainApplicationFile`: the artifact (Java, Scala or Python) that forms the basis of the Spark job. This path is relative to the image, so in this case we are running an example Python script (one that calculates the value of pi): it is bundled with the Spark code and is therefore already present in the job image.

For `spec.sparkImage`, it should generally be safe to simply use the latest image version that is available.
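
Putting these fields together, a complete manifest might look something like the sketch below. The `apiVersion`, the image tag and the script path are illustrative assumptions here, so check the operator reference for the exact values to use:

[source,yaml]
----
# A sketch of a SparkApplication for the bundled pi example.
# The apiVersion, image tag and script path below are assumptions.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: pyspark-pi
spec:
  version: "1.0"
  mode: cluster
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.3.0-stackable0.3.0
  mainApplicationFile: local:///stackable/spark/examples/src/main/python/pi.py
  executor:
    instances: 3
----
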
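Assuming the manifest is saved as `pyspark-pi.yaml` (the file name is just an example), it can be submitted to the cluster with `kubectl`:

[source,bash]
----
kubectl apply -f pyspark-pi.yaml
----
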
This will create the SparkApplication that in turn creates the Spark job.
== Verify that it works
As mentioned above, the SparkApplication that has just been created will build a `spark-submit` command and pass it to the driver pod, which in turn will create executor pods that run for the duration of the job before being cleaned up. A running process will look like this:
image::spark_running.png[Spark job]
- `pyspark-pi-xxxx`: this is the initialising job that creates the `spark-submit` command (named as `metadata.name` with a unique suffix)
- `pyspark-pi-xxxxxxx-driver`: the driver pod that drives the execution
- `pythonpi-xxxxxxxxx-exec-x`: the set of executors started by the driver (in our example `spec.executor.instances` was set to 3 which is why we have 3 executors)
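
To observe these pods as they are created and progress through their lifecycle, a plain pod listing is sufficient (the generated name suffixes will differ from run to run):

[source,bash]
----
# watch the job, driver and executor pods appear and change state
kubectl get pods --watch
----
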
When the job completes, the driver cleans up the executors. The initial job is persisted for several minutes before being removed. The completed state will look like this:
image::spark_complete.png[Completed job]
The driver logs can be inspected for more information about the results of the job. In this case we expect to find the results of our (approximate!) pi calculation.
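
One way to do this is with `kubectl logs`. The driver pod name below is a placeholder for the generated name, and the bundled example prints a line of the form `Pi is roughly 3.14...`:

[source,bash]
----
# replace the name with the actual driver pod name from `kubectl get pods`
kubectl logs pyspark-pi-xxxxxxx-driver | grep "Pi is roughly"
----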