docs/modules/getting_started/pages/first_steps.adoc

Once you have followed the steps in the xref:installation.adoc[] section to install the Operator and its dependencies, you can now create a Spark job. Afterwards you can <<_verify_that_it_works, verify that it works>> by looking at the logs from the driver pod.

=== Starting a Spark job

A SparkApplication is made up of three components:

- Job: builds a `spark-submit` command from the resource and passes it to the internal Spark code, together with templates for building the driver and executor pods
- Driver: starts the designated number of executors and removes them once the job is completed
- Executor(s): responsible for executing the job itself

Create a file named `pyspark-pi.yaml` with the following contents:
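
A minimal sketch of such a manifest is shown below, assuming the `spark.stackable.tech/v1alpha1` API group, the nested `sparkImage` form, and the bundled Pi example path; all of these are assumptions and should be checked against the operator version you installed.

[source,yaml]
----
---
# Sketch only -- apiVersion, image version and file path are assumptions,
# not taken from the original guide.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: default
spec:
  mode: cluster
  # Assumed path of the Pi example bundled with the Spark image
  mainApplicationFile: local:///stackable/spark/examples/src/main/python/pi.py
  sparkImage:
    productVersion: 3.5.0  # assumed Spark version
  executor:
    instances: 3  # three executors, matching the executor pods described below
----

Applying the file, for example with `kubectl apply -f pyspark-pi.yaml`, submits the job, and pods like the following are created:
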
- `pyspark-pi-xxxxxxx-driver`: the driver pod that drives the execution
- `pythonpi-xxxxxxxxx-exec-x`: the set of executors started by the driver (in our example `spec.executor.instances` was set to 3, which is why we have 3 executors)
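
These pods can be listed with a standard Kubernetes query, for example:

[source,bash]
----
# Show the driver and executor pods created for the job
kubectl get pods
----
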
Job progress can be followed by issuing this command:
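
One way to do this, assuming the driver pod carries the standard Spark-on-Kubernetes `spark-role=driver` label, is:

[source,bash]
----
# Follow the driver logs while the job is running
kubectl logs -f -l spark-role=driver

# Or block until the driver pod reports completion (kubectl 1.23+)
kubectl wait pods -l spark-role=driver \
  --for jsonpath='{.status.phase}'=Succeeded --timeout=300s
----
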
When the job completes, the driver cleans up the executors. The job itself is persisted for several minutes before being removed. The completed state will look like this: