
Commit 023c8a7

remove deleted section (#410)
1 parent 16297d2 commit 023c8a7

File tree

1 file changed (+9, -23 lines)

docs/modules/spark-k8s/pages/usage-guide/examples.adoc

Lines changed: 9 additions & 23 deletions
@@ -2,32 +2,18 @@
 
 The following examples have the following `spec` fields in common:
 
-- `version`: the current version is "1.0"
-- `sparkImage`: the docker image that will be used by job, driver and executor pods. This can be provided by the user.
-- `mode`: only `cluster` is currently supported
-- `mainApplicationFile`: the artifact (Java, Scala or Python) that forms the basis of the Spark job.
-- `args`: these are the arguments passed directly to the application. In the examples below it is e.g. the input path for part of the public New York taxi dataset.
-- `sparkConf`: these list spark configuration settings that are passed directly to `spark-submit` and which are best defined explicitly by the user. Since the `SparkApplication` "knows" that there is an external dependency (the s3 bucket where the data and/or the application is located) and how that dependency should be treated (i.e. what type of credential checks are required, if any), it is better to have these things declared together.
-- `volumes`: refers to any volumes needed by the `SparkApplication`, in this case an underlying `PersistentVolumeClaim`.
-- `driver`: driver-specific settings, including any volume mounts.
-- `executor`: executor-specific settings, including any volume mounts.
+* `version`: the current version is "1.0"
+* `sparkImage`: the docker image that will be used by job, driver and executor pods. This can be provided by the user.
+* `mode`: only `cluster` is currently supported
+* `mainApplicationFile`: the artifact (Java, Scala or Python) that forms the basis of the Spark job.
+* `args`: these are the arguments passed directly to the application. In the examples below it is e.g. the input path for part of the public New York taxi dataset.
+* `sparkConf`: these list spark configuration settings that are passed directly to `spark-submit` and which are best defined explicitly by the user. Since the `SparkApplication` "knows" that there is an external dependency (the s3 bucket where the data and/or the application is located) and how that dependency should be treated (i.e. what type of credential checks are required, if any), it is better to have these things declared together.
+* `volumes`: refers to any volumes needed by the `SparkApplication`, in this case an underlying `PersistentVolumeClaim`.
+* `driver`: driver-specific settings, including any volume mounts.
+* `executor`: executor-specific settings, including any volume mounts.
 
 Job-specific settings are annotated below.
 
-== Pyspark: externally located artifact and dataset
-
-[source,yaml]
-----
-include::example$example-sparkapp-external-dependencies.yaml[]
-----
-
-<1> Job python artifact (external)
-<2> Job argument (external)
-<3> List of python job requirements: these will be installed in the pods via `pip`
-<4> Spark dependencies: the credentials provider (the user knows what is relevant here) plus dependencies needed to access external resources (in this case, in s3)
-<5> the name of the volume mount backed by a `PersistentVolumeClaim` that must be pre-existing
-<6> the path on the volume mount: this is referenced in the `sparkConf` section where the extra class path is defined for the driver and executors
-
 == Pyspark: externally located dataset, artifact available via PVC/volume mount
 
 [source,yaml]

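For readers skimming the commit, a rough sketch of how the common `spec` fields described in the changed page fit together in a `SparkApplication` manifest may help. This is only an illustration, not content from the repository's example files: the API group/version, image reference, S3 paths, PVC name, volume names and executor count below are all placeholder assumptions, and the exact field layout should be checked against the operator's CRD for the version in question.

[source,yaml]
----
# Rough sketch of a SparkApplication using the common `spec` fields listed above.
# All concrete values (names, image tag, S3 paths, PVC name) are placeholders.
apiVersion: spark.stackable.tech/v1alpha1  # assumed API group/version for this operator
kind: SparkApplication
metadata:
  name: example-sparkapp             # placeholder name
spec:
  version: "1.0"                     # the current version is "1.0"
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.3.0  # placeholder image reference
  mode: cluster                      # only `cluster` is currently supported
  mainApplicationFile: s3a://my-bucket/app.py                  # placeholder artifact location
  args:
    - "s3a://nyc-tlc/trip data/yellow_tripdata_2021-07.csv"    # placeholder dataset path
  sparkConf:                         # settings passed directly to spark-submit
    spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
  volumes:
    - name: job-deps                 # backed by a pre-existing PersistentVolumeClaim
      persistentVolumeClaim:
        claimName: pvc-job-deps      # placeholder claim name
  driver:
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies
  executor:
    instances: 3                     # executor-specific settings; placeholder count
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies
----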