Added test for PySpark application published as a Docker image. #107

Merged (2 commits) on Aug 2, 2022
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -10,11 +10,13 @@ All notable changes to this project will be documented in this file.
- Pinned MinIO version for tests ([#100])
- `operator-rs` `0.21.0` → `0.22.0` ([#102]).
- Added owner-reference to pod templates ([#104])
- Added a kuttl test for PySpark jobs provisioned via the `image` property of the `SparkApplication` definition ([#107])

[#97]: https://github.com/stackabletech/spark-k8s-operator/pull/97
[#100]: https://github.com/stackabletech/spark-k8s-operator/pull/100
[#102]: https://github.com/stackabletech/spark-k8s-operator/pull/102
[#104]: https://github.com/stackabletech/spark-k8s-operator/pull/104
[#107]: https://github.com/stackabletech/spark-k8s-operator/pull/107

## [0.3.0] - 2022-06-30

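The numbered `NN-*.yaml` step and assert files added by this PR are executed in order by kuttl. A minimal suite definition is sketched below for orientation; the directory path and timeout are assumptions, not taken from this repository:

```yaml
# Hypothetical kuttl suite configuration (path and timeout are assumptions).
apiVersion: kuttl.dev/v1beta1
kind: TestSuite
testDirs:
  # Each sub-directory, e.g. pyspark-ny-public-s3-image, is one test case.
  - tests/templates/kuttl
timeout: 300
```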
21 changes: 21 additions & 0 deletions tests/templates/kuttl/pyspark-ny-public-s3-image/00-assert.yaml
@@ -0,0 +1,21 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
metadata:
name: minio
timeout: 900
---
apiVersion: v1
kind: Service
metadata:
name: test-minio
labels:
app: minio
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: minio-mc
status:
readyReplicas: 1
replicas: 1
@@ -0,0 +1,36 @@
---
apiVersion: v1
kind: Service
metadata:
name: minio-mc
labels:
app: minio-mc
spec:
clusterIP: None
selector:
app: minio-mc
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: minio-mc
labels:
app: minio-mc
spec:
replicas: 1
serviceName: "minio-mc"
selector:
matchLabels:
app: minio-mc
template:
metadata:
labels:
app: minio-mc
spec:
containers:
- name: minio-mc
image: bitnami/minio:2022-debian-10
stdin: true
tty: true
@@ -0,0 +1,15 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- script: >-
helm install test-minio
--namespace $NAMESPACE
--version 4.0.2
--set mode=standalone
--set replicas=1
--set persistence.enabled=false
--set buckets[0].name=my-bucket,buckets[0].policy=public
--set resources.requests.memory=1Gi
--repo https://charts.min.io/ minio
timeout: 240
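The repeated `--set` flags above could equivalently live in a values file passed via `-f`. A sketch mirroring the same chart values (illustrative only, not part of this PR):

```yaml
# values.yaml equivalent to the --set flags above (illustrative).
mode: standalone
replicas: 1
persistence:
  enabled: false
buckets:
  - name: my-bucket
    policy: public
resources:
  requests:
    memory: 1Gi
```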
@@ -0,0 +1,11 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- script: >-
kubectl exec -n $NAMESPACE minio-mc-0 --
sh -c 'mc alias set test-minio http://test-minio:9000/'
- script: kubectl cp -n $NAMESPACE yellow_tripdata_2021-07.csv minio-mc-0:/tmp
- script: >-
kubectl exec -n $NAMESPACE minio-mc-0 --
mc cp /tmp/yellow_tripdata_2021-07.csv test-minio/my-bucket
13 changes: 13 additions & 0 deletions tests/templates/kuttl/pyspark-ny-public-s3-image/02-assert.yaml
@@ -0,0 +1,13 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
metadata:
name: pyspark-ny-deps-job
timeout: 900
---
apiVersion: batch/v1
kind: Job
metadata:
name: pyspark-ny-deps-job
status:
succeeded: 1
@@ -0,0 +1,50 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pyspark-ny-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: batch/v1
kind: Job
metadata:
name: pyspark-ny-deps-job
spec:
template:
spec:
nodeSelector:
node: "1"
restartPolicy: Never
volumes:
- name: job-deps
persistentVolumeClaim:
claimName: pyspark-ny-pvc
containers:
- name: aws-deps
image: docker.stackable.tech/stackable/tools:0.2.0-stackable0
env:
- name: DEST_DIR
value: "/dependencies/jars"
- name: AWS
value: "1.11.1026"
- name: HADOOP
value: "3.3.3"
command:
[
"bash",
"-x",
"-o",
"pipefail",
"-c",
"mkdir -p ${DEST_DIR} && curl -L https://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/${HADOOP}/hadoop-aws-${HADOOP}.jar -o ${DEST_DIR}/hadoop-aws-${HADOOP}.jar && curl -L https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS}/aws-java-sdk-bundle-${AWS}.jar -o ${DEST_DIR}/aws-java-sdk-bundle-${AWS}.jar && chown -R stackable:stackable ${DEST_DIR} && chmod -R a=,u=rwX ${DEST_DIR}",
]
volumeMounts:
- name: job-deps
mountPath: /dependencies
securityContext:
runAsUser: 0
14 changes: 14 additions & 0 deletions tests/templates/kuttl/pyspark-ny-public-s3-image/10-assert.yaml
@@ -0,0 +1,14 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
metadata:
name: pyspark-ny-public-s3-image
timeout: 900
---
# The Job starting the whole process
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
name: pyspark-ny-public-s3-image
status:
phase: Succeeded
@@ -0,0 +1,48 @@
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
name: pyspark-ny-public-s3-image
spec:
version: "1.0"
# everything under /jobs will be copied to /stackable/spark/jobs
image: docker.stackable.tech/stackable/ny-tlc-report:{{ test_scenario['values']['ny-tlc-report'] }}
sparkImage: docker.stackable.tech/stackable/pyspark-k8s:{{ test_scenario['values']['spark'] }}-stackable{{ test_scenario['values']['stackable'] }}
sparkImagePullPolicy: IfNotPresent
mode: cluster
mainApplicationFile: local:///stackable/spark/jobs/ny_tlc_report.py
args:
- "--input 's3a://my-bucket/yellow_tripdata_2021-07.csv'"
deps:
requirements:
- tabulate==0.8.9
s3bucket:
inline:
bucketName: my-bucket
connection:
inline:
host: test-minio
port: 9000
accessStyle: Path
sparkConf:
spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
spark.driver.extraClassPath: "/dependencies/jars/*"
spark.executor.extraClassPath: "/dependencies/jars/*"
volumes:
- name: job-deps
persistentVolumeClaim:
claimName: pyspark-ny-pvc
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
volumeMounts:
- name: job-deps
mountPath: /dependencies
executor:
cores: 1
instances: 3
memory: "512m"
volumeMounts:
- name: job-deps
mountPath: /dependencies
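For contrast with the `image`-based provisioning exercised here, a `SparkApplication` can also load the job file straight from S3 instead of a custom image. A stripped-down sketch; the name, tag, and bucket path are placeholders, not taken from this PR:

```yaml
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: pyspark-ny-public-s3  # hypothetical name
spec:
  version: "1.0"
  # No custom `image`: the job file is fetched from S3 instead of being
  # baked into a Docker image.
  sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.3.0-stackable0.1.0
  mode: cluster
  mainApplicationFile: s3a://my-bucket/ny_tlc_report.py
```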