
[Merged by Bors] - Spark history server #187


Closed
wants to merge 61 commits into from
91f5936
Remove zombie test definition.
razvan Jan 2, 2023
0fc5125
wip : history server
razvan Jan 2, 2023
e5044f4
Consolidate constants.
razvan Jan 3, 2023
057f37c
wip: create deployment, service and config map
razvan Jan 3, 2023
a5abf76
Update changelog
razvan Jan 3, 2023
34c3dd5
Use framework image struct, update crds and more.
razvan Jan 4, 2023
e587200
Use roles and role groups.
razvan Jan 4, 2023
79ea167
Kuttl tests almost green
razvan Jan 5, 2023
8362df5
Populate spark config automatically.
razvan Jan 5, 2023
8f593c2
Use S3 secrets for the logs bucket
razvan Jan 6, 2023
9d2c661
Successfully started history server, cleanups, replicas.
razvan Jan 6, 2023
c82335c
collect all three controllers
adwk67 Jan 10, 2023
dfb7d7b
write logs to history server and check results
adwk67 Jan 12, 2023
16ff530
merged main
adwk67 Jan 12, 2023
0027f45
re-format
adwk67 Jan 12, 2023
a9622f7
added resources using fragments
adwk67 Jan 13, 2023
6e26cb7
service account
adwk67 Jan 13, 2023
2895ab4
added sleep in tests for minio
adwk67 Jan 13, 2023
b2765f8
regenerate charts
adwk67 Jan 13, 2023
3c99f0d
use same clusterrole for history server, and add pvc permissions
adwk67 Jan 13, 2023
935d9bf
documentation
adwk67 Jan 13, 2023
21a20c3
added operator-rs update to changelog
adwk67 Jan 13, 2023
6cec93c
parse cleaner config
adwk67 Jan 16, 2023
9b382ba
use history api for test
adwk67 Jan 16, 2023
977eb05
linting
adwk67 Jan 16, 2023
bfbc8dd
Extract cleaner settings into their own function.
razvan Jan 16, 2023
3b04da8
The operator chooses the appropriate s3 credentials provider
razvan Jan 16, 2023
d14788a
Extract S3LogDir to its own module.
razvan Jan 16, 2023
d9c76c4
Automatically configure event logs for applications that require it.
razvan Jan 16, 2023
ad26663
removed unused dependency
adwk67 Jan 16, 2023
918a2dd
Clean up any orphaned resources.
razvan Jan 17, 2023
2d20012
Update docs.
razvan Jan 17, 2023
a21bc75
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 17, 2023
6af52e6
Update docs/modules/ROOT/examples/example-history-server.yaml
razvan Jan 17, 2023
3070945
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 17, 2023
ae9e5d4
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 17, 2023
e211669
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 17, 2023
213e223
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 17, 2023
1650f09
Mount credentials in separate folders and configure bucket specific p…
razvan Jan 17, 2023
b3158b1
Use different endpoints for data and event logs.
razvan Jan 18, 2023
664e6ec
Implement fix for "S3 reference inconsistency #162"
razvan Jan 18, 2023
f308142
main merge
razvan Jan 18, 2023
1fd5ebb
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 19, 2023
bd49457
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 19, 2023
30479f5
Update docs/modules/ROOT/pages/history_server.adoc
razvan Jan 19, 2023
149bc1f
Update CHANGELOG, docs and clean up
razvan Jan 19, 2023
91bb1df
Remove alternative Dockerfile
razvan Jan 19, 2023
6ab2278
Merge branch 'main' into 124-implement-adr-22-spark-history-server
razvan Jan 19, 2023
685b292
Use references to S3 objects in tests.
razvan Jan 19, 2023
2c843b7
Update rust/operator-binary/src/spark_k8s_controller.rs
razvan Jan 19, 2023
9037986
Update rust/operator-binary/src/history_controller.rs
razvan Jan 19, 2023
3f95292
Update rust/operator-binary/src/history_controller.rs
razvan Jan 19, 2023
e0c7d8c
Update rust/operator-binary/src/history_controller.rs
razvan Jan 19, 2023
2de8ee4
Update rust/crd/src/history.rs
razvan Jan 19, 2023
fe169a1
Update docs/modules/ROOT/pages/usage.adoc
razvan Jan 19, 2023
702b828
Update Rust code with review feedback.
razvan Jan 19, 2023
5bc89dd
Update rust/crd/src/s3logdir.rs
sbernauer Jan 19, 2023
78210d5
Update rust/crd/src/s3logdir.rs
sbernauer Jan 19, 2023
3e5d845
Update rust/operator-binary/src/history_controller.rs
sbernauer Jan 19, 2023
12cb8d9
Update rust/operator-binary/src/history_controller.rs
sbernauer Jan 19, 2023
73aad65
Fix services and watch more objects.
razvan Jan 19, 2023
8 changes: 7 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.

## [Unreleased]

### Added

- Create and manage history servers ([#187])
- `operator-rs` `0.27.1` -> `0.30.2` ([#187])

[#187]: https://github.com/stackabletech/spark-k8s-operator/pull/187

### Changed

- Updated stackable image versions ([#176])
Expand Down Expand Up @@ -43,7 +50,6 @@ All notable changes to this project will be documented in this file.
- Update RBAC properties for OpenShift compatibility ([#126]).

[#112]: https://github.com/stackabletech/spark-k8s-operator/pull/112
[#114]: https://github.com/stackabletech/spark-k8s-operator/pull/114
[#126]: https://github.com/stackabletech/spark-k8s-operator/pull/126

## [0.4.0] - 2022-08-03
Expand Down
9 changes: 5 additions & 4 deletions Cargo.lock


4 changes: 4 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,10 @@ SHELL=/usr/bin/env bash -euo pipefail
render-readme:
scripts/render_readme.sh

## Alternative Dockerfile that uses cargo chef to speed up dev builds.
docker-build-alt:
docker build --build-arg VERSION=${VERSION} -t "docker.stackable.tech/stackable/spark-k8s-operator:${VERSION}" -f docker/Dockerfile.alternative .

## Docker related targets
docker-build:
docker build --force-rm --build-arg VERSION=${VERSION} -t "docker.stackable.tech/stackable/spark-k8s-operator:${VERSION}" -f docker/Dockerfile .
Expand Down
700 changes: 700 additions & 0 deletions deploy/helm/spark-k8s-operator/crds/crds.yaml


3 changes: 3 additions & 0 deletions deploy/helm/spark-k8s-operator/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,9 @@ spec:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ include "operator.appname" . }}
env:
- name: SPARK_K8S_OPERATOR_LOG
value: debug
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
Expand Down
1 change: 1 addition & 0 deletions deploy/helm/spark-k8s-operator/templates/roles.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ rules:
- spark.stackable.tech
resources:
- sparkapplications
- sparkhistoryservers
verbs:
- get
- list
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ rules:
- ""
resources:
- configmaps
- persistentvolumeclaims
(Reviewer comment on `persistentvolumeclaims`: "Seems weird but it is what it is")

- pods
- secrets
- serviceaccounts
Expand Down
700 changes: 700 additions & 0 deletions deploy/manifests/crds.yaml


3 changes: 3 additions & 0 deletions deploy/manifests/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@ spec:
securityContext: {}
containers:
- name: spark-k8s-operator
env:
- name: SPARK_K8S_OPERATOR_LOG
value: debug
securityContext:
allowPrivilegeEscalation: false
capabilities:
Expand Down
1 change: 1 addition & 0 deletions deploy/manifests/roles.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ rules:
- spark.stackable.tech
resources:
- sparkapplications
- sparkhistoryservers
verbs:
- get
- list
Expand Down
1 change: 1 addition & 0 deletions deploy/manifests/spark-clusterrole.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ rules:
- ""
resources:
- configmaps
- persistentvolumeclaims
- pods
- secrets
- serviceaccounts
Expand Down
93 changes: 93 additions & 0 deletions docker/Dockerfile.alternative
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
FROM registry.access.redhat.com/ubi8/ubi-minimal:8.6@sha256:c5ffdf5938d73283cec018f2adf59f0ed9f8c376d93e415a27b16c3c6aad6f45 AS chef
LABEL maintainer="Stackable GmbH"

# https://github.com/hadolint/hadolint/wiki/DL4006
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Update image and install everything needed for Rustup & Rust
RUN microdnf update --disablerepo=* --enablerepo=ubi-8-appstream-rpms --enablerepo=ubi-8-baseos-rpms -y \
&& rm -rf /var/cache/yum \
&& microdnf install --disablerepo=* --enablerepo=ubi-8-appstream-rpms --enablerepo=ubi-8-baseos-rpms curl findutils gcc gcc-c++ make cmake openssl-devel pkg-config systemd-devel unzip -y \
&& rm -rf /var/cache/yum

WORKDIR /opt/protoc
RUN PROTOC_VERSION=21.5 \
ARCH=$(arch | sed 's/^aarch64$/aarch_64/') \
&& curl --location --output protoc.zip "https://repo.stackable.tech/repository/packages/protoc/protoc-${PROTOC_VERSION}-linux-${ARCH}.zip" \
&& unzip protoc.zip \
&& rm protoc.zip
ENV PROTOC=/opt/protoc/bin/protoc
WORKDIR /

# IMPORTANT
# If you change the toolchain version here, make sure to also change the "rust_version"
# property in operator-templating/repositories.yaml
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain 1.63.0 \
&& . $HOME/.cargo/env \
&& cargo install cargo-chef --locked

WORKDIR /src

FROM chef AS planner

COPY . .
RUN . $HOME/.cargo/env && cargo chef prepare --recipe-path recipe.json

FROM chef AS builder

COPY --from=planner /src/recipe.json recipe.json

# Build dependencies - this is the caching Docker layer!
RUN . $HOME/.cargo/env && cargo chef cook --release --recipe-path recipe.json

# Build application
COPY . .
RUN . $HOME/.cargo/env && cargo build --release

WORKDIR /app

# Copy the "interesting" files into /app.
RUN find /src/target/release \
-regextype egrep \
# The interesting binaries are all directly in ${BUILD_DIR}.
-maxdepth 1 \
# Well, binaries are executable.
-executable \
# Well, binaries are files.
-type f \
# Filter out tests.
! -regex ".*\-[a-fA-F0-9]{16,16}$" \
# Copy the matching files into /app.
-exec cp {} /app \;

RUN echo "The following files will be copied to the runtime image: $(ls /app)"

FROM registry.access.redhat.com/ubi8/ubi-minimal AS operator

ARG VERSION
ARG RELEASE="1"

LABEL name="Stackable Operator for Apache Spark-on-Kubernetes" \
maintainer="[email protected]" \
vendor="Stackable GmbH" \
version="${VERSION}" \
release="${RELEASE}" \
summary="Deploy and manage Apache Spark-on-Kubernetes clusters." \
description="Deploy and manage Apache Spark-on-Kubernetes clusters."

RUN microdnf install -y yum \
&& yum -y update-minimal --security --sec-severity=Important --sec-severity=Critical \
&& yum clean all \
&& microdnf clean all

COPY LICENSE /licenses/LICENSE

COPY --from=builder /app/stackable-spark-k8s-operator /
COPY deploy/config-spec/properties.yaml /etc/stackable/spark-k8s-operator/config-spec/properties.yaml

RUN groupadd -g 1000 stackable && adduser -u 1000 -g stackable -c 'Stackable Operator' stackable

USER stackable:stackable

ENTRYPOINT ["/stackable-spark-k8s-operator"]
CMD ["run"]
37 changes: 37 additions & 0 deletions docs/modules/ROOT/examples/example-history-app.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
name: spark-pi-s3-1
spec:
version: "1.0"
sparkImage: docker.stackable.tech/stackable/spark-k8s:3.3.0-stackable0.3.0
sparkImagePullPolicy: IfNotPresent
mode: cluster
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: s3a://my-bucket/spark-examples_2.12-3.3.0.jar
s3bucket: # <1>
inline:
bucketName: my-bucket
connection:
inline:
host: test-minio
port: 9000
accessStyle: Path
credentials:
secretClass: s3-credentials-class # <2>
logFileDirectory: # <3>
s3:
prefix: eventlogs/ # <4>
bucket:
inline:
bucketName: spark-logs # <5>
connection:
inline:
host: test-minio
port: 9000
accessStyle: Path
credentials:
secretClass: history-credentials-class # <6>
executor:
instances: 1
29 changes: 29 additions & 0 deletions docs/modules/ROOT/examples/example-history-server.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkHistoryServer
metadata:
name: spark-history
spec:
image:
productVersion: 3.3.0
stackableVersion: 0.3.0
logFileDirectory: # <1>
s3:
prefix: eventlogs/ # <2>
bucket: # <3>
inline:
bucketName: spark-logs
connection:
inline:
host: test-minio
port: 9000
accessStyle: Path
credentials:
secretClass: s3-credentials-class
sparkConf: # <4>
nodes:
roleGroups:
cleaner:
replicas: 1 # <5>
config:
cleaner: true # <6>
Binary file added docs/modules/ROOT/images/history-server-ui.png
1 change: 1 addition & 0 deletions docs/modules/ROOT/nav.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
* xref:usage.adoc[]
* xref:job_dependencies.adoc[]
* xref:rbac.adoc[]
* xref:history_server.adoc[]
50 changes: 50 additions & 0 deletions docs/modules/ROOT/pages/history_server.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
= Spark History Server

== Overview

The Stackable Spark-on-Kubernetes operator runs Apache Spark workloads in a Kubernetes cluster, whereby driver and executor pods are created for the duration of the job and then terminated. One or more Spark History Server instances can be deployed independently of `SparkApplication` jobs and used as an endpoint for Spark logging, so that job information can be viewed once the job pods are no longer available.
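
For orientation, this mechanism rests on Spark's standard event-log properties: the operator injects the event-log settings into submitted applications, and the history server reads from the same location. A sketch of the properties involved (the exact keys the operator writes may differ; the `spark-logs` bucket and `eventlogs/` prefix are taken from the examples below):

[source,properties]
----
# Written into the application's spark-defaults (sketch, assumed values)
spark.eventLog.enabled          true
spark.eventLog.dir              s3a://spark-logs/eventlogs/
# Read by the history server
spark.history.fs.logDirectory   s3a://spark-logs/eventlogs/
----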

== Deployment

The example below demonstrates how to set up the history server running in one Pod with scheduled cleanups of the event logs. The event logs are loaded from an S3 bucket named `spark-logs` and the folder `eventlogs/`. The credentials for this bucket are provided by the secret class `s3-credentials-class`. For more details on how the Stackable Data Platform manages S3 resources see the xref:home:concepts:s3.adoc[S3 resources] page.


[source,yaml]
----
include::example$example-history-server.yaml[]
----

<1> The location of the event logs. Must be an S3 bucket. Future implementations might add support for other shared filesystems such as HDFS.
<2> Folder within the S3 bucket where the log files are located. This folder is required and must exist before setting up the history server.
<3> The S3 bucket definition, here provided in-line.
<4> Additional history server configuration properties can be provided here as a map. For possible properties see: https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options
<5> This deployment has only one Pod. Multiple history servers can be started, all reading the same event logs, by increasing the replica count.
<6> This history server will automatically clean up old log files by using default properties. You can change any of these by using the `sparkConf` map.

NOTE: Only one role group can have scheduled cleanups enabled (`cleaner: true`), and this role group can have at most one replica.
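
The cleaner defaults can be overridden through the `sparkConf` map using Spark's standard cleaner properties. An illustrative sketch (the values shown are examples, not operator defaults):

[source,yaml]
----
sparkConf:
  spark.history.fs.cleaner.interval: "1d"   # how often to check for logs to delete
  spark.history.fs.cleaner.maxAge: "7d"     # delete event logs older than this
----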

== Application configuration


The example below demonstrates how to configure Spark applications to store log events in an S3 bucket.

[source,yaml]
----
include::example$example-history-app.yaml[]
----

<1> Location of the data that is being processed by the application.
<2> Credentials used to access the data above.
<3> Instruct the operator to configure the application with logging enabled.
<4> Folder to store logs. This must match the prefix used by the history server.
<5> Bucket to store logs. This must match the bucket used by the history server.
<6> Not used by the application! The operator will ignore this and use the credentials from the `s3bucket` to store event logs.



== History Web UI

The history server exposes a user console on port 18080. By setting up port-forwarding on port 18080, this UI can be opened in a browser to show running and completed jobs:
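
For a quick look during development, `kubectl port-forward` is enough. The service name below is an assumption derived from the resource name `spark-history` and the `node` role in the example above; check `kubectl get svc` for the actual name:

[source,bash]
----
# Forward the history server UI to localhost (service name is an assumption)
kubectl port-forward svc/spark-history-node 18080:18080
----

Then open http://localhost:18080 in a browser.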

image::history-server-ui.png[History Server Console]

5 changes: 3 additions & 2 deletions rust/crd/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -9,14 +9,15 @@ version = "0.7.0-nightly"
publish = false

[dependencies]
stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag="0.27.1" }
stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag="0.30.2" }

semver = "1.0"
serde = { version = "1.0", features = ["derive"] }
serde = "1.0"
serde_json = "1.0"
serde_yaml = "0.8"
snafu = "0.7"
strum = { version = "0.24", features = ["derive"] }
tracing = "0.1"

[dev-dependencies]
rstest = "0.16.0"
17 changes: 17 additions & 0 deletions rust/crd/src/constants.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,20 @@ pub const S3_SECRET_DIR_NAME: &str = "/stackable/secrets";
pub const MIN_MEMORY_OVERHEAD: u32 = 384;
pub const JVM_OVERHEAD_FACTOR: f32 = 0.1;
pub const NON_JVM_OVERHEAD_FACTOR: f32 = 0.4;

pub const OPERATOR_NAME: &str = "spark.stackable.tech";
pub const CONTROLLER_NAME: &str = "sparkapplication";
pub const POD_DRIVER_CONTROLLER_NAME: &str = "pod-driver";
pub const HISTORY_CONTROLLER_NAME: &str = "history";

pub const HISTORY_ROLE_NAME: &str = "node";

pub const HISTORY_IMAGE_BASE_NAME: &str = "spark-k8s";

pub const HISTORY_CONFIG_FILE_NAME: &str = "spark-defaults.conf";
pub const HISTORY_CONFIG_FILE_NAME_FULL: &str = "/stackable/spark/conf/spark-defaults.conf";

pub const LABEL_NAME_INSTANCE: &str = "app.kubernetes.io/instance";

pub const VOLUME_NAME_S3_CREDENTIALS: &str = "s3-credentials";
pub const SPARK_CLUSTER_ROLE: &str = "spark-k8s-clusterrole";