Investigate setting up a history server #84

razvan · 2022-06-15T08:21:22Z

Description

Spark monitoring with the history server: https://spark.apache.org/docs/latest/monitoring.html

An example using the Google operator and a shared NFS volume: https://stackoverflow.com/a/58593909

Multiple buckets: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration

Refinement questions

CRD and instances

Create a CRD specifically for the history server?

It should be possible to create multiple history server instances. A history server CR can reference an S3Bucket.
Spark applications can reference one of these history servers.
The operator resolves the necessary Spark configuration for the job and adds it to the spark-submit command.
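
A minimal sketch of that resolution step, in Python for illustration only (this is not the operator's code). `HistoryServerRef` and its fields are hypothetical stand-ins for whatever the history server CR ends up exposing; `spark.eventLog.enabled` and `spark.eventLog.dir` are the standard Spark properties for writing event logs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryServerRef:
    """Hypothetical view of a history server CR: where event logs live."""
    bucket: str
    path: str


def event_log_conf(ref: HistoryServerRef) -> dict[str, str]:
    """Resolve the Spark properties a job needs to write event logs."""
    prefix = ref.path.strip("/")
    log_dir = f"s3a://{ref.bucket}/{prefix}/" if prefix else f"s3a://{ref.bucket}/"
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render properties as `--conf key=value` pairs for spark-submit."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting list would be appended to the spark-submit command line that the operator already builds for the job.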

When a job references a history server, the operator reads the endpoint, bucket, and path from it but uses the same credentials as for data access.
There are separate S3 connection definitions for reading and writing: the job uses the write connection, while the history server uses the read connection.
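
One way the separate read and write connections could map onto Hadoop's per-bucket S3A settings is sketched below, again in Python for illustration only. `S3Connection` and its field names are hypothetical; the `fs.s3a.bucket.<bucket>.*` keys follow the per-bucket configuration scheme from the hadoop-aws documentation linked above, and `spark.history.fs.logDirectory` is the standard history server property. Passing keys as plain properties is a simplification; a real deployment would more likely mount them from a Secret.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class S3Connection:
    """Hypothetical connection definition; field names are illustrative."""
    endpoint: str
    access_key: str
    secret_key: str


def per_bucket_conf(bucket: str, conn: S3Connection) -> dict[str, str]:
    """Scope endpoint and credentials to a single bucket via S3A per-bucket keys."""
    prefix = f"spark.hadoop.fs.s3a.bucket.{bucket}"
    return {
        f"{prefix}.endpoint": conn.endpoint,
        f"{prefix}.access.key": conn.access_key,
        f"{prefix}.secret.key": conn.secret_key,
    }


def history_server_conf(bucket: str, path: str, read_conn: S3Connection) -> dict[str, str]:
    """Properties for the history server, which only reads the event logs."""
    conf = per_bucket_conf(bucket, read_conn)
    conf["spark.history.fs.logDirectory"] = f"s3a://{bucket}/{path.strip('/')}/"
    return conf
```

The job side would call `per_bucket_conf` with the write connection for the same bucket, so the two sets of credentials never need to be shared.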

History event storage

Support for ReadWriteMany volumes? Keep as an issue for the future.
Integration with Hadoop? Keep as an issue for the future; it depends on a future HDFSConnection object.

UI usage and security

Kerberos support? Keep as an issue for the future.
Will internal links in the web UI use the published k8s service name? This may only be a problem for the UI of running applications, not for the UI of the history server.

Acceptance criteria:

We have a plan, documented as an ADR.

lfrancke · 2022-09-06T08:46:04Z

I'm looking at this

adwk67 assigned razvan Aug 18, 2022

razvan added the type/research label Aug 22, 2022

razvan mentioned this issue Aug 23, 2022

Spark History Server ADR stackabletech/documentation#253

Merged

lfrancke moved this to Development: In Progress in Stackable Engineering Aug 23, 2022

lfrancke added this to Stackable Engineering Aug 23, 2022

razvan mentioned this issue Aug 24, 2022

Implement ADR 22 (Spark history server) #124

Closed

4 tasks

adwk67 moved this from Development: In Progress to Development: In Review in Stackable Engineering Aug 24, 2022

lfrancke removed the type/research label Aug 31, 2022

razvan closed this as completed in stackabletech/documentation#253 Sep 2, 2022

sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Sep 2, 2022

lfrancke moved this from Development: Done to Acceptance: Waiting for in Stackable Engineering Sep 2, 2022

lfrancke moved this from Acceptance: Waiting for to Acceptance: In Progress in Stackable Engineering Sep 5, 2022

lfrancke moved this from Acceptance: In Progress to Done in Stackable Engineering Sep 8, 2022
