[Merged by Bors] - Logging #226

Closed
wants to merge 43 commits
Changes from 36 commits
Commits
43 commits
631aa0e
Allow fragments for job, driver, and executor configurations
siegfriedweber Mar 10, 2023
dd33058
Implement log aggregation for the Spark driver
siegfriedweber Mar 9, 2023
d36a0f8
Shutdown Vector after completion of the main container
siegfriedweber Mar 10, 2023
e418781
Use separate config maps for driver and executor
siegfriedweber Mar 22, 2023
f993971
Add integration test for log aggregation
siegfriedweber Mar 23, 2023
c7bebf1
Add log aggregation for the init containers and the submit job
siegfriedweber Mar 23, 2023
5bc8767
Add support for custom log configurations
siegfriedweber Mar 24, 2023
eb118ad
Add logging to the history server
siegfriedweber Mar 28, 2023
fa9bfbe
Add logging tests for pyspark
siegfriedweber Mar 28, 2023
4848b92
Fix integration tests
siegfriedweber Mar 28, 2023
6b0dfed
Add necessary volume mounts to the Spark Submit job
siegfriedweber Mar 29, 2023
45391ff
Enable log aggregation in all integration tests
siegfriedweber Mar 30, 2023
13dff68
Reorganize imports
siegfriedweber Mar 30, 2023
fed4572
Fix bug in configuration merge
siegfriedweber Mar 30, 2023
21bef7a
Make volume mounts and node selector optional
siegfriedweber Mar 30, 2023
512146a
Remove unused error variants
siegfriedweber Mar 30, 2023
5413cf6
Upgrade stackable-operator to version 0.38.0
siegfriedweber Mar 30, 2023
6abfa4a
Upgrade clap
siegfriedweber Mar 30, 2023
a03f344
Update changelog
siegfriedweber Mar 30, 2023
b5ccdb7
Use volume mounts from the merged configuration
siegfriedweber Mar 30, 2023
8196a96
Merge branch 'fragment-config' into logging
siegfriedweber Mar 30, 2023
661fc61
Fix Clippy warning
siegfriedweber Mar 30, 2023
bc4226a
Merge branch 'fragment-config' into logging
siegfriedweber Mar 30, 2023
2d1e8eb
Add SubmitJobContainer struct
siegfriedweber Mar 30, 2023
fb742a1
Remove unused constants
siegfriedweber Mar 30, 2023
9e42564
Adapt spark-submit container in integration tests
siegfriedweber Mar 31, 2023
6a170f2
Use separate indexes for test steps
siegfriedweber Mar 31, 2023
aaa3cf0
Wait 10 seconds in all test steps for MinIO to start up
siegfriedweber Mar 31, 2023
68cfb03
Use distinct names for SecretClasses in the tests
siegfriedweber Mar 31, 2023
d508ffd
Merge branch 'fragment-config' into logging
siegfriedweber Mar 31, 2023
b66c003
Fix flakiness in logging test
siegfriedweber Apr 3, 2023
0c6ac39
Merge branch 'main' into logging
siegfriedweber Apr 3, 2023
2045cc8
Document log aggregation
siegfriedweber Apr 3, 2023
0240a91
Update changelog
siegfriedweber Apr 3, 2023
0068dae
Fix clippy warnings
siegfriedweber Apr 3, 2023
a9457e1
Reserve space for the log files of the init containers
siegfriedweber Apr 3, 2023
9e8ed3f
Apply review suggestions
siegfriedweber Apr 4, 2023
84b283d
Remove redundant container name from error variant InvalidContainerName
siegfriedweber Apr 4, 2023
4d2dbe8
Remove unused error variant
siegfriedweber Apr 4, 2023
b971cf5
Fix ConfigMap names in the logging integration test
siegfriedweber Apr 4, 2023
da2ade6
Discard logs in integration test if upstream Vector aggregator is not…
siegfriedweber Apr 4, 2023
0b46e7f
Remove unused ConfigMap vector-aggregator-discovery from logging test
siegfriedweber Apr 4, 2023
ad89db3
Use defaultJavaOptions instead of extraJavaOptions
siegfriedweber Apr 4, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ All notable changes to this project will be documented in this file.

- Deploy default and support custom affinities ([#217])
- BREAKING: Dropped support for old spec.{driver,executor}.nodeSelector field. Use spec.{driver,executor}.affinity.nodeSelector instead ([#217])
- Log aggregation added ([#226]).

### Changed

@@ -19,6 +20,7 @@ All notable changes to this project will be documented in this file.
[#207]: https://github.com/stackabletech/spark-k8s-operator/pull/207
[#217]: https://github.com/stackabletech/spark-k8s-operator/pull/217
[#223]: https://github.com/stackabletech/spark-k8s-operator/pull/223
[#226]: https://github.com/stackabletech/spark-k8s-operator/pull/226

## [23.1.0] - 2023-01-23

388 changes: 388 additions & 0 deletions deploy/helm/spark-k8s-operator/crds/crds.yaml

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions docs/modules/spark-k8s/pages/history_server.adoc
@@ -48,6 +48,23 @@ include::example$example-history-app.yaml[]
<6> Credentials used to write event logs. These can, of course, differ from the credentials used to process data.


== Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
vectorAggregatorConfigMapName: vector-aggregator-discovery
nodes:
config:
logging:
enableVectorAgent: true
----

Further information on how to configure logging can be found in
xref:home:concepts:logging.adoc[].
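
The operator does not create the discovery ConfigMap itself; the only requirement documented in this PR's CRD field description is that it contains an `ADDRESS` key with the address of the Vector aggregator. A minimal sketch is shown below — the name matches the example above, but the address is a placeholder, not something defined by this PR:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-aggregator-discovery
data:
  # Placeholder address; point this at your Vector aggregator's source endpoint.
  ADDRESS: vector-aggregator:6000
----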

== History Web UI

22 changes: 22 additions & 0 deletions docs/modules/spark-k8s/pages/usage.adoc
@@ -177,6 +177,28 @@ Spark allocates a default amount of non-heap memory based on the type of job (JV

NOTE: It is possible to define Spark resources either directly by setting configuration properties listed under `sparkConf`, or by using resource limits. If both are used, then `sparkConf` properties take precedence. It is recommended for the sake of clarity to use *_either_* one *_or_* the other.

== Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
vectorAggregatorConfigMapName: vector-aggregator-discovery
job:
logging:
enableVectorAgent: true
driver:
logging:
enableVectorAgent: true
executor:
logging:
enableVectorAgent: true
----

Further information on how to configure logging can be found in
xref:home:concepts:logging.adoc[].
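
Beyond enabling the agent, the `logging` section follows the common Stackable logging structure, so log levels can also be tuned per container. The following sketch does this for the driver; note that the container name `spark` is only an illustrative assumption (this PR does not list the container names for the Spark application), so consult the generated CRD for the actual names:

[source,yaml]
----
spec:
  driver:
    logging:
      enableVectorAgent: true
      containers:
        spark:  # illustrative container name, not confirmed by this PR
          console:
            level: INFO
          file:
            level: INFO
          loggers:
            ROOT:
              level: INFO
----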

== CRD argument coverage

29 changes: 21 additions & 8 deletions rust/crd/src/constants.rs
@@ -1,21 +1,29 @@
pub const APP_NAME: &str = "spark-k8s";

pub const VOLUME_MOUNT_NAME_POD_TEMPLATES: &str = "pod-template";
pub const VOLUME_MOUNT_PATH_POD_TEMPLATES: &str = "/stackable/spark/pod-templates";
pub const VOLUME_MOUNT_NAME_DRIVER_POD_TEMPLATES: &str = "driver-pod-template";
pub const VOLUME_MOUNT_PATH_DRIVER_POD_TEMPLATES: &str = "/stackable/spark/driver-pod-templates";

pub const VOLUME_MOUNT_NAME_EXECUTOR_POD_TEMPLATES: &str = "executor-pod-template";
pub const VOLUME_MOUNT_PATH_EXECUTOR_POD_TEMPLATES: &str =
"/stackable/spark/executor-pod-templates";

pub const POD_TEMPLATE_FILE: &str = "template.yaml";

pub const VOLUME_MOUNT_NAME_CONFIG: &str = "config";

pub const CONTAINER_NAME_JOB: &str = "job";
pub const VOLUME_MOUNT_NAME_JOB: &str = "job-files";
pub const VOLUME_MOUNT_PATH_JOB: &str = "/stackable/spark/jobs";

pub const CONTAINER_NAME_REQ: &str = "requirements";
pub const VOLUME_MOUNT_NAME_REQ: &str = "req-files";
pub const VOLUME_MOUNT_PATH_REQ: &str = "/stackable/spark/requirements";

pub const CONTAINER_IMAGE_NAME_DRIVER: &str = "dummy-overwritten-by-command-line";
pub const CONTAINER_NAME_DRIVER: &str = "spark-driver";
pub const VOLUME_MOUNT_NAME_LOG_CONFIG: &str = "log-config";
pub const VOLUME_MOUNT_PATH_LOG_CONFIG: &str = "/stackable/log_config";

pub const CONTAINER_IMAGE_NAME_EXECUTOR: &str = "dummy-overwritten-by-command-line";
pub const CONTAINER_NAME_EXECUTOR: &str = "spark-executor";
pub const VOLUME_MOUNT_NAME_LOG: &str = "log";
pub const VOLUME_MOUNT_PATH_LOG: &str = "/stackable/log";

pub const LOG4J2_CONFIG_FILE: &str = "log4j2.properties";

pub const ACCESS_KEY_ID: &str = "accessKey";
pub const SECRET_ACCESS_KEY: &str = "secretKey";
@@ -25,6 +33,11 @@ pub const MIN_MEMORY_OVERHEAD: u32 = 384;
pub const JVM_OVERHEAD_FACTOR: f32 = 0.1;
pub const NON_JVM_OVERHEAD_FACTOR: f32 = 0.4;

pub const MAX_SPARK_LOG_FILES_SIZE_IN_MIB: u32 = 10;
pub const MAX_INIT_CONTAINER_LOG_FILES_SIZE_IN_MIB: u32 = 1;
pub const LOG_VOLUME_SIZE_IN_MIB: u32 =
MAX_SPARK_LOG_FILES_SIZE_IN_MIB + MAX_INIT_CONTAINER_LOG_FILES_SIZE_IN_MIB;

pub const OPERATOR_NAME: &str = "spark.stackable.tech";
pub const CONTROLLER_NAME: &str = "sparkapplication";
pub const POD_DRIVER_CONTROLLER_NAME: &str = "pod-driver";
30 changes: 29 additions & 1 deletion rust/crd/src/history.rs
@@ -27,10 +27,11 @@ use stackable_operator::{
transform_all_roles_to_config, validate_all_roles_and_groups_config, Configuration,
ValidatedRoleConfigByPropertyKind,
},
product_logging::{self, spec::Logging},
role_utils::{Role, RoleGroupRef},
schemars::{self, JsonSchema},
};
use strum::Display;
use strum::{Display, EnumIter};

#[derive(Snafu, Debug)]
pub enum Error {
@@ -62,6 +63,10 @@ pub enum Error {
#[serde(rename_all = "camelCase")]
pub struct SparkHistoryServerSpec {
pub image: ProductImage,
/// Name of the Vector aggregator discovery ConfigMap.
/// It must contain the key `ADDRESS` with the address of the Vector aggregator.
#[serde(skip_serializing_if = "Option::is_none")]
pub vector_aggregator_config_map_name: Option<String>,
pub log_file_directory: LogFileDirectorySpec,
#[serde(skip_serializing_if = "Option::is_none")]
pub spark_conf: Option<BTreeMap<String, String>>,
@@ -180,6 +185,26 @@ pub struct S3LogFileDirectorySpec {
)]
pub struct HistoryStorageConfig {}

#[derive(
Clone,
Debug,
Deserialize,
Display,
Eq,
EnumIter,
JsonSchema,
Ord,
PartialEq,
PartialOrd,
Serialize,
)]
#[serde(rename_all = "kebab-case")]
#[strum(serialize_all = "kebab-case")]
pub enum SparkHistoryServerContainer {
SparkHistory,
Vector,
}

#[derive(Clone, Debug, Default, JsonSchema, PartialEq, Fragment)]
#[fragment_attrs(
derive(
@@ -200,6 +225,8 @@ pub struct HistoryConfig {
#[fragment_attrs(serde(default))]
pub resources: Resources<HistoryStorageConfig, NoRuntimeLimits>,
#[fragment_attrs(serde(default))]
pub logging: Logging<SparkHistoryServerContainer>,
#[fragment_attrs(serde(default))]
pub affinity: StackableAffinity,
}

@@ -218,6 +245,7 @@ impl HistoryConfig {
},
storage: HistoryStorageConfigFragment {},
},
logging: product_logging::spec::default_logging(),
affinity: history_affinity(cluster_name),
}
}
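
Since `SparkHistoryServerContainer` is serialized with `rename_all = "kebab-case"`, the two containers should surface under the history server's `nodes.config.logging.containers` as `spark-history` and `vector`. A hypothetical sketch building on the history server snippet above (the log levels are examples only):

[source,yaml]
----
nodes:
  config:
    logging:
      enableVectorAgent: true
      containers:
        spark-history:
          loggers:
            ROOT:
              level: INFO
        vector:
          file:
            level: WARN
----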