This repository was archived by the owner on Feb 16, 2024. It is now read-only.

Commit 3ea65f1

fhennig and adwk67 authored
Add demo system requirements (#280)
* added airflow
* ...
* ...
* added hbase-hdfs
* trino-taxi-data
* added trino-iceberg
* added some more
* added logging
* ...
* ...
* Update docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
* Update docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc
* Update docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc
* Update docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc
* Update docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
* Update docs/modules/ROOT/pages/demos/logging.adoc
* Update docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
* Update docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
* Update docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc
* Update docs/modules/ROOT/pages/demos/trino-iceberg.adoc
* rounded up CPU numbers

Co-authored-by: Andrew Kenworthy <[email protected]>
1 parent 71be854 · commit 3ea65f1

10 files changed: +160 -99 lines changed
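
To inspect this change locally, the commit hash above is all you need (the repository is archived, but clones still work; `--stat` reproduces a file summary like the one above):

[source,console]
----
$ git show 3ea65f1 --stat
----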

docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc

Lines changed: 15 additions & 7 deletions
@@ -1,12 +1,5 @@
 = airflow-scheduled-job
 
-[NOTE]
-====
-This guide assumes that you already have the demo `airflow-scheduled-job` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install airflow-scheduled-job`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-airflow-scheduled-job/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 2.5 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 9GiB memory
+* 24GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install airflow-scheduled-job`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:
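
For orientation, the install-then-verify flow that this commit standardizes across the demo pages looks like the following minimal sketch (both commands appear verbatim in the docs; output is omitted rather than invented):

[source,console]
----
$ stackablectl demo install airflow-scheduled-job
$ stackablectl services list --all-namespaces
----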

docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc

Lines changed: 20 additions & 18 deletions
@@ -1,36 +1,20 @@
 = data-lakehouse-iceberg-trino-spark
 
-[WARNING]
+[IMPORTANT]
 ====
 This demo shows a data workload with real world data volumes and uses significant amount of resources to ensure acceptable response times.
 It will most likely not run on your workstation.
 
 There is also the smaller xref:demos/trino-iceberg.adoc[] demo focusing on the abilities a lakehouse using Apache Iceberg offers.
 The `trino-iceberg` demo has no streaming data part and can be executed on a local workstation.
-
-The demo was developed and tested on a kubernetes cluster with 10 nodes (4 cores (8 threads), 20GB RAM and 30GB HDD).
-Instance types that loosely correspond to this on the Hyperscalers are:
-
-- *Google*: `e2-standard-8`
-- *Azure*: `Standard_D4_v2`
-- *AWS*: `m5.2xlarge`
-
-In addition to these nodes the operators will request multiple persistent volumes with a total capacity of about 1TB.
 ====
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `data-lakehouse-iceberg-trino-spark` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install data-lakehouse-iceberg-trino-spark`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -53,6 +37,24 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-data-lakehouse-iceberg-trino-spark/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+The demo was developed and tested on a kubernetes cluster with 10 nodes (4 cores (8 threads), 20GB RAM and 30GB HDD).
+Instance types that loosely correspond to this on the Hyperscalers are:
+
+- *Google*: `e2-standard-8`
+- *Azure*: `Standard_D4_v2`
+- *AWS*: `m5.2xlarge`
+
+In addition to these nodes the operators will request multiple persistent volumes with a total capacity of about 1TB.
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install data-lakehouse-iceberg-trino-spark`.
+
 == Apache Iceberg
 As Apache Iceberg states on their https://iceberg.apache.org/docs/latest/[website]:
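
Because this page explicitly warns that the demo will not fit on a workstation, a quick capacity check before installing can save time. A minimal sketch with plain kubectl (not part of this commit; the custom-columns layout is just one way to read allocatable resources):

[source,console]
----
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
$ kubectl get storageclass    # the operators will also request ~1TB of persistent volumes
----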

docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc

Lines changed: 15 additions & 7 deletions
@@ -1,12 +1,5 @@
 = hbase-hdfs-cycling-data
 
-[NOTE]
-====
-This guide assumes that you already have the demo `hbase-hdfs-load-cycling-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install hbase-hdfs-load-cycling-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-hbase-hdfs-load-cycling-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 3 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 6GiB memory
+* 16GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install hbase-hdfs-load-cycling-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:
 `stackablectl services list --all-namespaces`

docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc

Lines changed: 21 additions & 20 deletions
@@ -2,30 +2,10 @@
 
 This demo showcases the integration between https://jupyter.org[Jupyter] and https://hadoop.apache.org/[Apache Hadoop] deployed on the Stackable Data Platform (SDP) Kubernetes cluster. https://jupyterlab.readthedocs.io/en/stable/[JupyterLab] is deployed using the https://github.com/jupyterhub/zero-to-jupyterhub-k8s[pyspark-notebook stack] provided by the Jupyter community. The SDP makes this integration easy by publishing a discovery `ConfigMap` for the HDFS cluster. This `ConfigMap` is then mounted in all `Pods` running https://spark.apache.org/docs/latest/api/python/getting_started/index.html[PySpark] notebooks so that these have access to HDFS data. For this demo, the HDFS cluster is provisioned with a small sample of the https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page[NYC taxi trip dataset] which is analyzed with a notebook that is provisioned automatically in the JupyterLab interface.
 
-This demo can be installed on most cloud managed Kubernetes clusters as well as on premise or on a reasonably provisioned laptop. Install this demo on an existing Kubernetes cluster:
-
-[source,bash]
-----
-stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data
-----
-
-[WARNING]
-====
-This demo should not be run alongside other demos and requires a minimum of 32 GB RAM and 8 CPUs.
-====
-
-[NOTE]
-====
-Some container images used by this demo are quite large and some steps may take several minutes to complete. If you install this demo locally, on a developer laptop for example, this can lead to timeouts during the installation. If this happens, it's safe to rerun the `stackablectl` command from above.
-
-For more details on how to install Stackable demos see the xref:commands/demo.adoc#_install_demo[documentation].
-====
-
 == Aim / Context
 
 This demo does not use the Stackable spark-k8s-operator but rather delegates the creation of executor pods to JupyterHub. The intention is to demonstrate how to interact with SDP components when designing and testing Spark jobs: the resulting script and Spark job definition can then be transferred for use with a Stackable `SparkApplication` resource. When logging in to JupyterHub (described below), a pod will be created with the username as a suffix e.g. `jupyter-admin`. This runs a container that hosts a Jupyter notebook with Spark, Java and Python pre-installed. When the user creates a `SparkSession`, temporary spark executors are created that are persisted until the notebook kernel is shut down or re-started. The notebook can thus be used as a sandbox for writing, testing and benchmarking Spark jobs before they are moved into production.
 
-
 == Overview
 
 This demo will:
@@ -39,6 +19,27 @@ This demo will:
 * Train an anomaly detection model using PySpark on the data available in HDFS
 * Perform some predictions and visualize anomalies
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 8 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 32GiB memory
+* 22GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data`.
+
+[NOTE]
+====
+Some container images used by this demo are quite large and some steps may take several minutes to complete. If you install this demo locally, on a developer laptop for example, this can lead to timeouts during the installation. If this happens, it's safe to rerun the `stackablectl` command from above.
+
+For more details on how to install Stackable demos see the xref:commands/demo.adoc#_install_demo[documentation].
+====
 
 == HDFS
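
The NOTE this diff moves below the Installation section matters in practice: large image pulls can time out locally, and rerunning the same command is safe. Sketch (command verbatim from the diff):

[source,console]
----
$ stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data   # safe to rerun after a timeout
----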

docs/modules/ROOT/pages/demos/logging.adoc

Lines changed: 12 additions & 6 deletions
@@ -59,14 +59,20 @@ vm.max_map_count=262144
 
 Then run `sudo sysctl --load` to reload.
 
-== Run the demo
+[#system-requirements]
+== System requirements
 
-The following command creates a kind cluster and installs this demo:
+To run this demo, your system needs at least:
 
-[source,console]
-----
-$ stackablectl demo install logging --kind-cluster
-----
+* 6.5 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 5GiB memory
+* 27GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install logging`.
 
 == List deployed Stackable services
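
The hunk header above shows this section sits right after the `vm.max_map_count=262144` prerequisite, which the page keeps. A hedged sketch of checking and applying it on a Linux host (the file name under /etc/sysctl.d/ is illustrative, not from the docs):

[source,console]
----
$ sysctl vm.max_map_count                                       # check the current value
$ echo "vm.max_map_count=262144" | sudo tee /etc/sysctl.d/99-logging-demo.conf
$ sudo sysctl --load=/etc/sysctl.d/99-logging-demo.conf         # or plain `sudo sysctl --load`, as the page says
----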

docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc

Lines changed: 16 additions & 8 deletions
@@ -1,18 +1,11 @@
 = nifi-kafka-druid-earthquake-data
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `nifi-kafka-druid-earthquake-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install nifi-kafka-druid-earthquake-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -32,6 +25,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-nifi-kafka-druid-earthquake-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 28GiB memory
+* 75GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install nifi-kafka-druid-earthquake-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:
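
For readers unsure what "fqdn service names" means in the CAUTION blocks of the two NiFi/Kafka/Druid demos: the fully qualified form embeds the namespace, which is why these demos are pinned to `default`. A sketch (the service name is hypothetical; the FQDN pattern is standard Kubernetes):

[source,console]
----
# <service>.<namespace>.svc.cluster.local: the namespace is part of the name the TLS certificate must match
$ nslookup kafka.default.svc.cluster.local
----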

docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc

Lines changed: 16 additions & 8 deletions
@@ -1,18 +1,11 @@
 = nifi-kafka-druid-water-level-data
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `nifi-kafka-druid-water-level-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -34,6 +27,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-nifi-kafka-druid-water-level-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 28GiB memory
+* 75GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:

docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc

Lines changed: 15 additions & 11 deletions
@@ -1,16 +1,5 @@
 = spark-k8s-anomaly-detection-taxi-data
 
-[WARNING]
-====
-This demo should not be run alongside other demos and requires a minimum of 32 GB RAM and 8 CPUs.
-====
-[NOTE]
-====
-This guide assumes you already have the demo `spark-k8s-anomaly-detection-taxi-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install spark-k8s-anomaly-detection-taxi-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -29,6 +18,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::spark-k8s-anomaly-detection-taxi-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 8 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 32GiB memory
+* 35GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install spark-k8s-anomaly-detection-taxi-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:

docs/modules/ROOT/pages/demos/trino-iceberg.adoc

Lines changed: 15 additions & 7 deletions
@@ -7,13 +7,6 @@ It focuses on the Trino and Iceberg integration and should run on you local work
 If you are interested in a more complex lakehouse setup, please have a look at the xref:demos/data-lakehouse-iceberg-trino-spark.adoc[] demo.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `trino-iceberg` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install trino-iceberg`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ This demo will
 * Create multiple data lakehouse tables using Apache Iceberg and data from the https://www.tpc.org/tpch/[TPC-H dataset].
 * Run some queries to show the benefits of Iceberg
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 27GiB memory
+* 110GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install trino-iceberg`.
+
 == List deployed Stackable services
 To list the installed installed Stackable services run the following command:
