This repository was archived by the owner on Feb 16, 2024. It is now read-only.

Commit 3ea65f1

fhennig and adwk67 authored
Add demo system requirements (#280)
* added airflow
* ...
* ...
* added hbase-hdfs
* trino-taxi-data
* added trino-iceberg
* added some more
* added logging
* ...
* ...
* Update docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
* Update docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc
* Update docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc
* Update docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc
* Update docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
* Update docs/modules/ROOT/pages/demos/logging.adoc
* Update docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
* Update docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
* Update docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc
* Update docs/modules/ROOT/pages/demos/trino-iceberg.adoc
* rounded up CPU numbers

Co-authored-by: Andrew Kenworthy <[email protected]>
1 parent 71be854 · commit 3ea65f1

10 files changed: +160 -99 lines changed
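
To inspect this change locally, the commit hash above is all you need (the repository is archived, but clones still work; `--stat` reproduces a file summary like the one above):

[source,console]
----
$ git show 3ea65f1 --stat
----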

docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc

Lines changed: 15 additions & 7 deletions
@@ -1,12 +1,5 @@
 = airflow-scheduled-job
 
-[NOTE]
-====
-This guide assumes that you already have the demo `airflow-scheduled-job` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install airflow-scheduled-job`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-airflow-scheduled-job/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 2.5 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 9GiB memory
+* 24GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install airflow-scheduled-job`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:
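
For orientation, the install-then-verify flow that this commit standardizes across the demo pages looks like the following minimal sketch (both commands appear verbatim in the docs; output is omitted rather than invented):

[source,console]
----
$ stackablectl demo install airflow-scheduled-job
$ stackablectl services list --all-namespaces
----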

docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc

Lines changed: 20 additions & 18 deletions
@@ -1,36 +1,20 @@
 = data-lakehouse-iceberg-trino-spark
 
-[WARNING]
+[IMPORTANT]
 ====
 This demo shows a data workload with real world data volumes and uses significant amount of resources to ensure acceptable response times.
 It will most likely not run on your workstation.
 
 There is also the smaller xref:demos/trino-iceberg.adoc[] demo focusing on the abilities a lakehouse using Apache Iceberg offers.
 The `trino-iceberg` demo has no streaming data part and can be executed on a local workstation.
-
-The demo was developed and tested on a kubernetes cluster with 10 nodes (4 cores (8 threads), 20GB RAM and 30GB HDD).
-Instance types that loosely correspond to this on the Hyperscalers are:
-
-- *Google*: `e2-standard-8`
-- *Azure*: `Standard_D4_v2`
-- *AWS*: `m5.2xlarge`
-
-In addition to these nodes the operators will request multiple persistent volumes with a total capacity of about 1TB.
 ====
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `data-lakehouse-iceberg-trino-spark` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install data-lakehouse-iceberg-trino-spark`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -53,6 +37,24 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-data-lakehouse-iceberg-trino-spark/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+The demo was developed and tested on a kubernetes cluster with 10 nodes (4 cores (8 threads), 20GB RAM and 30GB HDD).
+Instance types that loosely correspond to this on the Hyperscalers are:
+
+- *Google*: `e2-standard-8`
+- *Azure*: `Standard_D4_v2`
+- *AWS*: `m5.2xlarge`
+
+In addition to these nodes the operators will request multiple persistent volumes with a total capacity of about 1TB.
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install data-lakehouse-iceberg-trino-spark`.
+
 == Apache Iceberg
 As Apache Iceberg states on their https://iceberg.apache.org/docs/latest/[website]:
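
Because this page explicitly warns that the demo will not fit on a workstation, a quick capacity check before installing can save time. A minimal sketch with plain kubectl (not part of this commit; the custom-columns layout is just one way to read allocatable resources):

[source,console]
----
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
$ kubectl get storageclass    # the operators will also request ~1TB of persistent volumes
----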

docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc

Lines changed: 15 additions & 7 deletions
@@ -1,12 +1,5 @@
 = hbase-hdfs-cycling-data
 
-[NOTE]
-====
-This guide assumes that you already have the demo `hbase-hdfs-load-cycling-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install hbase-hdfs-load-cycling-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-hbase-hdfs-load-cycling-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 3 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 6GiB memory
+* 16GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install hbase-hdfs-load-cycling-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:
 `stackablectl services list --all-namespaces`

docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc

Lines changed: 21 additions & 20 deletions
@@ -2,30 +2,10 @@
 
 This demo showcases the integration between https://jupyter.org[Jupyter] and https://hadoop.apache.org/[Apache Hadoop] deployed on the Stackable Data Platform (SDP) Kubernetes cluster. https://jupyterlab.readthedocs.io/en/stable/[JupyterLab] is deployed using the https://github.com/jupyterhub/zero-to-jupyterhub-k8s[pyspark-notebook stack] provided by the Jupyter community. The SDP makes this integration easy by publishing a discovery `ConfigMap` for the HDFS cluster. This `ConfigMap` is then mounted in all `Pods` running https://spark.apache.org/docs/latest/api/python/getting_started/index.html[PySpark] notebooks so that these have access to HDFS data. For this demo, the HDFS cluster is provisioned with a small sample of the https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page[NYC taxi trip dataset] which is analyzed with a notebook that is provisioned automatically in the JupyterLab interface.
 
-This demo can be installed on most cloud managed Kubernetes clusters as well as on premise or on a reasonably provisioned laptop. Install this demo on an existing Kubernetes cluster:
-
-[source,bash]
-----
-stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data
-----
-
-[WARNING]
-====
-This demo should not be run alongside other demos and requires a minimum of 32 GB RAM and 8 CPUs.
-====
-
-[NOTE]
-====
-Some container images used by this demo are quite large and some steps may take several minutes to complete. If you install this demo locally, on a developer laptop for example, this can lead to timeouts during the installation. If this happens, it's safe to rerun the `stackablectl` command from above.
-
-For more details on how to install Stackable demos see the xref:commands/demo.adoc#_install_demo[documentation].
-====
-
 == Aim / Context
 
 This demo does not use the Stackable spark-k8s-operator but rather delegates the creation of executor pods to JupyterHub. The intention is to demonstrate how to interact with SDP components when designing and testing Spark jobs: the resulting script and Spark job definition can then be transferred for use with a Stackable `SparkApplication` resource. When logging in to JupyterHub (described below), a pod will be created with the username as a suffix e.g. `jupyter-admin`. This runs a container that hosts a Jupyter notebook with Spark, Java and Python pre-installed. When the user creates a `SparkSession`, temporary spark executors are created that are persisted until the notebook kernel is shut down or re-started. The notebook can thus be used as a sandbox for writing, testing and benchmarking Spark jobs before they are moved into production.
 
-
 == Overview
 
 This demo will:
@@ -39,6 +19,27 @@ This demo will:
 * Train an anomaly detection model using PySpark on the data available in HDFS
 * Perform some predictions and visualize anomalies
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 8 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 32GiB memory
+* 22GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data`.
+
+[NOTE]
+====
+Some container images used by this demo are quite large and some steps may take several minutes to complete. If you install this demo locally, on a developer laptop for example, this can lead to timeouts during the installation. If this happens, it's safe to rerun the `stackablectl` command from above.
+
+For more details on how to install Stackable demos see the xref:commands/demo.adoc#_install_demo[documentation].
+====
 
 == HDFS
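
The NOTE this diff moves below the Installation section matters in practice: large image pulls can time out locally, and rerunning the same command is safe. Sketch (command verbatim from the diff):

[source,console]
----
$ stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data   # safe to rerun after a timeout
----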

docs/modules/ROOT/pages/demos/logging.adoc

Lines changed: 12 additions & 6 deletions
@@ -59,14 +59,20 @@ vm.max_map_count=262144
 
 Then run `sudo sysctl --load` to reload.
 
-== Run the demo
+[#system-requirements]
+== System requirements
 
-The following command creates a kind cluster and installs this demo:
+To run this demo, your system needs at least:
 
-[source,console]
-----
-$ stackablectl demo install logging --kind-cluster
-----
+* 6.5 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 5GiB memory
+* 27GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install logging`.
 
 == List deployed Stackable services
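
The hunk header above shows this section sits right after the `vm.max_map_count=262144` prerequisite, which the page keeps. A hedged sketch of checking and applying it on a Linux host (the file name under /etc/sysctl.d/ is illustrative, not from the docs):

[source,console]
----
$ sysctl vm.max_map_count                                       # check the current value
$ echo "vm.max_map_count=262144" | sudo tee /etc/sysctl.d/99-logging-demo.conf
$ sudo sysctl --load=/etc/sysctl.d/99-logging-demo.conf         # or plain `sudo sysctl --load`, as the page says
----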

docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc

Lines changed: 16 additions & 8 deletions
@@ -1,18 +1,11 @@
 = nifi-kafka-druid-earthquake-data
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `nifi-kafka-druid-earthquake-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install nifi-kafka-druid-earthquake-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -32,6 +25,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-nifi-kafka-druid-earthquake-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 28GiB memory
+* 75GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install nifi-kafka-druid-earthquake-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:
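
For readers unsure what "fqdn service names" means in the CAUTION blocks of the two NiFi/Kafka/Druid demos: the fully qualified form embeds the namespace, which is why these demos are pinned to `default`. A sketch (the service name is hypothetical; the FQDN pattern is standard Kubernetes):

[source,console]
----
# <service>.<namespace>.svc.cluster.local: the namespace is part of the name the TLS certificate must match
$ nslookup kafka.default.svc.cluster.local
----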

docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc

Lines changed: 16 additions & 8 deletions
@@ -1,18 +1,11 @@
 = nifi-kafka-druid-water-level-data
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `nifi-kafka-druid-water-level-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -34,6 +27,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-nifi-kafka-druid-water-level-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 28GiB memory
+* 75GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:

docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc

Lines changed: 15 additions & 11 deletions
@@ -1,16 +1,5 @@
 = spark-k8s-anomaly-detection-taxi-data
 
-[WARNING]
-====
-This demo should not be run alongside other demos and requires a minimum of 32 GB RAM and 8 CPUs.
-====
-[NOTE]
-====
-This guide assumes you already have the demo `spark-k8s-anomaly-detection-taxi-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install spark-k8s-anomaly-detection-taxi-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -29,6 +18,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::spark-k8s-anomaly-detection-taxi-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 8 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 32GiB memory
+* 35GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install spark-k8s-anomaly-detection-taxi-data`.
+
 == List deployed Stackable services
 To list the installed Stackable services run the following command:

docs/modules/ROOT/pages/demos/trino-iceberg.adoc

Lines changed: 15 additions & 7 deletions
@@ -7,13 +7,6 @@ It focuses on the Trino and Iceberg integration and should run on you local work
 If you are interested in a more complex lakehouse setup, please have a look at the xref:demos/data-lakehouse-iceberg-trino-spark.adoc[] demo.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `trino-iceberg` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install trino-iceberg`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ This demo will
 * Create multiple data lakehouse tables using Apache Iceberg and data from the https://www.tpc.org/tpch/[TPC-H dataset].
 * Run some queries to show the benefits of Iceberg
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 27GiB memory
+* 110GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply you just have to run `stackablectl demo install trino-iceberg`.
+
 == List deployed Stackable services
 To list the installed installed Stackable services run the following command:
