
Commit ecc0ea1

link to spark connect client image

1 parent 7a83653

File tree

1 file changed: +3, -0 lines changed

docs/modules/demos/pages/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,7 @@
 :hadoop: https://hadoop.apache.org/
 :jupyter: https://jupyter.org
 :spark-connect: https://spark.apache.org/docs/latest/spark-connect-overview.html
+:spark-connect-client: https://github.com/stackabletech/docker-images/blob/main/spark-connect-client/Dockerfile

 This demo showcases the integration between {jupyterlab}[JupyterLab], {spark-connect}[Spark Connect] and {hadoop}[Apache Hadoop] deployed on the Stackable Data Platform (SDP) Kubernetes cluster.
 The SDP makes this integration easy by publishing a discovery ConfigMap for the HDFS cluster and a Spark Connect service.
@@ -127,6 +128,8 @@ You can also inspect the `hdfs` folder where the `core-site.xml` and `hdfs-site.
 The Python notebook uses libraries such as `pandas` and `scikit-learn` to analyze the data.
 In addition, since the model training is delegated to a Spark Connect server, some of these dependencies, most notably `scikit-learn`, must also be made available on the Spark Connect pods.
 For convenience, a custom image is used in this demo that bundles all the required libraries for both the notebook and the Spark Connect server.
+The source of the image is available {spark-connect-client}[here].
+
 In practice, clients of Spark Connect do not need a full-blown Spark installation available locally, but only the libraries that are used in the notebook.

 == Model details
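
For illustration, a minimal sketch of the client side this change documents: a notebook connecting to a Spark Connect server needs only the PySpark client libraries, not a full local Spark installation. The endpoint `sc://spark-connect-server-default:15002` below is a placeholder, not taken from the demo; the actual service name depends on how Spark Connect is deployed.

[source,python]
----
# Minimal Spark Connect client sketch. The endpoint is a placeholder;
# substitute the Spark Connect service address from your deployment.
from pyspark.sql import SparkSession

# Connect to the remote Spark Connect server. Only the PySpark client
# libraries used by the notebook are needed locally.
spark = SparkSession.builder.remote("sc://spark-connect-server-default:15002").getOrCreate()

# Verify the connection with a trivial query executed on the server.
print(spark.range(10).count())
----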
