
Commit 670c737

fhennig and maltesander authored
Docs new landing page (#385)
* remove unnecessary text
* New intro text
* Added getting started blurb
* Added demos
* Added operator model
* fixed typo
* Update docs/modules/hdfs/pages/index.adoc (Co-authored-by: Malte Sander <[email protected]>)
* Update docs/modules/hdfs/pages/index.adoc (Co-authored-by: Malte Sander <[email protected]>)

Co-authored-by: Malte Sander <[email protected]>
1 parent 8b278ba commit 670c737

File tree

2 files changed (+30, -12 lines)


docs/modules/hdfs/images/hdfs_overview.drawio.svg

Lines changed: 4 additions & 0 deletions

docs/modules/hdfs/pages/index.adoc

Lines changed: 26 additions & 12 deletions
@@ -1,18 +1,25 @@
 = Stackable Operator for Apache HDFS
+:description: The Stackable Operator for Apache HDFS is a Kubernetes operator that can manage Apache HDFS clusters. Learn about its features, resources, dependencies and demos, and see the list of supported HDFS versions.
+:keywords: Stackable Operator, Hadoop, Apache HDFS, Kubernetes, k8s, operator, engineer, big data, metadata, storage, cluster, distributed storage
 
-The Stackable Operator for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[Apache HDFS] is used to set up HFDS in high-availability mode. It depends on the xref:zookeeper:ROOT:index.adoc[] to operate a ZooKeeper cluster to coordinate the active and standby NameNodes.
+The Stackable Operator for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[Apache HDFS] (Hadoop Distributed File System) is used to set up HDFS in high-availability mode. HDFS is a distributed file system designed to store and manage massive amounts of data across multiple machines in a fault-tolerant manner. The Operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper cluster to coordinate the active and standby NameNodes.
 
-NOTE: This operator only works with images from the https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop[Stackable] repository
+== Getting started
 
-== Roles
+Follow the xref:getting_started/index.adoc[Getting started guide], which walks you through installing the Stackable HDFS and ZooKeeper Operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set up correctly.
 
-Three xref:home:concepts:roles-and-role-groups.adoc[roles] of the HDFS cluster are implemented:
+Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to your needs, or have a look at the <<demos, demos>> for some example setups.
+
+== Operator model
+
+The Operator manages the _HdfsCluster_ custom resource. The cluster implements three xref:home:concepts:roles-and-role-groups.adoc[roles]:
 
 * DataNode - responsible for storing the actual data.
 * JournalNode - responsible for keeping track of HDFS blocks and used to perform failovers in case the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 * NameNode - responsible for keeping track of HDFS blocks and providing access to the data.
 
-== Kubernetes objects
+
+image::hdfs_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable Operator for Apache HDFS]
 
 The operator creates the following K8S objects per role group defined in the custom resource.
 
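For illustration only (not part of this commit): a minimal `HdfsCluster` manifest matching the operator model described in the new text could look roughly like the sketch below. The resource names (`simple-hdfs`, `simple-hdfs-znode`), the product version and the exact field layout are assumptions rather than content of this diff.

[source,yaml]
----
# Sketch of an HdfsCluster custom resource; names, version and field layout are assumed
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  image:
    productVersion: "3.3.4"                    # substitute a supported HDFS version
  clusterConfig:
    dfsReplication: 1                          # should not exceed the number of DataNodes
    zookeeperConfigMapName: simple-hdfs-znode  # discovery ConfigMap of a ZooKeeper znode (see Dependencies)
  nameNodes:
    roleGroups:
      default:
        replicas: 2   # two NameNodes for high availability
  journalNodes:
    roleGroups:
      default:
        replicas: 1
  dataNodes:
    roleGroups:
      default:
        replicas: 1
----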
@@ -28,15 +35,22 @@ In the custom resource you can specify the number of replicas per role group (Na
 * 1 JournalNode
 * 1 DataNode (should match at least the `clusterConfig.dfsReplication` factor)
 
+The Operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the HDFS instance. The discovery ConfigMap contains the `core-site.xml` file and the `hdfs-site.xml` file.
+
+== Dependencies
+
+HDFS depends on ZooKeeper for coordination between nodes. You can run a ZooKeeper cluster with the xref:zookeeper:index.adoc[]. Additionally, the xref:commons-operator:index.adoc[] and xref:secret-operator:index.adoc[] are needed.
+
+== [[demos]]Demos
+
+Two demos that use HDFS are available.
+
+**xref:stackablectl::demos/hbase-hdfs-load-cycling-data.adoc[]** loads a dataset of cycling data from S3 into HDFS and then uses HBase to analyze the data.
+
+**xref:stackablectl::demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc[]** showcases the integration between HDFS and Jupyter. New York Taxi data is stored in HDFS and analyzed in a Jupyter notebook.
+
 == Supported Versions
 
 The Stackable Operator for Apache HDFS currently supports the following versions of HDFS:
 
 include::partial$supported-versions.adoc[]
-
-== Docker image
-
-[source]
-----
-docker pull docker.stackable.tech/stackable/hadoop:<version>
-----
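To show how the service discovery ConfigMap described in the added text might be consumed, here is a hypothetical client Pod; the ConfigMap name is assumed to match the HdfsCluster name, and the image tag reuses the `<version>` placeholder from the removed Docker image section.

[source,yaml]
----
# Hypothetical client Pod mounting the HDFS discovery ConfigMap
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-client
spec:
  containers:
    - name: client
      image: docker.stackable.tech/stackable/hadoop:<version>  # substitute a concrete version
      command: ["sleep", "infinity"]
      env:
        - name: HADOOP_CONF_DIR          # point Hadoop CLI tools at the mounted config
          value: /stackable/conf/hdfs
      volumeMounts:
        - name: hdfs-discovery
          mountPath: /stackable/conf/hdfs
  volumes:
    - name: hdfs-discovery
      configMap:
        name: simple-hdfs  # assumed discovery ConfigMap, contains core-site.xml and hdfs-site.xml
----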

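For the new Dependencies section, a sketch of the ZooKeeper side under the same assumptions: a ZookeeperCluster plus a ZookeeperZnode whose discovery ConfigMap the `HdfsCluster` above references via `zookeeperConfigMapName`. Resource names and versions are again examples, not taken from this commit.

[source,yaml]
----
# Assumed ZooKeeper cluster and znode managed by the Stackable Operator for Apache ZooKeeper
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: "3.8.0"   # substitute a supported ZooKeeper version
  servers:
    roleGroups:
      default:
        replicas: 3
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode     # its discovery ConfigMap is referenced by the HdfsCluster
spec:
  clusterRef:
    name: simple-zk
----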