Commit 6785b53

feat: Added Map Reduce Design Pattern (#3184)
* MapReduce design pattern added * Updated README.md * added module to parent pom
1 parent e17f138 commit 6785b53

14 files changed

+815
-0
lines changed

Diff for: map-reduce/README.md

+231
@@ -0,0 +1,231 @@
1+
---
2+
title: "MapReduce Pattern in Java"
3+
shortTitle: MapReduce
4+
description: "Learn the MapReduce pattern in Java with real-world examples, class diagrams, and tutorials. Understand its intent, applicability, benefits, and known uses to enhance your design pattern knowledge."
5+
category: Performance optimization
6+
language: en
7+
tag:
8+
- Data processing
9+
- Code simplification
10+
- Delegation
11+
- Performance
12+
---
13+
14+
## Also known as
15+
16+
* Split-Apply-Combine Strategy
17+
* Scatter-Gather Pattern
18+
19+
## Intent of Map Reduce Design Pattern
20+
21+
MapReduce aims to process and generate large datasets with a parallel, distributed algorithm on a cluster. It divides the workload into two main phases: Map and Reduce, allowing for efficient parallel processing of data.
22+
23+
## Detailed Explanation of Map Reduce Pattern with Real-World Examples
24+
25+
Real-world example
26+
27+
> Imagine a large e-commerce company that wants to analyze its sales data across multiple regions. They have terabytes of transaction data stored across hundreds of servers. Using MapReduce, they can efficiently process this data to calculate total sales by product category. The Map function would process individual sales records, emitting key-value pairs of (category, sale amount). The Reduce function would then sum up all sale amounts for each category, producing the final result.
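
The sales scenario above can be sketched in a few lines of plain Java. This is only an illustration on an in-memory list: `Sale`, `SalesByCategory`, and `totalsByCategory` are hypothetical names that do not exist in this module, amounts are assumed to be whole cents, and records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names are part of the map-reduce module.
class SalesByCategory {

  // One transaction record; the fields are assumed for illustration.
  record Sale(String category, long amountCents) {}

  // Map: key each record by its category.
  // Reduce: sum the amounts that share a key.
  static Map<String, Long> totalsByCategory(List<Sale> sales) {
    return sales.stream()
        .collect(Collectors.groupingBy(
            Sale::category,
            TreeMap::new, // sorted keys, so the printed order is predictable
            Collectors.summingLong(Sale::amountCents)));
  }

  public static void main(String[] args) {
    List<Sale> sales = List.of(
        new Sale("books", 1250),
        new Sale("toys", 3000),
        new Sale("books", 750));
    System.out.println(totalsByCategory(sales)); // {books=2000, toys=3000}
  }
}
```

In a real deployment the records would live on many machines and the grouping would happen over the network; the sketch only shows the shape of the key-value computation.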
28+
29+
In plain words
30+
31+
> MapReduce splits a large problem into smaller parts, processes them in parallel, and then combines the results.
32+
33+
Wikipedia says
34+
35+
> "MapReduce is a programming model and associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster".
36+
MapReduce consists of two main steps:

* The "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
* The "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, which is the answer to the problem it was originally trying to solve.

This approach allows for efficient processing of vast amounts of data across multiple machines, making it a fundamental technique in big data analytics and distributed computing.
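
The master/worker flow described above can be imitated on a single machine, with a thread pool standing in for the worker nodes. The following is a minimal sketch under that assumption: `LocalMasterWorker`, `countWords`, and `run` are hypothetical names, and nothing here models a real cluster, data locality, or fault tolerance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: threads play the role of worker nodes.
class LocalMasterWorker {

  // Worker task: count the words in one line (one sub-problem).
  static Map<String, Integer> countWords(String line) {
    Map<String, Integer> counts = new HashMap<>();
    for (String word : line.toLowerCase().split("\\s+")) {
      if (!word.isEmpty()) {
        counts.merge(word, 1, Integer::sum);
      }
    }
    return counts;
  }

  // Master: hand each line to a worker, then combine the partial answers.
  static Map<String, Integer> run(List<String> lines, int workers) {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      List<Future<Map<String, Integer>>> partials = new ArrayList<>();
      for (String line : lines) {
        partials.add(pool.submit(() -> countWords(line)));
      }
      Map<String, Integer> total = new HashMap<>();
      for (Future<Map<String, Integer>> partial : partials) {
        partial.get().forEach((word, n) -> total.merge(word, n, Integer::sum));
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for a worker", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A worker task failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling `LocalMasterWorker.run(List.of("Hello world hello", "Hello world"), 2)` yields a map in which `hello` maps to 3 and `world` maps to 2.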
40+
41+
## Programmatic Example of Map Reduce in Java
42+
43+
### 1. Map Phase (Splitting & Processing Data)
44+
45+
* The Mapper takes an input string, splits it into words, and counts occurrences.
46+
* Output: A map {word → count} for each input line.
47+
#### `Mapper.java`
48+
```java
import java.util.HashMap;
import java.util.Map;

public class Mapper {
  // Counts how often each word occurs in a single input line.
  public static Map<String, Integer> map(String input) {
    Map<String, Integer> wordCount = new HashMap<>();
    String[] words = input.split("\\s+");
    for (String word : words) {
      // Normalize: lower-case the word and strip everything except a-z.
      word = word.toLowerCase().replaceAll("[^a-z]", "");
      if (!word.isEmpty()) {
        wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
      }
    }
    return wordCount;
  }
}
```
63+
Example Input: `"Hello world hello"`

Output: `{hello=2, world=1}`
65+
66+
### 2. Shuffle Phase (Grouping Data by Key)
67+
68+
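* The Shuffler collects the per-line word counts from all mappers and groups the counts by word. Despite its name, `shuffleAndSort` only groups; the result is a `HashMap`, so the order of the keys is unspecified.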
69+
#### `Shuffler.java`
70+
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Shuffler {
  // Groups the per-line counts by word; no ordering is imposed on the keys.
  public static Map<String, List<Integer>> shuffleAndSort(List<Map<String, Integer>> mapped) {
    Map<String, List<Integer>> grouped = new HashMap<>();
    for (Map<String, Integer> map : mapped) {
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        // Create the list the first time a word is seen, then append this count.
        grouped.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(entry.getValue());
      }
    }
    return grouped;
  }
}
```
84+
Example Input:
85+
```
86+
[
87+
{"hello": 2, "world": 1},
88+
{"hello": 1, "java": 1}
89+
]
90+
```
91+
Output:
92+
```
93+
{
94+
"hello": [2, 1],
95+
"world": [1],
96+
"java": [1]
97+
}
98+
```
99+
100+
### 3. Reduce Phase (Aggregating Results)
101+
102+
* The Reducer sums up occurrences of each word.
103+
#### `Reducer.java`
104+
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Reducer {
  public static List<Map.Entry<String, Integer>> reduce(Map<String, List<Integer>> grouped) {
    // Sum the grouped counts for each word.
    Map<String, Integer> reduced = new HashMap<>();
    for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
      reduced.put(entry.getKey(), entry.getValue().stream().mapToInt(Integer::intValue).sum());
    }

    // Sort by count, highest first; the order of words with equal counts is unspecified.
    List<Map.Entry<String, Integer>> result = new ArrayList<>(reduced.entrySet());
    result.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));
    return result;
  }
}
```
118+
Example Input:
119+
```
120+
{
121+
"hello": [2, 1],
122+
"world": [1],
123+
"java": [1]
124+
}
125+
```
126+
Output:
127+
```
128+
[
129+
{"hello": 3},
130+
{"world": 1},
131+
{"java": 1}
132+
]
133+
```
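
Because `reduce` sorts by count only, words with equal counts (such as `world` and `java` above) come out in whatever order the underlying `HashMap` produced. If a reproducible order matters, one option is to break ties alphabetically. The sketch below assumes that alphabetical order is acceptable; `StableWordOrder` is a hypothetical helper, not part of the module.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sorts word counts deterministically.
class StableWordOrder {

  // Highest count first; equal counts fall back to alphabetical order of the word.
  static List<Map.Entry<String, Integer>> sort(Map<String, Integer> counts) {
    Comparator<Map.Entry<String, Integer>> byCountDesc =
        Map.Entry.comparingByValue(Comparator.reverseOrder());
    Comparator<Map.Entry<String, Integer>> byWord = Map.Entry.comparingByKey();

    List<Map.Entry<String, Integer>> result = new ArrayList<>(counts.entrySet());
    result.sort(byCountDesc.thenComparing(byWord));
    return result;
  }
}
```

With the counts `{hello=3, world=1, java=1}` this returns `hello`, `java`, `world` in that order on every run.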
134+
135+
### 4. Running the Full MapReduce Process
136+
137+
* The MapReduce class coordinates the three steps.
138+
#### `MapReduce.java`
139+
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapReduce {
  public static List<Map.Entry<String, Integer>> mapReduce(List<String> inputs) {
    // Map: count the words of each input line independently.
    List<Map<String, Integer>> mapped = new ArrayList<>();
    for (String input : inputs) {
      mapped.add(Mapper.map(input));
    }

    // Shuffle: group the per-line counts by word.
    Map<String, List<Integer>> grouped = Shuffler.shuffleAndSort(mapped);

    // Reduce: sum each group and sort by total.
    return Reducer.reduce(grouped);
  }
}
```
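
For comparison, the same map, shuffle, and reduce steps can be expressed with the Java Streams API, which can also spread the work over several threads. This is an alternative sketch rather than the module's implementation: `StreamWordCount` is a hypothetical class, and it returns an unsorted map instead of the sorted entry list that `Reducer.reduce` produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical alternative: word count with the Streams API.
class StreamWordCount {

  static Map<String, Integer> count(List<String> inputs) {
    return inputs.parallelStream()
        // Map: split each line into words and normalize them.
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        .map(word -> word.toLowerCase().replaceAll("[^a-z]", ""))
        .filter(word -> !word.isEmpty())
        // Shuffle and reduce: group equal words and add 1 per occurrence.
        .collect(Collectors.groupingBy(word -> word, Collectors.summingInt(word -> 1)));
  }
}
```

The explicit `Mapper`, `Shuffler`, and `Reducer` classes make each phase visible, which suits a pattern walkthrough; the stream version trades that visibility for brevity.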
153+
154+
### 5. Main Execution (Calling MapReduce)
155+
156+
* The Main class executes the MapReduce pipeline and prints the final word count.
157+
#### `Main.java`
158+
```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class Main {
  public static void main(String[] args) {
    List<String> inputs = Arrays.asList(
        "Hello world hello",
        "MapReduce is fun",
        "Hello from the other side",
        "Hello world"
    );
    // Run the full pipeline and print each word with its total count.
    List<Map.Entry<String, Integer>> result = MapReduce.mapReduce(inputs);
    for (Map.Entry<String, Integer> entry : result) {
      System.out.println(entry.getKey() + ": " + entry.getValue());
    }
  }
}
```
172+
173+
Output:
174+
```
175+
hello: 4
176+
world: 2
177+
the: 1
178+
other: 1
179+
side: 1
180+
mapreduce: 1
181+
is: 1
182+
from: 1
183+
fun: 1
184+
```
185+
186+
## When to Use the Map Reduce Pattern in Java
187+
188+
Use MapReduce when:
189+
* Processing large datasets that don't fit into a single machine's memory
190+
* Performing computations that can be parallelized
191+
* Dealing with fault-tolerant and distributed computing scenarios
192+
* Analyzing log files, web crawl data, or scientific data
193+
194+
## Map Reduce Pattern Java Tutorials
195+
196+
* [MapReduce Tutorial (Apache Hadoop)](https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html)
* [MapReduce Example (Simplilearn)](https://www.youtube.com/watch?v=l2clwKnrtO8)
198+
199+
## Benefits and Trade-offs of Map Reduce Pattern
200+
201+
Benefits:
202+
203+
* Scalability: Can process vast amounts of data across multiple machines
204+
* Fault-tolerance: Handles machine failures gracefully
205+
* Simplicity: Abstracts complex distributed computing details
206+
207+
Trade-offs:
208+
209+
* Overhead: Not efficient for small datasets due to setup and coordination costs
210+
* Limited flexibility: Not suitable for all types of computations or algorithms
211+
* Latency: Batch-oriented nature may not be suitable for real-time processing needs
212+
213+
## Real-World Applications of Map Reduce Pattern in Java
214+
215+
* Google's original implementation for indexing web pages
216+
* Hadoop MapReduce for big data processing
217+
* Log analysis in large-scale systems
218+
* Genomic sequence analysis in bioinformatics
219+
220+
## Related Java Design Patterns
221+
222+
* Chaining Pattern
223+
* Master-Worker Pattern
224+
* Pipeline Pattern
225+
226+
## References and Credits
227+
228+
* [What is MapReduce](https://www.ibm.com/think/topics/mapreduce)
229+
* [Why MapReduce is not dead](https://www.codemotion.com/magazine/ai-ml/big-data/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud/)
* [Scalable Distributed Data Processing Solutions](https://tcpp.cs.gsu.edu/curriculum/?q=system%2Ffiles%2Fch07.pdf)
231+
* [Java Design Patterns: A Hands-On Experience with Real-World Examples](https://amzn.to/3HWNf4U)

Diff for: map-reduce/etc/map-reduce.png

28.6 KB

Diff for: map-reduce/etc/map-reduce.urm.puml

+24
@@ -0,0 +1,24 @@
1+
@startuml
2+
package com.iluwatar {
3+
class Main {
4+
+ Main()
5+
+ main(args : String[]) {static}
6+
}
7+
class MapReduce {
8+
+ MapReduce()
9+
+ mapReduce(inputs : List<String>) : List<Map.Entry<String, Integer>> {static}
10+
}
11+
class Mapper {
12+
+ Mapper()
13+
+ map(input : String) : Map<String, Integer> {static}
14+
}
15+
class Reducer {
16+
+ Reducer()
17+
+ reduce(grouped : Map<String, List<Integer>>) : List<Map.Entry<String, Integer>> {static}
18+
}
19+
class Shuffler {
20+
+ Shuffler()
21+
+ shuffleAndSort(mapped : List<Map<String, Integer>>) : Map<String, List<Integer>> {static}
22+
}
23+
}
24+
@enduml

Diff for: map-reduce/pom.xml

+62
@@ -0,0 +1,62 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!--
3+
4+
This project is licensed under the MIT license. Module model-view-viewmodel is using ZK framework licensed under LGPL (see lgpl-3.0.txt).
5+
6+
The MIT License
7+
Copyright © 2014-2022 Ilkka Seppälä
8+
9+
Permission is hereby granted, free of charge, to any person obtaining a copy
10+
of this software and associated documentation files (the "Software"), to deal
11+
in the Software without restriction, including without limitation the rights
12+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
13+
copies of the Software, and to permit persons to whom the Software is
14+
furnished to do so, subject to the following conditions:
15+
16+
The above copyright notice and this permission notice shall be included in
17+
all copies or substantial portions of the Software.
18+
19+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
25+
THE SOFTWARE.
26+
27+
-->
28+
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
29+
<modelVersion>4.0.0</modelVersion>
30+
<parent>
31+
<groupId>com.iluwatar</groupId>
32+
<artifactId>java-design-patterns</artifactId>
33+
<version>1.26.0-SNAPSHOT</version>
34+
</parent>
35+
<artifactId>map-reduce</artifactId>
36+
<dependencies>
37+
<dependency>
38+
<groupId>org.junit.jupiter</groupId>
39+
<artifactId>junit-jupiter-engine</artifactId>
40+
<scope>test</scope>
41+
</dependency>
42+
</dependencies>
43+
<build>
44+
<plugins>
45+
<plugin>
46+
<groupId>org.apache.maven.plugins</groupId>
47+
<artifactId>maven-assembly-plugin</artifactId>
48+
<executions>
49+
<execution>
50+
<configuration>
51+
<archive>
52+
<manifest>
53+
<mainClass>com.iluwatar.Main</mainClass>
54+
</manifest>
55+
</archive>
56+
</configuration>
57+
</execution>
58+
</executions>
59+
</plugin>
60+
</plugins>
61+
</build>
62+
</project>

Diff for: map-reduce/src/main/java/com/iluwatar/Main.java

+55
@@ -0,0 +1,55 @@
1+
/*
2+
* This project is licensed under the MIT license. Module model-view-viewmodel is using ZK framework licensed under LGPL (see lgpl-3.0.txt).
3+
*
4+
* The MIT License
5+
* Copyright © 2014-2022 Ilkka Seppälä
6+
*
7+
* Permission is hereby granted, free of charge, to any person obtaining a copy
8+
* of this software and associated documentation files (the "Software"), to deal
9+
* in the Software without restriction, including without limitation the rights
10+
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
11+
* copies of the Software, and to permit persons to whom the Software is
12+
* furnished to do so, subject to the following conditions:
13+
*
14+
* The above copyright notice and this permission notice shall be included in
15+
* all copies or substantial portions of the Software.
16+
*
17+
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18+
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19+
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20+
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21+
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
22+
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
23+
* THE SOFTWARE.
24+
*/
25+
package com.iluwatar;
26+
27+
import java.util.Arrays;
28+
import java.util.List;
29+
import java.util.Map;
30+
import java.util.logging.Logger;
31+
32+
/**
33+
* The Main class serves as the entry point for executing the MapReduce program.
34+
* It processes a list of text inputs, applies the MapReduce pattern, and prints the results.
35+
*/
36+
public class Main {
37+
private static final Logger logger = Logger.getLogger(Main.class.getName());
38+
/**
39+
* The main method initiates the MapReduce process and displays the word count results.
40+
*
41+
* @param args Command-line arguments (not used).
42+
*/
43+
public static void main(String[] args) {
44+
List<String> inputs = Arrays.asList(
45+
"Hello world hello",
46+
"MapReduce is fun",
47+
"Hello from the other side",
48+
"Hello world"
49+
);
50+
List<Map.Entry<String, Integer>> result = MapReduce.mapReduce(inputs);
51+
for (Map.Entry<String, Integer> entry : result) {
52+
logger.info(entry.getKey() + ": " + entry.getValue());
53+
}
54+
}
55+
}
