"""
- - - - - -- - - - - - - - - - - - - - - - - - - - - - -
Name - - sliding_window_attention.py
Goal - - Implement a neural network architecture using sliding
         window attention for sequence modeling tasks.
Detail: Total 5 layers neural network
        * Input layer
Date: 2024.10.20
References:
    1. Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). "On the
       Importance of Initialization and Momentum in Deep Learning."
       *Proceedings of the 30th International Conference on Machine Learning*.
    2. Katharopoulos, A., et al. (2020). "Transformers are RNNs: Fast
       Autoregressive Transformers with Linear Attention."
       *arXiv preprint arXiv:2006.16236*.
    3. [Attention Mechanisms in Neural Networks](https://en.wikipedia.org/wiki/Attention_(machine_learning))
- - - - - -- - - - - - - - - - - - - - - - - - - - - - -
"""

import numpy as np


class SlidingWindowAttention:
    """Sliding Window Attention Module.

    This class implements a sliding window attention mechanism where
    the model attends to a fixed-size window of context around each token.

    Attributes:
        embed_dim (int): The dimensionality of the input embeddings.
        window_size (int): The number of tokens each position attends to.
    """
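
    # The constructor is not visible in this excerpt; the minimal sketch
    # below simply stores the two documented attributes, with the signature
    # inferred from the usage example further down (an assumption, not the
    # file's actual code).
    def __init__(self, embed_dim: int, window_size: int) -> None:
        self.embed_dim = embed_dim
        self.window_size = window_size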

    def forward(self, input_tensor: np.ndarray) -> np.ndarray:
        """
        Forward pass for the sliding window attention.

        Args:
            input_tensor (np.ndarray): Input tensor of shape (batch_size,
                seq_length, embed_dim).

        Returns:
            np.ndarray: Output tensor of shape (batch_size, seq_length, embed_dim).

        >>> x = np.random.randn(2, 10, 4)  # Batch size 2, sequence length 10, embed dim 4
        >>> attention = SlidingWindowAttention(embed_dim=4, window_size=3)
        >>> output = attention.forward(x)
        >>> output.shape
        (2, 10, 4)
        """
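        # NOTE: the actual body of `forward` is not visible in this excerpt.
        # What follows is a minimal sketch consistent with the docstring
        # above: plain scaled dot-product attention in which each token
        # attends only to a window of `self.window_size` tokens centred on
        # it, with no learned projection matrices (all of which are
        # assumptions, not the file's real implementation).
        _, seq_length, embed_dim = input_tensor.shape
        half_window = self.window_size // 2
        output = np.zeros_like(input_tensor)
        for t in range(seq_length):
            # Clip the attention window to the sequence boundaries.
            start = max(0, t - half_window)
            end = min(seq_length, t + half_window + 1)
            window = input_tensor[:, start:end, :]  # (batch, window, embed)
            # Scaled dot-product scores of the query token against its window.
            scores = np.einsum(
                "be,bwe->bw", input_tensor[:, t, :], window
            ) / np.sqrt(embed_dim)
            # Numerically stable softmax over the window dimension.
            scores -= scores.max(axis=-1, keepdims=True)
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)
            # Weighted average of the values inside the window.
            output[:, t, :] = np.einsum("bw,bwe->be", weights, window)
        return output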


# usage
rng = np.random.default_rng()
x = rng.standard_normal((2, 10, 4))  # Batch size 2, sequence length 10, embed dim 4
attention = SlidingWindowAttention(embed_dim=4, window_size=3)
output = attention.forward(x)
print(output)