Commit f258746

Chore: Fix some warnings and add primer to README.md (#63)
* Chore: Fix some more warnings
* Chore: Add link to individual cuDNN docs for each wrapped item and function
* Chore: Add primer to README.md
* Fix README.md
* Fix: Broken docs links
1 parent c67efb7 commit f258746


46 files changed (+430 −95 lines)

crates/cudnn/README.md

Lines changed: 86 additions & 0 deletions
@@ -1,2 +1,88 @@
# cudnn

Type safe cuDNN wrapper for the Rust programming language.

## Project status

The current version of cuDNN targeted by this wrapper is 8.3.2. You can refer to the official [release notes](https://docs.nvidia.com/deeplearning/cudnn/release-notes/index.html) and to the [support matrix](https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html) published by NVIDIA.

The legacy API is mostly complete and usable, but the backend API is still to be considered a work in progress and its use is therefore strongly discouraged. Both APIs are under active development, so expect bugs and reasonable breaking changes while using this crate.

The project is part of the Rust CUDA ecosystem and is actively maintained by [frjnn](https://github.com/frjnn).

## Primer

What follows is a list of useful concepts meant as a handbook for users of the crate. It is not the full documentation, as each wrapped struct, enum and function has its own docs, but rather a quick summary of the key points of the API. For a deeper view, you should refer both to the docs of each item and to the [official ones](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#overview) by NVIDIA. Furthermore, if you are new to cuDNN we strongly suggest reading the [official developer guide](https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#overview).

### Device buffers

This crate is built around [`cust`](https://docs.rs/cust/latest/cust/index.html), which is the wrapper of choice for interfacing with the CUDA driver API in the Rust CUDA ecosystem.

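A round trip through device memory with `cust` might look like the following sketch. This is illustrative only: it assumes a CUDA-capable device is present, and it relies on `cust`'s `quick_init`, `DeviceBuffer` and `CopyDestination` items; consult the `cust` docs for the exact API.

```rust
use cust::memory::{CopyDestination, DeviceBuffer};

fn main() -> Result<(), cust::error::CudaError> {
    // Initialize the CUDA driver API and create a context on the first device.
    let _ctx = cust::quick_init()?;

    // Copy host data into a freshly allocated device buffer.
    let host = [1.0f32, 2.0, 3.0, 4.0];
    let device = DeviceBuffer::from_slice(&host)?;

    // Copy the data back to the host to verify the round trip.
    let mut out = [0.0f32; 4];
    device.copy_to(&mut out)?;
    assert_eq!(host, out);

    Ok(())
}
```

Buffers like these are what the cuDNN wrappers in this crate operate on.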
### cuDNN statuses and Result

All cuDNN library functions return their status. This crate uses [`Result`](https://doc.rust-lang.org/std/result/enum.Result.html) to achieve a leaner, more idiomatic and easier to manage API.

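Concretely, this means that non-success cuDNN statuses surface as `Err` values, so fallible calls compose with the `?` operator. A minimal sketch (the error type name `CudnnError` is assumed here; check the crate's error module for the exact name):

```rust
use cudnn::{CudnnContext, CudnnError};

// Propagate any non-success cuDNN status to the caller instead of panicking.
fn create_context() -> Result<CudnnContext, CudnnError> {
    let ctx = CudnnContext::new()?;
    Ok(ctx)
}
```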
### cuDNN handles and RAII

The main entry point of the cuDNN library is the `CudnnContext` struct. This handle is tied to a device and is explicitly passed to every subsequent library function that operates on GPU data. It manages resource allocations both on the host and the device and takes care of the synchronization of all the cuDNN primitives.

The handles, and the other cuDNN structs wrapped by this crate, implement the [`Drop`](https://doc.rust-lang.org/std/ops/trait.Drop.html) trait, which implicitly calls their destructors on the cuDNN side when they go out of scope.

cuDNN contexts can be created as shown in the following snippet:

```rust
use cudnn::CudnnContext;

let ctx = CudnnContext::new().unwrap();
```

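Because destructors run on scope exit, a handle's lifetime can be confined to a block; a sketch of that pattern (assuming only `CudnnContext::new` from this crate):

```rust
use cudnn::CudnnContext;

{
    let ctx = CudnnContext::new().unwrap();
    // ... use ctx with other cuDNN wrappers here ...
} // `Drop` runs here, destroying the underlying cuDNN handle.
```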
### cuDNN data types

In order to enforce type safety as much as possible at compile time, we shifted away from the original cuDNN enumerated data types and instead opted to leverage Rust's generics. In practice, this means that specifying the data type of a cuDNN tensor descriptor is done as follows:

```rust
use cudnn::{CudnnContext, TensorDescriptor};

let ctx = CudnnContext::new().unwrap();

let shape = &[5, 5, 10, 25];
let strides = &[1250, 250, 25, 1];

// f32 tensor
let desc = TensorDescriptor::<f32>::new_strides(shape, strides).unwrap();
```

This API also allows for using Rust's own types as cuDNN data types, which we see as a desirable property.

Safely manipulating cuDNN data types that do not have any such direct match, such as vectorized ones, whilst still performing compile time compatibility checks can be done as follows:

```rust
use cudnn::{CudnnContext, TensorDescriptor, Vec4};

let ctx = CudnnContext::new().unwrap();

let shape = &[4, 32, 32, 32];

// in cuDNN this is equal to the INT8x4 data type and CUDNN_TENSOR_NCHW_VECT_C format
let desc = TensorDescriptor::<i8>::new_vectorized::<Vec4>(shape).unwrap();
```

The previous tensor descriptor can be used together with an `i8` device buffer, and cuDNN will see it as a tensor of `CUDNN_TENSOR_NCHW_VECT_C` format and `CUDNN_DATA_INT8x4` data type.

Currently this crate does not support the `f16` and `bf16` data types.

### Tensor formats

We decided not to check tensor format configurations at compile time, since that would be too strong a requirement. As a consequence, should you mess up, the program will fail at run-time. A proper understanding of the cuDNN API mechanics is thus fundamental to properly use this crate.

You can refer to this [extract](https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#data-layout-formats) from the cuDNN developer guide to learn more about tensor formats.

We split the original cuDNN tensor format enum, which has three variants, into two parts: the `ScalarC` enum and the `TensorFormat::NchwVectC` enum variant. The former stands for "scalar channel" and it encapsulates the `Nchw` and `Nhwc` formats. Both scalar channel formats can be converted to the `TensorFormat` enum with [`.into()`](https://doc.rust-lang.org/std/convert/trait.Into.html).

```rust
use cudnn::{ScalarC, TensorFormat};

let sc_fmt = ScalarC::Nchw;

let vc_fmt = TensorFormat::NchwVectC;

let sc_to_tf: TensorFormat = sc_fmt.into();
```

crates/cudnn/build.rs

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 fn main() {
-    println!("cargo:include=/usr/local/cuda/include");
     println!("cargo:rustc-link-lib=dylib=cudnn");
     println!("cargo:rerun-if-changed=build.rs");
 }

crates/cudnn/src/activation/activation_descriptor.rs

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ impl ActivationDescriptor {
     /// * `coefficient` - optional coefficient for the given function. It specifies the clipping
     /// threshold for `ActivationMode::ClippedRelu`.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetActivationDescriptor)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Examples
     ///
     /// ```

crates/cudnn/src/activation/activation_mode.rs

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 use crate::sys;
 
 /// Specifies a neuron activation function.
+///
+/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnActivationMode_t)
+/// may offer additional information about the API behavior.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
 pub enum ActivationMode {
     /// Selects the sigmoid function.
@@ -18,7 +21,7 @@ pub enum ActivationMode {
     /// Selects no activation.
     ///
     /// **Do note** that this is only valid for an activation descriptor passed to
-    /// [`convolution_bias_act_forward()`](CudnnContext::convolution_bias_act_fwd).
+    /// [`convolution_bias_act_forward()`](crate::CudnnContext::convolution_bias_act_forward).
     Identity,
 }

crates/cudnn/src/activation/mod.rs

Lines changed: 8 additions & 0 deletions
@@ -28,6 +28,9 @@ impl CudnnContext {
     ///
     /// * `y` - data for the output.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnActivationForward)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if the shapes of the `y` and `x` tensors do not match or an unsupported
@@ -66,6 +69,7 @@ impl CudnnContext {
     /// # Ok(())
     /// # }
     /// ```
+    #[allow(clippy::too_many_arguments)]
     pub fn activation_forward<CompT, T>(
         &self,
         activation_desc: &ActivationDescriptor,
@@ -127,11 +131,15 @@ impl CudnnContext {
     ///
     /// * `dx` - data for the input differential.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnActivationBackward)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if the shapes of the `dx` and `x` tensors do not match, the strides of the
     /// tensors and their differential do not match, or an unsupported configuration of arguments
     /// is detected.
+    #[allow(clippy::too_many_arguments)]
     pub fn activation_backward<CompT, T>(
         &self,
         activation_desc: &ActivationDescriptor,

crates/cudnn/src/attention/attention_descriptor.rs

Lines changed: 4 additions & 1 deletion
@@ -85,6 +85,9 @@ where
     ///
     /// * `max_beam_size` - largest beam expected in any sequential data descriptor.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetAttnDescriptor)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if an unsupported combination of arguments is detected. Some examples
@@ -100,7 +103,7 @@ where
     ///
     /// * one or more of the following arguments were negative: `q_proj_size`, `k_proj_size`,
     /// `v_proj_size`, `sm_scaler`.
-    ///
+    #[allow(clippy::too_many_arguments)]
     pub fn new(
         mode: AttnModeFlags,
         n_heads: i32,

crates/cudnn/src/attention/attention_weights_kind.rs

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 use crate::sys;
 
 /// Specifies a group of weights or biases for the multi-head attention layer.
+///
+/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnWeightKind_t)
+/// may offer additional information about the API behavior.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
 pub enum AttnWeight {
     /// Selects the input projection weights for queries.

crates/cudnn/src/attention/mod.rs

Lines changed: 15 additions & 0 deletions
@@ -26,6 +26,9 @@ impl CudnnContext {
     ///
     /// `desc` - multi-head attention descriptor.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetMultiHeadAttnBuffers)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if invalid arguments are detected.
@@ -111,6 +114,10 @@ impl CudnnContext {
     ///
     /// * `reserve_space` - reserve space buffer in device memory. This argument should be `None` in
     /// inference mode.
+    ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnForward)
+    /// may offer additional information about the API behavior.
+    #[allow(clippy::too_many_arguments)]
     pub fn multi_head_attn_forward<T, U, D1, D2>(
         &self,
         attn_desc: &AttentionDescriptor<T, U, D1, D2>,
@@ -251,11 +258,15 @@ impl CudnnContext {
     ///
     /// * `reserve_space` - reserve space buffer in device memory.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnBackwardData)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if an invalid or incompatible input argument was encountered, an inconsistent
     /// internal state was encountered, a requested option or a combination of input arguments is
     /// not supported or in case of insufficient amount of shared memory to launch the kernel.
+    #[allow(clippy::too_many_arguments)]
     pub fn multi_head_attn_backward_data<T, U, D1, D2>(
         &self,
         attn_desc: &AttentionDescriptor<T, U, D1, D2>,
@@ -382,11 +393,15 @@ impl CudnnContext {
     ///
     /// * `reserve_space` - reserve space buffer in device memory.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnBackwardWeights)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if an invalid or incompatible input argument was encountered, an inconsistent
     /// internal state was encountered, a requested option or a combination of input arguments is
     /// not supported or in case of insufficient amount of shared memory to launch the kernel.
+    #[allow(clippy::too_many_arguments)]
     pub fn multi_head_attn_backward_weights<T, U, D1, D2>(
         &self,
         attn_desc: &AttentionDescriptor<T, U, D1, D2>,

crates/cudnn/src/attention/seq_data_axis.rs

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ use crate::sys;
 /// Describes and indexes active dimensions in the `SeqDataDescriptor` `dim` field. This enum is
 /// also used in the `axis` argument of the `SeqDataDescriptor` constructor to define the layout
 /// of the sequence data buffer in memory.
+///
+/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSeqDataAxis_t)
+/// may offer additional information about the API behavior.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
 pub enum SeqDataAxis {
     /// Identifies the time (sequence length) dimension or specifies the time in the data layout.

crates/cudnn/src/attention/seq_data_descriptor.rs

Lines changed: 4 additions & 1 deletion
@@ -71,6 +71,9 @@ where
     ///
     /// * `seq_lengths` - array that defines all sequence lengths of the underlying container.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetSeqDataDescriptor)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if the innermost dimension as specified in the `axes` array is not
@@ -127,7 +130,7 @@ where
             sys::cudnnSetSeqDataDescriptor(
                 raw,
                 T::into_raw(),
-                4 as i32,
+                4_i32,
                 dims.as_ptr(),
                 raw_axes.as_ptr(),
                 seq_lengths.len(),

crates/cudnn/src/context.rs

Lines changed: 12 additions & 0 deletions
@@ -30,6 +30,9 @@ pub struct CudnnContext {
 impl CudnnContext {
     /// Creates a new cuDNN context, allocating the required memory on both host and device.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnCreate)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Examples
     ///
     /// ```
@@ -54,6 +57,9 @@ impl CudnnContext {
     }
 
     /// Returns the version number of the underlying cuDNN library.
+    ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetVersion)
+    /// may offer additional information about the API behavior.
     pub fn version(&self) -> (u32, u32, u32) {
         unsafe {
             // cudnnGetVersion does not return a state as it never fails.
@@ -69,6 +75,9 @@ impl CudnnContext {
     /// Since the same version of a given cuDNN library can be compiled against different CUDA
     /// toolkit versions, this routine returns the CUDA toolkit version that the currently used
     /// cuDNN library has been compiled against.
+    ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetCudartVersion)
+    /// may offer additional information about the API behavior.
     pub fn cuda_version(&self) -> (u32, u32, u32) {
         unsafe {
             // cudnnGetCudartVersion does not return a state as it never fails.
@@ -94,6 +103,9 @@ impl CudnnContext {
     ///
     /// `stream` - the CUDA stream to be written to the cuDNN handle.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetStream)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if the supplied stream is invalid or a mismatch is found between the user

crates/cudnn/src/convolution/convolution_algo.rs

Lines changed: 3 additions & 3 deletions
@@ -63,7 +63,7 @@ pub enum ConvFwdAlgo {
     Fft,
     /// This algorithm uses the Fast-Fourier Transform approach but splits the inputs into tiles.
     /// A significant memory workspace is needed to store intermediate results, but less than
-    /// [`Fft`], for large size images.
+    /// [`ConvFwdAlgo::Fft`], for large size images.
     FftTiling,
     /// This algorithm uses the Winograd Transform approach to compute the convolution. A reasonably
     /// sized workspace is needed to store intermediate results.
@@ -158,7 +158,7 @@ pub enum ConvBwdDataAlgo {
     Fft,
     /// This algorithm uses the Fast-Fourier Transform approach but splits the inputs into tiles.
     /// A significant memory workspace is needed to store intermediate results, but less than
-    /// [`Fft`], for large size images.
+    /// [`ConvBwdDataAlgo::Fft`], for large size images.
     FftTiling,
     /// This algorithm uses the Winograd Transform approach to compute the convolution. A reasonably
     /// sized workspace is needed to store intermediate results.
@@ -240,7 +240,7 @@ pub enum ConvBwdFilterAlgo {
     Fft,
     /// This algorithm uses the Fast-Fourier Transform approach but splits the inputs into tiles.
     /// A significant memory workspace is needed to store intermediate results, but less than
-    /// [`Fft`], for large size images.
+    /// [`ConvBwdFilterAlgo::Fft`], for large size images.
     FftTiling,
     /// This algorithm uses the Winograd Transform approach to compute the convolution. A reasonably
     /// sized workspace is needed to store intermediate results.

crates/cudnn/src/convolution/convolution_config.rs

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 use crate::{private, DataType};
 
 /// Supported data types configurations for convolution operations.
+///
+/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnConvolutionForward)
+/// may offer additional information about the API behavior.
 pub trait SupportedConv<X, W, Y>: private::Sealed + DataType
 where
     X: DataType,

crates/cudnn/src/convolution/convolution_descriptor.rs

Lines changed: 9 additions & 0 deletions
@@ -36,6 +36,9 @@ impl<T: DataType> ConvDescriptor<T> {
     /// * `math_type` - indicates whether or not the use of tensor op is permitted in the library
     /// routines associated with a given convolution descriptor.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionNdDescriptor)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// This function returns an error if any element of stride and dilation is negative or 0, if
@@ -123,6 +126,9 @@ impl<T: DataType> ConvDescriptor<T> {
     ///
     /// **Do note** that tensor core operations may not be available on all device architectures.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionMathType)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if the math type was not set successfully.
@@ -155,6 +161,9 @@ impl<T: DataType> ConvDescriptor<T> {
     ///
     /// `groups` - group count.
     ///
+    /// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionGroupCount)
+    /// may offer additional information about the API behavior.
+    ///
     /// # Errors
     ///
     /// Returns errors if the argument passed is invalid.

0 commit comments
