Chore: Fix some warnings and add primer to README.md #63

Merged 7 commits on Mar 16, 2022
82 changes: 82 additions & 0 deletions crates/cudnn/README.md
@@ -1,2 +1,84 @@
# cudnn
Type safe cuDNN wrapper for the Rust programming language.

## Project status
The current version of cuDNN targeted by this wrapper is 8.3.2. You can refer to the official [release notes](https://docs.nvidia.com/deeplearning/cudnn/release-notes/index.html) and to the [support matrix](https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html) by NVIDIA.

The legacy API is mostly complete and usable, but the backend API is still a work in progress and its use is therefore strongly discouraged. Both APIs are under active development, so expect bugs and reasonable breaking changes while using this crate.

The project is part of the Rust CUDA ecosystem and is actively maintained by [frjnn](https://github.com/frjnn).

## Primer

What follows is a list of useful concepts that can serve as a quick handbook for users of the crate. It is not intended as full documentation, since each wrapped struct, enum and function has its own docs, but rather as a summary of the key points of the API. For a deeper view, you should refer both to the docs of each item and to the [official ones](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#overview) by NVIDIA. Furthermore, if you are new to cuDNN we strongly suggest reading the [official developer guide](https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#overview).

### Device buffers

This crate is built around [`cust`](https://docs.rs/cust/latest/cust/memory/index.html) own memory routines and transfer functions.
> **Review comment (Member):** This should probably just be "around `cust`" in general; maybe add "... which is our core wrapper for interfacing with the CUDA Driver API".


### cuDNN statuses and Result

All cuDNN library functions return their status. This crate uses [`Result`](https://doc.rust-lang.org/std/result/enum.Result.html) to achieve a leaner, more idiomatic and easier-to-manage API.
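As a rough illustration of that pattern (the names below are hypothetical, not the crate's actual items), a raw status code can be converted into a `Result` like so:

```rust
// Hypothetical status codes mirroring cuDNN's `cudnnStatus_t`; the real
// crate wraps the FFI enum, but the conversion pattern is the same.
#[derive(Debug, PartialEq)]
enum Status {
    Success,
    BadParam,
}

#[derive(Debug, PartialEq)]
struct CudnnError(&'static str);

// Map a returned status to an idiomatic `Result`.
fn into_result(status: Status) -> Result<(), CudnnError> {
    match status {
        Status::Success => Ok(()),
        Status::BadParam => Err(CudnnError("bad parameter")),
    }
}

fn main() {
    // The `?` operator then composes cleanly across library calls.
    assert!(into_result(Status::Success).is_ok());
    assert_eq!(into_result(Status::BadParam), Err(CudnnError("bad parameter")));
    println!("ok");
}
```

This is what lets fallible cuDNN calls be chained with `?` instead of manual status checks.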

### cuDNN handles and RAII

The main entry point of the cuDNN library is the `CudnnContext` struct. This handle is tied to a device and is explicitly passed to every subsequent library function that operates on GPU data. It manages resource allocation on both the host and the device and takes care of the synchronization of all the cuDNN primitives.

The handles, and the other cuDNN structs wrapped by this crate, implement the [`Drop`](https://doc.rust-lang.org/std/ops/trait.Drop.html) trait, which implicitly calls the corresponding cuDNN destructors when they go out of scope.
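As a standalone sketch of this RAII pattern (the `Handle` type and its tracking flag are illustrative, not part of the crate), dropping a wrapper triggers its cleanup:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Illustrative handle: the `destroyed` flag stands in for the native
// resource that the real wrappers release via the cuDNN destroy calls.
struct Handle {
    destroyed: Rc<Cell<bool>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        // The real crate would call the matching `cudnnDestroy*` function here.
        self.destroyed.set(true);
    }
}

fn main() {
    let flag = Rc::new(Cell::new(false));
    {
        let _handle = Handle { destroyed: Rc::clone(&flag) };
        assert!(!flag.get()); // still alive inside the scope
    } // `_handle` goes out of scope here; `drop` runs and releases the resource
    assert!(flag.get());
    println!("ok");
}
```

The upshot is that no explicit destroy call is ever needed on the user's side.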

cuDNN contexts can be created as shown in the following snippet:

```rust
use cudnn::CudnnContext;

let ctx = CudnnContext::new().unwrap();
```

> **Review comment (Member):** cudnn requires you to init first, no? At least `cust` does; maybe include initialization in this snippet?
>
> **Reply (Author):** The snippet runs fine, but I will add the ctx init; moreover, many cudnn structs can be initialised without a ctx, although the docs specify otherwise.
>
> **Reply (Member):** oh wonderful

### cuDNN data types

In order to enforce type safety as much as possible at compile time, we shifted away from the original cuDNN enumerated data types and instead opted to leverage Rust's generics. In practice, this means that specifying the data type of a cuDNN tensor descriptor is done as follows:

```rust
use cudnn::TensorDescriptor;

let shape = &[5, 5, 10, 25];
let strides = &[1250, 250, 25, 1];

// f32 tensor
let desc = TensorDescriptor::<f32>::new_strides(shape, strides).unwrap();
```
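For reference, the strides above are the contiguous row-major strides for the given shape; a small helper (ours, not part of the crate) can derive them:

```rust
// Compute contiguous (row-major) strides for a tensor shape:
// each stride is the product of all dimensions to its right.
fn contiguous_strides(shape: &[i32]) -> Vec<i32> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

fn main() {
    // Matches the strides used in the snippet above.
    assert_eq!(contiguous_strides(&[5, 5, 10, 25]), vec![1250, 250, 25, 1]);
    println!("ok");
}
```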

This API also allows using Rust's own types as cuDNN data types, which we see as a desirable property.

cuDNN data types that have no such direct match, such as vectorized ones, can still be manipulated safely, with compile-time compatibility checks, as follows:

```rust
use cudnn::{TensorDescriptor, Vec4};

let shape = &[4, 32, 32, 32];

// in cuDNN this is equal to the INT8x4 data type and CUDNN_TENSOR_NCHW_VECT_C format
let desc = TensorDescriptor::<i8>::new_vectorized::<Vec4>(shape).unwrap();
```

The previous tensor descriptor can be used together with an `i8` device buffer, and cuDNN will see it as a tensor of `CUDNN_TENSOR_NCHW_VECT_C` format and `CUDNN_DATA_INT8x4` data type.

Currently this crate does not support `f16` and `bf16` data types.
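The compile-time pairing of an element type with a vectorization marker, as in `new_vectorized::<Vec4>`, can be sketched with zero-sized marker types; all names below are illustrative, not the crate's actual definitions:

```rust
use std::marker::PhantomData;

// Zero-sized markers standing in for vectorization widths.
struct Scalar;
struct Vec4;

// A descriptor generic over both the element type and the vectorization.
struct Descriptor<T, V = Scalar> {
    _elem: PhantomData<T>,
    _vect: PhantomData<V>,
}

impl<T> Descriptor<T, Scalar> {
    fn new() -> Self {
        Descriptor { _elem: PhantomData, _vect: PhantomData }
    }
}

// Only `i8` is given a 4-wide constructor, so incompatible combinations
// are rejected at compile time rather than at run time.
impl Descriptor<i8, Vec4> {
    fn new_vectorized() -> Self {
        Descriptor { _elem: PhantomData, _vect: PhantomData }
    }
}

fn main() {
    let _scalar: Descriptor<f32> = Descriptor::new();
    let _vect = Descriptor::<i8, Vec4>::new_vectorized();
    // `Descriptor::<f32, Vec4>::new_vectorized()` would fail to compile.
    println!("ok");
}
```

The markers carry no run-time data, so this safety costs nothing at run time.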

### Tensor formats

We decided not to check tensor format configurations at compile time, since that would be too strong a requirement. As a consequence, should you mess up, the program will fail at run time. A proper understanding of the cuDNN API mechanics is thus fundamental to using this crate properly.

You can refer to this [extract](https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#data-layout-formats) from the cuDNN developer guide to learn more about tensor formats.

We split the original cuDNN tensor format enum, which has three variants, into two parts: the `ScalarC` enum and the `TensorFormat::NchwVectC` enum variant. The former stands for "scalar channel" and encapsulates the `Nchw` and `Nhwc` formats. Both scalar channel formats can be converted to the `TensorFormat` enum with [`.into()`](https://doc.rust-lang.org/std/convert/trait.Into.html).

```rust
use cudnn::{TensorFormat, ScalarC};

let sc_fmt = ScalarC::Nchw;

let vc_fmt = TensorFormat::NchwVectC;

let sc_to_tf: TensorFormat = sc_fmt.into();
```
1 change: 0 additions & 1 deletion crates/cudnn/build.rs
@@ -1,5 +1,4 @@
fn main() {
println!("cargo:include=/usr/local/cuda/include");
println!("cargo:rustc-link-lib=dylib=cudnn");
println!("cargo:rerun-if-changed=build.rs");
}
3 changes: 3 additions & 0 deletions crates/cudnn/src/activation/activation_descriptor.rs
@@ -19,6 +19,9 @@ impl ActivationDescriptor {
/// * `coefficient` - optional coefficient for the given function. It specifies the clipping
/// threshold for `ActivationMode::ClippedRelu`.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetActivationDescriptor)
/// may offer additional information about the API behavior.
///
/// # Examples
///
/// ```
3 changes: 3 additions & 0 deletions crates/cudnn/src/activation/activation_mode.rs
@@ -1,6 +1,9 @@
use crate::sys;

/// Specifies a neuron activation function.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnActivationMode_t)
/// may offer additional information about the API behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ActivationMode {
/// Selects the sigmoid function.
9 changes: 9 additions & 0 deletions crates/cudnn/src/activation/mod.rs
@@ -28,6 +28,10 @@ impl CudnnContext {
///
/// * `y` - data for the output.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnActivationForward)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if the shapes of the `y` and `x` tensors do not match or an unsupported
@@ -66,6 +70,7 @@ impl CudnnContext {
/// # Ok(())
/// # }
/// ```
#[allow(clippy::too_many_arguments)]
pub fn activation_forward<CompT, T>(
&self,
activation_desc: &ActivationDescriptor,
@@ -127,11 +132,15 @@ impl CudnnContext {
///
/// * `dx` - data for the input differential.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnActivationBackward)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if the shapes of the `dx` and `x` tensors do not match, the strides of the
/// tensors and their differential do not match, or an unsupported configuration of arguments
/// is detected.
#[allow(clippy::too_many_arguments)]
pub fn activation_backward<CompT, T>(
&self,
activation_desc: &ActivationDescriptor,
5 changes: 4 additions & 1 deletion crates/cudnn/src/attention/attention_descriptor.rs
@@ -85,6 +85,9 @@
///
/// * `max_bream_size` - largest beam expected in any sequential data descriptor.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetAttnDescriptor)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if an unsupported combination of arguments is detected. Some examples
@@ -100,7 +103,7 @@
///
/// * one or more of the following arguments were negative: `q_proj_size`, `k_proj_size`,
/// `v_proj_size`, `sm_scaler`.
///
#[allow(clippy::too_many_arguments)]
pub fn new(
mode: AttnModeFlags,
n_heads: i32,
3 changes: 3 additions & 0 deletions crates/cudnn/src/attention/attention_weights_kind.rs
@@ -1,6 +1,9 @@
use crate::sys;

/// Specifies a group of weights or biases for the multi-head attention layer.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnWeightKind_t)
/// may offer additional information about the API behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum AttnWeight {
/// Selects the input projection weights for queries.
15 changes: 15 additions & 0 deletions crates/cudnn/src/attention/mod.rs
@@ -26,6 +26,9 @@ impl CudnnContext {
///
/// `desc` - multi-head attention descriptor.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetMultiHeadAttnBuffers)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if invalid arguments are detected.
@@ -111,6 +114,10 @@ impl CudnnContext {
///
/// * `reserve_space` - reserve space buffer in device memory. This argument should be `None` in
/// inference mode.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnForward)
/// may offer additional information about the API behavior.
#[allow(clippy::too_many_arguments)]
pub fn multi_head_attn_forward<T, U, D1, D2>(
&self,
attn_desc: &AttentionDescriptor<T, U, D1, D2>,
@@ -251,11 +258,15 @@ impl CudnnContext {
///
/// * `reserve_space` - reserve space buffer in device memory.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnBackwardData)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if an invalid or incompatible input argument was encountered, an inconsistent
/// internal state was encountered, a requested option or a combination of input arguments is
/// not supported or in case of insufficient amount of shared memory to launch the kernel.
#[allow(clippy::too_many_arguments)]
pub fn multi_head_attn_backward_data<T, U, D1, D2>(
&self,
attn_desc: &AttentionDescriptor<T, U, D1, D2>,
@@ -382,11 +393,15 @@ impl CudnnContext {
///
/// * `reserve_space` - reserve space buffer in device memory.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMultiHeadAttnBackwardWeights)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if an invalid or incompatible input argument was encountered, an inconsistent
/// internal state was encountered, a requested option or a combination of input arguments is
/// not supported or in case of insufficient amount of shared memory to launch the kernel.
#[allow(clippy::too_many_arguments)]
pub fn multi_head_attn_backward_weights<T, U, D1, D2>(
&self,
attn_desc: &AttentionDescriptor<T, U, D1, D2>,
3 changes: 3 additions & 0 deletions crates/cudnn/src/attention/seq_data_axis.rs
@@ -3,6 +3,9 @@ use crate::sys;
/// Describes and indexes active dimensions in the `SeqDataDescriptor` `dim` field. This enum is
/// also used in the `axis` argument of the `SeqDataDescriptor` constructor to define the layout
/// of the sequence data buffer in memory.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSeqDataAxis_t)
/// may offer additional information about the API behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SeqDataAxis {
/// Identifies the time (sequence length) dimension or specifies the time in the data layout.
5 changes: 4 additions & 1 deletion crates/cudnn/src/attention/seq_data_descriptor.rs
@@ -71,6 +71,9 @@
///
/// * `seq_lengths` - array that defines all sequence lengths of the underlying container.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetSeqDataDescriptor)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if the innermost dimension as specified in the `axes` array is not
@@ -127,7 +130,7 @@
sys::cudnnSetSeqDataDescriptor(
raw,
T::into_raw(),
4 as i32,
4_i32,
dims.as_ptr(),
raw_axes.as_ptr(),
seq_lengths.len(),
12 changes: 12 additions & 0 deletions crates/cudnn/src/context.rs
@@ -30,6 +30,9 @@ pub struct CudnnContext {
impl CudnnContext {
/// Creates a new cuDNN context, allocating the required memory on both host and device.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnCreate)
/// may offer additional information about the API behavior.
///
/// # Examples
///
/// ```
@@ -54,6 +57,9 @@
}

/// Returns the version number of the underlying cuDNN library.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetVersion)
/// may offer additional information about the API behavior.
pub fn version(&self) -> (u32, u32, u32) {
unsafe {
// cudnnGetVersion does not return a state as it never fails.
@@ -69,6 +75,9 @@
/// Since the same version of a given cuDNN library can be compiled against different CUDA
/// toolkit versions, this routine returns the CUDA toolkit version that the currently used
/// cuDNN library has been compiled against.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetCudartVersion)
/// may offer additional information about the API behavior.
pub fn cuda_version(&self) -> (u32, u32, u32) {
unsafe {
// cudnnGetCudartVersion does not return a state as it never fails.
@@ -94,6 +103,9 @@
///
/// `stream` - the CUDA stream to be written to the cuDNN handle.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetStream)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns an error if the supplied stream is invalid or a mismatch is found between the user
3 changes: 3 additions & 0 deletions crates/cudnn/src/convolution/convolution_config.rs
@@ -1,6 +1,9 @@
use crate::{private, DataType};

/// Supported data types configurations for convolution operations.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnConvolutionForward)
/// may offer additional information about the API behavior.
pub trait SupportedConv<X, W, Y>: private::Sealed + DataType
where
X: DataType,
9 changes: 9 additions & 0 deletions crates/cudnn/src/convolution/convolution_descriptor.rs
@@ -36,6 +36,9 @@ impl<T: DataType> ConvDescriptor<T> {
/// * `math_type` - indicates whether or not the use of tensor op is permitted in the library
/// routines associated with a given convolution descriptor.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionNdDescriptor)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// This function returns an error if any element of stride and dilation is negative or 0, if
@@ -123,6 +126,9 @@
///
/// **Do note** that tensor core operations may not be available on all device architectures.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionMathType)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if the math type was not set successfully.
@@ -155,6 +161,9 @@
///
/// `groups` - group count.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionGroupCount)
/// may offer additional information about the API behavior.
///
/// # Errors
///
/// Returns errors if the argument passed is invalid.
12 changes: 3 additions & 9 deletions crates/cudnn/src/convolution/convolution_mode.rs
@@ -6,6 +6,9 @@ use crate::sys;
/// mathematically to a convolution or to a cross-correlation.
///
/// A cross-correlation is equivalent to a convolution with its filter rotated by 180 degrees.
///
/// cuDNN [docs](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnConvolutionMode_t)
/// may offer additional information about the API behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ConvMode {
/// Convolution operation.
@@ -14,15 +17,6 @@ pub enum ConvMode {
CrossCorrelation,
}

impl From<sys::cudnnConvolutionMode_t> for ConvMode {
fn from(raw: sys::cudnnConvolutionMode_t) -> Self {
match raw {
sys::cudnnConvolutionMode_t::CUDNN_CONVOLUTION => Self::Convolution,
sys::cudnnConvolutionMode_t::CUDNN_CROSS_CORRELATION => Self::CrossCorrelation,
}
}
}

impl From<ConvMode> for sys::cudnnConvolutionMode_t {
fn from(convolution_mode: ConvMode) -> sys::cudnnConvolutionMode_t {
match convolution_mode {