From cef15e8308415e064938a2091fa82f50901ff6be Mon Sep 17 00:00:00 2001
From: Pavel Iakubovskii
Date: Sat, 25 Jan 2025 18:37:35 +0000
Subject: [PATCH] Update README.md

---
 README.md | 414 ++++++++----------------------------------------------
 1 file changed, 60 insertions(+), 354 deletions(-)

diff --git a/README.md b/README.md
index 0eaf4c23..91596cfb 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
![logo](https://i.ibb.co/dc1XdhT/Segmentation-Models-V2-Side-1-1.png)

-**Python library with Neural Networks for Image
+**Python library with Neural Networks for Image Semantic
 Segmentation based on [PyTorch](https://pytorch.org/).**

@@ -18,13 +18,13 @@ Segmentation based on [PyTorch](https://pytorch.org/).**
-The main features of this library are:
+The main features of the library are:

- - High-level API (just two lines to create a neural network)
- - 11 models architectures for binary and multi class segmentation (including legendary Unet)
- - 124 available encoders (and 500+ encoders from [timm](https://github.com/rwightman/pytorch-image-models))
- - All encoders have pre-trained weights for faster and better convergence
- - Popular metrics and losses for training routines
+ - Super simple high-level API (just two lines to create a neural network)
+ - 11 encoder-decoder model architectures (Unet, Unet++, Segformer, ...)
+ - 800+ **pretrained** convolution- and transformer-based encoders, including [timm](https://github.com/huggingface/pytorch-image-models) support
+ - Popular metrics and losses for training routines (Dice, Jaccard, Tversky, ...)
+ - ONNX export and torch script/trace/compile friendly

### [📚 Project Documentation 📚](http://smp.readthedocs.io/)

@@ -33,21 +33,18 @@ Visit [Read The Docs Project Page](https://smp.readthedocs.io/) or read the foll

### 📋 Table of contents
 1. [Quick start](#start)
 2. [Examples](#examples)
- 3. [Models](#models)
-    1. [Architectures](#architectures)
-    2. [Encoders](#encoders)
-    3. [Timm Encoders](#timm)
+ 3. [Models and encoders](#models-and-encoders)
 4. [Models API](#api)
    1. [Input channels](#input-channels)
    2. [Auxiliary classification output](#auxiliary-classification-output)
    3. [Depth](#depth)
 5. [Installation](#installation)
- 6. [Competitions won with the library](#competitions-won-with-the-library)
+ 6. [Competitions won with the library](#competitions)
 7. [Contributing](#contributing)
 8. [Citing](#citing)
 9. [License](#license)

-### ⏳ Quick start
+## ⏳ Quick start

#### 1. Create your first Segmentation model with SMP

@@ -78,16 +75,22 @@ preprocess_input = get_preprocessing_fn('resnet18', pretrained='imagenet')

Congratulations! You are done! Now you can train your model with your favorite framework!

-### 💡 Examples
- - Training model for pets binary segmentation with Pytorch-Lightning [notebook](https://github.com/qubvel/segmentation_models.pytorch/blob/main/examples/binary_segmentation_intro.ipynb) and [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/binary_segmentation_intro.ipynb)
- - Training model for cars segmentation on CamVid dataset [here](https://github.com/qubvel/segmentation_models.pytorch/blob/main/examples/cars%20segmentation%20(camvid).ipynb).
- - Training SMP model with [Catalyst](https://github.com/catalyst-team/catalyst) (high-level framework for PyTorch), [TTAch](https://github.com/qubvel/ttach) (TTA library for PyTorch) and [Albumentations](https://github.com/albu/albumentations) (fast image augmentation library) - [here](https://github.com/catalyst-team/catalyst/blob/v21.02rc0/examples/notebooks/segmentation-tutorial.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/catalyst-team/catalyst/blob/v21.02rc0/examples/notebooks/segmentation-tutorial.ipynb)
- - Training SMP model with [Pytorch-Lightning](https://pytorch-lightning.readthedocs.io) framework - [here](https://github.com/ternaus/cloths_segmentation) (clothes binary segmentation by [@ternaus](https://github.com/ternaus)).
- - Export trained model to ONNX - [notebook](https://github.com/qubvel/segmentation_models.pytorch/blob/main/examples/convert_to_onnx.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/convert_to_onnx.ipynb)
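The Quick start section above creates a model in a couple of lines and pairs it with `get_preprocessing_fn` for input normalization; since the patch elides the actual snippet, here is a minimal, hedged sketch of that flow (the architecture, encoder, class count, and image size are illustrative choices, not requirements):

```python
import torch
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.encoders import get_preprocessing_fn

# One line to create a segmentation model with a pretrained encoder
model = smp.Unet(encoder_name="resnet18", encoder_weights="imagenet", in_channels=3, classes=1)

# Matching input normalization for the chosen encoder weights
preprocess_input = get_preprocessing_fn("resnet18", pretrained="imagenet")

# Dummy forward pass: (batch, channels, H, W) -> (batch, classes, H, W)
x = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    mask = model(x)
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```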
+## 💡 Examples

| Name | Link | Colab |
|--------------------------------------------------------|------|-------|
| **Train** pets binary segmentation on OxfordPets | [Notebook](https://github.com/qubvel/segmentation_models.pytorch/blob/main/examples/binary_segmentation_intro.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/binary_segmentation_intro.ipynb) |
| **Train** cars binary segmentation on CamVid | [Notebook](https://github.com/qubvel/segmentation_models.pytorch/blob/main/examples/cars%20segmentation%20(camvid).ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/cars%20segmentation%20(camvid).ipynb) |
| **Train** multiclass segmentation on CamVid | [Notebook](https://github.com/qubvel-org/segmentation_models.pytorch/blob/main/examples/camvid_segmentation_multiclass.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel-org/segmentation_models.pytorch/blob/main/examples/camvid_segmentation_multiclass.ipynb) |
| **Train** clothes binary segmentation by @ternaus | [Repo](https://github.com/ternaus/cloths_segmentation) | |
| **Load and run inference** with a pretrained Segformer | [Notebook](https://github.com/qubvel-org/segmentation_models.pytorch/blob/main/examples/segformer_inference_pretrained.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/segformer_inference_pretrained.ipynb) |
| **Save and load** models locally / to HuggingFace Hub | [Notebook](https://github.com/qubvel-org/segmentation_models.pytorch/blob/main/examples/save_load_model_and_share_with_hf_hub.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/save_load_model_and_share_with_hf_hub.ipynb) |
| **Export** trained model to ONNX | [Notebook](https://github.com/qubvel/segmentation_models.pytorch/blob/main/examples/convert_to_onnx.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/main/examples/convert_to_onnx.ipynb) |

+## 📦 Models and encoders
+
+### Architectures
 - Unet [[paper](https://arxiv.org/abs/1505.04597)] [[docs](https://smp.readthedocs.io/en/latest/models.html#unet)]
 - Unet++ [[paper](https://arxiv.org/pdf/1807.10165.pdf)] [[docs](https://smp.readthedocs.io/en/latest/models.html#id2)]
 - MAnet [[paper](https://ieeexplore.ieee.org/abstract/document/9201310)]
[[docs](https://smp.readthedocs.io/en/latest/models.html#manet)] @@ -100,339 +103,38 @@ Congratulations! You are done! Now you can train your model with your favorite f - UPerNet [[paper](https://arxiv.org/abs/1807.10221)] [[docs](https://smp.readthedocs.io/en/latest/models.html#upernet)] - Segformer [[paper](https://arxiv.org/abs/2105.15203)] [[docs](https://smp.readthedocs.io/en/latest/models.html#segformer)] -#### Encoders - -The following is a list of supported encoders in the SMP. Select the appropriate family of encoders and click to expand the table and select a specific encoder and its pre-trained weights (`encoder_name` and `encoder_weights` parameters). - -
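All architectures listed above share the same constructor interface and accept the `encoder_name` / `encoder_weights` parameters just mentioned, so swapping decoders is a one-line change. A short sketch (the encoder and class count below are placeholder values):

```python
import segmentation_models_pytorch as smp

common = dict(encoder_name="resnet34", encoder_weights="imagenet", in_channels=3, classes=2)

unet = smp.Unet(**common)            # classic U-Net decoder
fpn = smp.FPN(**common)              # Feature Pyramid Network decoder
segformer = smp.Segformer(**common)  # Segformer decoder

# The same model can also be built by name through the generic factory
model = smp.create_model(arch="fpn", **common)
```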
-ResNet -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|resnet18 |imagenet / ssl / swsl |11M | -|resnet34 |imagenet |21M | -|resnet50 |imagenet / ssl / swsl |23M | -|resnet101 |imagenet |42M | -|resnet152 |imagenet |58M | - -
-
- -
-ResNeXt -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|resnext50_32x4d |imagenet / ssl / swsl |22M | -|resnext101_32x4d |ssl / swsl |42M | -|resnext101_32x8d |imagenet / instagram / ssl / swsl|86M | -|resnext101_32x16d |instagram / ssl / swsl |191M | -|resnext101_32x32d |instagram |466M | -|resnext101_32x48d |instagram |826M | - -
-
- -
-ResNeSt -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|timm-resnest14d |imagenet |8M | -|timm-resnest26d |imagenet |15M | -|timm-resnest50d |imagenet |25M | -|timm-resnest101e |imagenet |46M | -|timm-resnest200e |imagenet |68M | -|timm-resnest269e |imagenet |108M | -|timm-resnest50d_4s2x40d |imagenet |28M | -|timm-resnest50d_1s4x24d |imagenet |23M | - -
-
- -
-Res2Ne(X)t -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|timm-res2net50_26w_4s |imagenet |23M | -|timm-res2net101_26w_4s |imagenet |43M | -|timm-res2net50_26w_6s |imagenet |35M | -|timm-res2net50_26w_8s |imagenet |46M | -|timm-res2net50_48w_2s |imagenet |23M | -|timm-res2net50_14w_8s |imagenet |23M | -|timm-res2next50 |imagenet |22M | - -
-
- -
-RegNet(x/y) -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|timm-regnetx_002 |imagenet |2M | -|timm-regnetx_004 |imagenet |4M | -|timm-regnetx_006 |imagenet |5M | -|timm-regnetx_008 |imagenet |6M | -|timm-regnetx_016 |imagenet |8M | -|timm-regnetx_032 |imagenet |14M | -|timm-regnetx_040 |imagenet |20M | -|timm-regnetx_064 |imagenet |24M | -|timm-regnetx_080 |imagenet |37M | -|timm-regnetx_120 |imagenet |43M | -|timm-regnetx_160 |imagenet |52M | -|timm-regnetx_320 |imagenet |105M | -|timm-regnety_002 |imagenet |2M | -|timm-regnety_004 |imagenet |3M | -|timm-regnety_006 |imagenet |5M | -|timm-regnety_008 |imagenet |5M | -|timm-regnety_016 |imagenet |10M | -|timm-regnety_032 |imagenet |17M | -|timm-regnety_040 |imagenet |19M | -|timm-regnety_064 |imagenet |29M | -|timm-regnety_080 |imagenet |37M | -|timm-regnety_120 |imagenet |49M | -|timm-regnety_160 |imagenet |80M | -|timm-regnety_320 |imagenet |141M | - -
-
- -
-GERNet -
+### Encoders -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|timm-gernet_s |imagenet |6M | -|timm-gernet_m |imagenet |18M | -|timm-gernet_l |imagenet |28M | +The library provides a wide range of **pretrained** encoders (also known as backbones) for segmentation models. Instead of using features from the final layer of a classification model, we extract **intermediate features** and feed them into the decoder for segmentation tasks. -
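A small sketch of what "intermediate features" means in practice: the encoder attached to a built model returns a pyramid of feature maps (one per stage) that the decoder fuses into a mask; exact shapes and channel counts depend on the chosen encoder and depth:

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet("resnet34", encoder_weights="imagenet")

x = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    features = model.encoder(x)  # list of feature maps, highest to lowest resolution

for f in features:
    print(f.shape)  # spatial size roughly halves at each stage while channels grow
```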
-
+All encoders come with **pretrained weights**, which help achieve **faster and more stable convergence** when training segmentation models. -
-SE-Net -
+Given the extensive selection of supported encoders, you can choose the best one for your specific use case, for example: +- **Lightweight encoders** for low-latency applications or real-time inference on edge devices (mobilenet/mobileone). +- **High-capacity architectures** for complex tasks involving a large number of segmented classes, providing superior accuracy (convnext/swin/mit). -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|senet154 |imagenet |113M | -|se_resnet50 |imagenet |26M | -|se_resnet101 |imagenet |47M | -|se_resnet152 |imagenet |64M | -|se_resnext50_32x4d |imagenet |25M | -|se_resnext101_32x4d |imagenet |46M | +By selecting the right encoder, you can balance **efficiency, performance, and model complexity** to suit your project needs. -
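For example, the trade-off described above comes down to swapping the encoder name while everything else stays the same (the two encoders below are just representatives of the lightweight and high-capacity families mentioned, not recommendations):

```python
import segmentation_models_pytorch as smp

# Lightweight encoder for edge / real-time use cases
fast_model = smp.Unet("mobileone_s0", encoder_weights="imagenet", classes=10)

# Higher-capacity transformer encoder for harder, many-class problems
big_model = smp.Unet("mit_b3", encoder_weights="imagenet", classes=10)
```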
-
+All encoders and corresponding pretrained weights are listed in the documentation:
 - [table](https://smp.readthedocs.io/en/latest/encoders.html) with natively ported encoders
 - [table](https://smp.readthedocs.io/en/latest/encoders_timm.html) with [timm](https://github.com/huggingface/pytorch-image-models) encoders supported
-
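If a backbone is missing from the natively ported table, a timm model can usually be plugged in instead; the snippet below assumes the `tu-` prefix convention described on the timm encoders documentation page linked above (check that table for the names that are actually supported):

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="tu-convnext_tiny",  # example timm backbone addressed via the "tu-" prefix
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
)
```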
-SK-ResNe(X)t -
+## 🔁 Models API -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|timm-skresnet18 |imagenet |11M | -|timm-skresnet34 |imagenet |21M | -|timm-skresnext50_32x4d |imagenet |25M | - -
-
- -
-DenseNet -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|densenet121 |imagenet |6M | -|densenet169 |imagenet |12M | -|densenet201 |imagenet |18M | -|densenet161 |imagenet |26M | - -
-
- -
-Inception -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|inceptionresnetv2 |imagenet / imagenet+background |54M | -|inceptionv4 |imagenet / imagenet+background |41M | -|xception |imagenet |22M | - -
-
- -
-EfficientNet -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|efficientnet-b0 |imagenet |4M | -|efficientnet-b1 |imagenet |6M | -|efficientnet-b2 |imagenet |7M | -|efficientnet-b3 |imagenet |10M | -|efficientnet-b4 |imagenet |17M | -|efficientnet-b5 |imagenet |28M | -|efficientnet-b6 |imagenet |40M | -|efficientnet-b7 |imagenet |63M | -|timm-efficientnet-b0 |imagenet / advprop / noisy-student|4M | -|timm-efficientnet-b1 |imagenet / advprop / noisy-student|6M | -|timm-efficientnet-b2 |imagenet / advprop / noisy-student|7M | -|timm-efficientnet-b3 |imagenet / advprop / noisy-student|10M | -|timm-efficientnet-b4 |imagenet / advprop / noisy-student|17M | -|timm-efficientnet-b5 |imagenet / advprop / noisy-student|28M | -|timm-efficientnet-b6 |imagenet / advprop / noisy-student|40M | -|timm-efficientnet-b7 |imagenet / advprop / noisy-student|63M | -|timm-efficientnet-b8 |imagenet / advprop |84M | -|timm-efficientnet-l2 |noisy-student |474M | -|timm-efficientnet-lite0 |imagenet |4M | -|timm-efficientnet-lite1 |imagenet |5M | -|timm-efficientnet-lite2 |imagenet |6M | -|timm-efficientnet-lite3 |imagenet |8M | -|timm-efficientnet-lite4 |imagenet |13M | - -
-
- -
-MobileNet -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|mobilenet_v2 |imagenet |2M | -|timm-mobilenetv3_large_075 |imagenet |1.78M | -|timm-mobilenetv3_large_100 |imagenet |2.97M | -|timm-mobilenetv3_large_minimal_100|imagenet |1.41M | -|timm-mobilenetv3_small_075 |imagenet |0.57M | -|timm-mobilenetv3_small_100 |imagenet |0.93M | -|timm-mobilenetv3_small_minimal_100|imagenet |0.43M | - -
-
+### Input channels -
-DPN -
+The input channels parameter allows you to create a model that can process a tensor with an arbitrary number of channels. +If you use pretrained weights from ImageNet, the weights of the first convolution will be reused: + - For the 1-channel case, it would be a sum of the weights of the first convolution layer. + - Otherwise, channels would be populated with weights like `new_weight[:, i] = pretrained_weight[:, i % 3]`, and then scaled with `new_weight * 3 / new_in_channels`. -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|dpn68 |imagenet |11M | -|dpn68b |imagenet+5k |11M | -|dpn92 |imagenet+5k |34M | -|dpn98 |imagenet |58M | -|dpn107 |imagenet+5k |84M | -|dpn131 |imagenet |76M | - -
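The channel re-population rule quoted in the *Input channels* description above can be written out explicitly; this is a hypothetical illustration of the arithmetic, not the library's internal code:

```python
import torch

# Pretend this is the first conv of an ImageNet-pretrained encoder: (out_ch, 3, kH, kW)
pretrained_weight = torch.randn(64, 3, 7, 7)

# in_channels == 1: the three RGB filters are summed
one_channel_weight = pretrained_weight.sum(dim=1, keepdim=True)  # (64, 1, 7, 7)

# in_channels > 3 (here 5): RGB filters are cycled, then rescaled by 3 / new_in_channels
new_in_channels = 5
new_weight = pretrained_weight.new_empty(64, new_in_channels, 7, 7)
for i in range(new_in_channels):
    new_weight[:, i] = pretrained_weight[:, i % 3]
new_weight = new_weight * (3.0 / new_in_channels)
```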
-
- -
-VGG -
- -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|vgg11 |imagenet |9M | -|vgg11_bn |imagenet |9M | -|vgg13 |imagenet |9M | -|vgg13_bn |imagenet |9M | -|vgg16 |imagenet |14M | -|vgg16_bn |imagenet |14M | -|vgg19 |imagenet |20M | -|vgg19_bn |imagenet |20M | - -
-
- -
-Mix Vision Transformer -
- -Backbone from SegFormer pretrained on Imagenet! Can be used with other decoders from package, you can combine Mix Vision Transformer with Unet, FPN and others! - -Limitations: - - - encoder is **not** supported by Linknet, Unet++ - - encoder is supported by FPN only for encoder **depth = 5** - -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|mit_b0 |imagenet |3M | -|mit_b1 |imagenet |13M | -|mit_b2 |imagenet |24M | -|mit_b3 |imagenet |44M | -|mit_b4 |imagenet |60M | -|mit_b5 |imagenet |81M | - -
-
- -
-MobileOne -
- -Apple's "sub-one-ms" Backbone pretrained on Imagenet! Can be used with all decoders. - -Note: In the official github repo the s0 variant has additional num_conv_branches, leading to more params than s1. - -|Encoder |Weights |Params, M | -|--------------------------------|:------------------------------:|:------------------------------:| -|mobileone_s0 |imagenet |4.6M | -|mobileone_s1 |imagenet |4.0M | -|mobileone_s2 |imagenet |6.5M | -|mobileone_s3 |imagenet |8.8M | -|mobileone_s4 |imagenet |13.6M | - -
-
-
-
-\* `ssl`, `swsl` - semi-supervised and weakly-supervised learning on ImageNet ([repo](https://github.com/facebookresearch/semi-supervised-ImageNet1K-models)).
-
-#### Timm Encoders
-
-[docs](https://smp.readthedocs.io/en/latest/encoders_timm.html)
-
-Pytorch Image Models (a.k.a. timm) has a lot of pretrained models and interface which allows using these models as encoders in smp, however, not all models are supported
-
- - not all transformer models have ``features_only`` functionality implemented that is required for encoder
- - some models have inappropriate strides
-
-Total number of supported encoders: 549
-
- - [table with available encoders](https://smp.readthedocs.io/en/latest/encoders_timm.html)
-
-### 🔁 Models API
-
- - `model.encoder` - pretrained backbone to extract features of different spatial resolution
- - `model.decoder` - depends on models architecture (`Unet`/`Linknet`/`PSPNet`/`FPN`)
- - `model.segmentation_head` - last block to produce required number of mask channels (include also optional upsampling and activation)
- - `model.classification_head` - optional block which create classification head on top of encoder
- - `model.forward(x)` - sequentially pass `x` through model\`s encoder, decoder and segmentation head (and classification head if specified)
-
-##### Input channels
-Input channels parameter allows you to create models, which process tensors with arbitrary number of channels.
-If you use pretrained weights from imagenet - weights of first convolution will be reused. For
-1-channel case it would be a sum of weights of first convolution layer, otherwise channels would be
-populated with weights like `new_weight[:, i] = pretrained_weight[:, i % 3]` and than scaled with `new_weight * 3 / new_in_channels`.

```python
model = smp.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))
```

-##### Auxiliary classification output
+### Auxiliary classification output
+
All models support the `aux_params` parameter, which is set to `None` by default. If `aux_params = None`, the
classification auxiliary output is not created; otherwise, the model produces not only `mask` but also a `label` output with shape `NC`.

@@ -449,50 +151,54 @@
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)
mask, label = model(x)
```

-##### Depth
+### Depth
+
The depth parameter specifies the number of downsampling operations in the encoder, so you can make
your model lighter by specifying a smaller `depth`.

```python
model = smp.Unet('resnet34', encoder_depth=4)
```

-
-### 🛠 Installation
+## 🛠 Installation

PyPI version:
+
```bash
$ pip install segmentation-models-pytorch
````

-Latest version from source:
+
+The latest version from GitHub:
+
```bash
$ pip install git+https://github.com/qubvel/segmentation_models.pytorch
````

-### 🏆 Competitions won with the library
+## 🏆 Competitions won with the library

-`Segmentation Models` package is widely used in the image segmentation competitions.
+The `Segmentation Models` package is widely used in image segmentation competitions.
[Here](https://github.com/qubvel/segmentation_models.pytorch/blob/main/HALLOFFAME.md) you can find competitions, names of the winners and links to their solutions.

-### 🤝 Contributing
+## 🤝 Contributing

-#### Install SMP
+1. Install SMP in dev mode

```bash
-make install_dev # create .venv, install SMP in dev mode
+make install_dev # Create .venv, install SMP in dev mode
```

-#### Run tests and code checks
+2. Run tests and code checks

```bash
+make test   # Run the test suite with pytest
make fixup  # Ruff for formatting and lint checks
```

-#### Update table with encoders
+3. Update a table (in case you added an encoder)

```bash
-make table # generate a table with encoders and print to stdout
+make table # Generate a table with encoders and print it to stdout
```

-### 📝 Citing
+## 📝 Citing
```
@misc{Iakubovskii:2019,
  Author = {Pavel Iakubovskii},
  Title = {Segmentation Models Pytorch},
  Year = {2019},
  Publisher = {GitHub},
  Journal = {GitHub repository},
  Howpublished = {\url{https://github.com/qubvel/segmentation_models.pytorch}}
}
```

-### 🛡️ License
+## 🛡️ License
The project is primarily distributed under [MIT License](https://github.com/qubvel/segmentation_models.pytorch/blob/main/LICENSE), while some files are subject to other licenses. Please refer to [LICENSES](licenses/LICENSES.md) and the license statements in each file for a careful check, especially for commercial use.