-"""
-TimmUniversalEncoder provides a unified feature extraction interface built on the
-`timm` library, supporting both traditional-style (e.g., ResNet) and transformer-style
-models (e.g., Swin Transformer, ConvNeXt).
-
-This encoder produces consistent multi-level feature maps for semantic segmentation tasks.
-It allows configuring the number of feature extraction stages (`depth`) and adjusting
-`output_stride` when supported.
-
-Key Features:
-- Flexible model selection using `timm.create_model`.
-- Unified multi-level output across different model hierarchies.
-- Automatic alignment for inconsistent feature scales:
-  - Transformer-style models (start at 1/4 scale): Insert dummy features for 1/2 scale.
-  - VGG-style models (include scale-1 features): Align outputs for compatibility.
-- Easy access to feature scale information via the `reduction` property.
-
-Feature Scale Differences:
-- Traditional-style models (e.g., ResNet): Scales at 1/2, 1/4, 1/8, 1/16, 1/32.
-- Transformer-style models (e.g., Swin Transformer): Start at 1/4 scale, skip 1/2 scale.
-- VGG-style models: Include scale-1 features (input resolution).
-
-Notes:
-- `output_stride` is unsupported in some models, especially transformer-based architectures.
-- Special handling for models like TResNet and DLA to ensure correct feature indexing.
-- VGG-style models use `_is_vgg_style` to align scale-1 features with standard outputs.
-"""
-
 from typing import Any, Optional

 import timm
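
The scale alignment described in the removed docstring can be sketched in plain Python. This is a minimal illustration only, not the encoder's actual code: `align_reductions` is a hypothetical helper, and classifying a backbone by the reduction of its first feature map (1, 2, or 4) is an assumption drawn from the "Feature Scale Differences" list above.

```python
from typing import List, Tuple


def align_reductions(reductions: List[int], depth: int = 5) -> Tuple[List[int], str]:
    """Hypothetical helper (not part of timm or the encoder).

    Takes the per-stage reductions of a backbone and returns the aligned
    list, which always starts at reduction 1 (input resolution), plus a
    label for the detected style.
    """
    if not reductions:
        raise ValueError("reductions must not be empty")

    first = reductions[0]
    if first == 1:
        # VGG-style: a scale-1 feature already exists, so it fills the
        # input-resolution slot and one extra stage is kept.
        return list(reductions[: depth + 1]), "vgg"
    if first == 2:
        # Traditional-style: prepend the input-resolution slot.
        return [1] + list(reductions[:depth]), "traditional"
    if first == 4:
        # Transformer-style: no 1/2-scale map, so reserve a slot for a
        # dummy feature at reduction 2.
        return [1, 2] + list(reductions[: depth - 1]), "transformer"
    raise ValueError(f"unexpected first reduction: {first}")


if __name__ == "__main__":
    for stages in ([2, 4, 8, 16, 32], [4, 8, 16, 32], [1, 2, 4, 8, 16, 32]):
        print(align_reductions(stages))
```

Under these assumptions all three styles end up with the same six-entry layout, which is the consistency the docstring refers to as "unified multi-level output".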