When using segmentation_models_pytorch.DPT with the encoder tu-vit_base_patch16_224.augreg_in21k and default parameters, I get a RuntimeError caused by a tensor shape mismatch.
Code to reproduce
import segmentation_models_pytorch as smp
import torch

model = smp.DPT(
    encoder_name='tu-vit_base_patch16_224.augreg_in21k',
    encoder_depth=4,
    encoder_weights='imagenet',
    encoder_output_indices=None,
    decoder_readout='cat',
    decoder_intermediate_channels=(256, 512, 1024, 1024),
    decoder_fusion_channels=256,
    in_channels=3,
    classes=1,
    activation=None,
    aux_params=None,
)

x = torch.rand(8, 3, 224, 224)
y = model(x)  # RuntimeError occurs here
Error traceback
RuntimeError: The expanded size of the tensor (196) must match the existing size (8) at non-singleton dimension 1. Target sizes: [8, 196, 768]. Tensor sizes: [8, 768]
Environment
segmentation-models-pytorch: latest version (0.4.1.dev0)
timm: 1.0.15
pytorch: 2.4.0
python: 3.10.14
OS: Windows 10
I also tried setting encoder_weights=None and explicitly specifying encoder_output_indices=(3, 6, 9, 11), but the same error occurs. It appears the encoder returns features of shape [B, C] (e.g., [8, 768]) instead of the expected token shape [B, N, C], which causes the reshape operations in the DPT decoder to fail. Please let me know if I'm missing something about using ViT encoders with DPT.
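For reference, here is a minimal diagnostic sketch to inspect the shapes the underlying timm model returns for intermediate features. It assumes timm's forward_intermediates API on ViT models (available in timm 1.0.x); the exact signature and defaults may differ in other versions.

import timm
import torch

# Assumption: timm >= 1.0 exposes forward_intermediates on ViT models.
vit = timm.create_model('vit_base_patch16_224.augreg_in21k', pretrained=False)
x = torch.rand(8, 3, 224, 224)

# intermediates_only=True returns only the list of intermediate features.
# With the default output_fmt='NCHW' each entry should be [B, C, H, W]
# (e.g., [8, 768, 14, 14]); output_fmt='NLC' should give tokens as [B, N, C].
feats = vit.forward_intermediates(
    x, indices=(3, 6, 9, 11), intermediates_only=True
)
for i, f in enumerate(feats):
    print(i, tuple(f.shape))

If any of these shapes comes back as [8, 768], that would match the tensor size reported in the traceback above.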