Florence-CoSwin-H
|
Florence: A New Foundation Model for Computer Vis…
|
99.02
|
2021-11-22
|
|
Meta Pseudo Labels (EfficientNet-L2)
|
Meta Pseudo Labels
|
98.80
|
2020-03-23
|
|
BiT-L (ResNet)
|
Big Transfer (BiT): General Visual Representation…
|
98.46
|
2019-12-24
|
|
PNASNet-5
|
Progressive Neural Architecture Search
|
96.20
|
2017-12-02
|
|
GhostNetV3 1.6x
|
GhostNetV3: Exploring the Training Strategies for…
|
95.20
|
2024-04-17
|
|
ResNeXt-101 64x4
|
Aggregated Residual Transformations for Deep Neur…
|
94.70
|
2016-11-16
|
|
GhostNetV3 1.3x
|
GhostNetV3: Exploring the Training Strategies for…
|
94.50
|
2024-04-17
|
|
GhostNetV3 1.0x
|
GhostNetV3: Exploring the Training Strategies for…
|
93.30
|
2024-04-17
|
|
GhostNetV3 0.5x
|
GhostNetV3: Exploring the Training Strategies for…
|
88.50
|
2024-04-17
|
|
Unicom (ViT-L/14@336px) (Finetuned)
|
Unicom: Universal and Compact Representation Lear…
|
88.30
|
2023-04-12
|
|
Bamboo (Bamboo-H)
|
A Study on Transformer Configuration and Training…
|
87.10
|
2022-05-21
|
|
Bamboo (Bamboo-L)
|
A Study on Transformer Configuration and Training…
|
86.30
|
2022-05-21
|
|
TinySaver(ConvNeXtV2_h, 0.01 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
86.24
|
2024-03-26
|
|
Refiner-ViT-L
|
Refiner: Refining Self-attention for Vision Trans…
|
86.03
|
2021-06-07
|
|
TinySaver(ConvNeXtV2_h, 0.5 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
85.75
|
2024-03-26
|
|
TinySaver(Swin_large, 0.5 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
85.74
|
2024-03-26
|
|
TinySaver(Swin_large, 1.0 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
85.24
|
2024-03-26
|
|
Bamboo (Bamboo-B)
|
A Study on Transformer Configuration and Training…
|
84.20
|
2022-05-21
|
|
AIM-7B
|
Scalable Pre-training of Large Autoregressive Ima…
|
84.00
|
2024-01-16
|
|
DynamicViT-LV-M/0.8
|
DynamicViT: Efficient Vision Transformers with Dy…
|
83.90
|
2021-06-03
|
|
TinySaver(EfficientFormerV2_l, 0.01 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
83.52
|
2024-03-26
|
|
KAT-B*
|
Kolmogorov-Arnold Transformer
|
82.80
|
2024-09-16
|
|
ReViT-B
|
ReViT: Enhancing Vision Transformers Feature Dive…
|
82.40
|
2024-02-17
|
|
ConvNeXt-T-Hermite
|
Polynomial, trigonometric, and tropical activatio…
|
82.34
|
2025-02-03
|
|
ConvMixer-1536/20
|
Patches Are All You Need?
|
82.20
|
2022-01-24
|
|
DIFFQ (λ=1e−2)
|
Differentiable Model Compression via Pseudo Quant…
|
82.00
|
2021-04-20
|
|
DeiT-B
|
Kolmogorov-Arnold Transformer
|
81.80
|
2024-09-16
|
|
SimpleNetV1-9m-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
81.24
|
2016-08-22
|
|
ResNeXt-101 (Debiased+CutMix)
|
Shape-Texture Debiased Neural Network Training
|
81.20
|
2020-10-12
|
|
SimpleNetV1-5m-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
79.12
|
2016-08-22
|
|
ViT-B/16
|
Kolmogorov-Arnold Transformer
|
79.10
|
2024-09-16
|
|
ConvMLP-S
|
ConvMLP: Hierarchical Convolutional MLPs for Visi…
|
76.80
|
2021-09-09
|
|
SimpleNetV1-small-075-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
75.66
|
2016-08-22
|
|
FF
|
Do You Even Need Attention? A Stack of Feed-Forwa…
|
74.90
|
2021-05-06
|
|
SimpleNetV1-9m
|
Lets keep it simple, Using simple architectures t…
|
74.17
|
2016-08-22
|
|
SimpleNetV1-5m
|
Lets keep it simple, Using simple architectures t…
|
71.94
|
2016-08-22
|
|
GAC-SNN MS-ResNet-34
|
Gated Attention Coding for Training High-performa…
|
70.42
|
2023-08-12
|
|
SimpleNetV1-small-05-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
69.11
|
2016-08-22
|
|
SimpleNetV1-small-075
|
Lets keep it simple, Using simple architectures t…
|
68.15
|
2016-08-22
|
|
SimpleNetV1-small-05
|
Lets keep it simple, Using simple architectures t…
|
61.52
|
2016-08-22
|
|
EfficientNet-B2
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
RDNet-L (384 res)
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
DAT-T++
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
MAWS (ViT-2B)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
FasterViT-5
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
CoCa (finetuned)
|
CoCa: Contrastive Captioners are Image-Text Found…
|
|
2022-05-04
|
|
Model soups (BASIC-L)
|
Model soups: averaging weights of multiple fine-t…
|
|
2022-03-10
|
|
Model soups (ViT-G/14)
|
Model soups: averaging weights of multiple fine-t…
|
|
2022-03-10
|
|
DaViT-G
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
DaViT-H
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
SwinV2-G
|
Swin Transformer V2: Scaling Up Capacity and Reso…
|
|
2021-11-18
|
|
MAWS (ViT-6.5B)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
Meta Pseudo Labels (EfficientNet-B6-Wide)
|
Meta Pseudo Labels
|
|
2020-03-23
|
|
RevCol-H
|
Reversible Column Networks
|
|
2022-12-22
|
|
EVA
|
EVA: Exploring the Limits of Masked Visual Repres…
|
|
2022-11-14
|
|
M3I Pre-training (InternImage-H)
|
Towards All-in-one Pre-training via Maximizing Mu…
|
|
2022-11-17
|
|
ViT-L/16 (384res, distilled from ViT-22B)
|
Scaling Vision Transformers to 22 Billion Paramet…
|
|
2023-02-10
|
|
InternImage-H
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
MaxViT-XL (512res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
AIMv2-3B (448 res)
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
MAWS (ViT-H)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
MaxViT-L (512res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MaxViT-XL (384res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
DiscreteViT
|
Discrete Representations Strengthen Vision Transf…
|
|
2021-11-20
|
|
ViT-M@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
NFNet-F4+
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
MaxViT-L (384res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MOAT-4 22K+1K
|
MOAT: Alternating Mobile Convolution and Attentio…
|
|
2022-10-04
|
|
FD (CLIP ViT-L-336)
|
Contrastive Learning Rivals Masked Image Modeling…
|
|
2022-05-27
|
|
Last Layer Tuning with Newton Step (ViT-G/14))
|
Differentially Private Image Classification from …
|
|
2022-11-24
|
|
TokenLearner L/8 (24+11)
|
TokenLearner: What Can 8 Learned Tokens Do for Im…
|
|
2021-06-21
|
|
MaxViT-B (512res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MAWS (ViT-L)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
MogaNet-XL (384res)
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
MViTv2-H (512 res, ImageNet-21k pretrain)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
MaxViT-XL (512res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MaxViT-B (384res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
ALIGN (EfficientNet-L2)
|
Scaling Up Visual and Vision-Language Representat…
|
|
2021-02-11
|
|
EfficientNet-L2-475 (SAM)
|
Sharpness-Aware Minimization for Efficiently Impr…
|
|
2020-10-03
|
|
ViT-B/16
|
Scaling Vision Transformers to 22 Billion Paramet…
|
|
2023-02-10
|
|
VAN-B6 (22K, 384res)
|
Visual Attention Network
|
|
2022-02-20
|
|
ViC-MAE (ViT-L)
|
ViC-MAE: Self-Supervised Representation Learning …
|
|
2023-03-21
|
|
BEiT-L (ViT; ImageNet-22K pretrain)
|
BEiT: BERT Pre-Training of Image Transformers
|
|
2021-06-15
|
|
SWAG (ViT H/14)
|
Revisiting Weakly Supervised Pre-Training of Visu…
|
|
2022-01-20
|
|
ViT-H/14
|
An Image is Worth 16x16 Words: Transformers for I…
|
|
2020-10-22
|
|
CoAtNet-3 @384
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
MaxViT-XL (384res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
OpenCLIP ViT-H/14
|
Reproducible scaling laws for contrastive languag…
|
|
2022-12-14
|
|
AIMv2-3B
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
FixEfficientNet-L2
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
ViTAE-H + MAE (448)
|
ViTAEv2: Vision Transformer Advanced by Exploring…
|
|
2022-02-21
|
|
MaxViT-L (512res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MViTv2-L (384 res, ImageNet-21k pretrain)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
NoisyStudent (EfficientNet-L2)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
MaxViT-B (512res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
Top-k DiffSortNets (EfficientNet-L2)
|
Differentiable Top-k Classification Learning
|
|
2022-06-15
|
|
Adlik-ViT-SG+Swin_large+Convnext_xlarge(384)
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
V-MoE-H/14 (Every-2)
|
Scaling Vision with Sparse Mixture of Experts
|
|
2021-06-10
|
|
AIMv2-L
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
HVT Large
|
HVT: A Comprehensive Vision Framework for Learnin…
|
|
2024-09-25
|
|
FixEfficientNet-B3
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
MaxViT-L (384res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
PeCo (ViT-H, 448)
|
PeCo: Perceptual Codebook for BERT Pre-training o…
|
|
2021-11-24
|
|
DFN-5B H/14-378 + PrefixedIter Decoder
|
Unconstrained Open Vocabulary Image Classificatio…
|
|
2024-07-15
|
|
dBOT ViT-H (CLIP as Teacher)
|
Exploring Target Representations for Masked Autoe…
|
|
2022-09-08
|
|
MambaVision-L3
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
CAFormer-B36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
AIMv2-1B
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
VIT-H/14
|
Scaling Vision with Sparse Mixture of Experts
|
|
2021-06-10
|
|
ViT-H@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
UniRepLKNet-XL++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
InternImage-XL
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
MViTv2-H (mageNet-21k pretrain)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
Mixer-H/14 (JFT-300M pre-train)
|
MLP-Mixer: An all-MLP Architecture for Vision
|
|
2021-05-04
|
|
UniRepLKNet-L++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
dBOT ViT-L (CLIP as Teacher)
|
Exploring Target Representations for Masked Autoe…
|
|
2022-09-08
|
|
RepLKNet-XL
|
Scaling Up Your Kernels to 31x31: Revisiting Larg…
|
|
2022-03-13
|
|
ConvNeXt-XL (ImageNet-22k)
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
MAE (ViT-H, 448)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
ViT-L/16
|
An Image is Worth 16x16 Words: Transformers for I…
|
|
2020-10-22
|
|
HorNet-L (GF)
|
HorNet: Efficient High-Order Spatial Interactions…
|
|
2022-07-28
|
|
CvT-W24 (384 res, ImageNet-22k pretrain)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
InternImage-L
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
CoAtNet-3 (21k)
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
ConvFormer-B36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ASF-former-B
|
Adaptive Split-Fusion Transformer
|
|
2022-04-26
|
|
ASF-former-S
|
Adaptive Split-Fusion Transformer
|
|
2022-04-26
|
|
PeCo (ViT-H, 224)
|
PeCo: Perceptual Codebook for BERT Pre-training o…
|
|
2021-11-24
|
|
ViT-L@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
CAFormer-M36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CSWin-L (384 res,ImageNet-22k pretrain)
|
CSWin Transformer: A General Vision Transformer B…
|
|
2021-07-01
|
|
DaViT-L (ImageNet-22k)
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
DiNAT-Large (11x11ks; 384res; Pretrained on IN22K@224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
AIMv2-H
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
V-MoE-L/16 (Every-2)
|
Scaling Vision with Sparse Mixture of Experts
|
|
2021-06-10
|
|
DiNAT-Large (384x384; Pretrained on ImageNet-22K @ 224x224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
data2vec 2.0
|
Efficient Self-supervised Learning with Contextua…
|
|
2022-12-14
|
|
UniRepLKNet-B++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
HVT Huge
|
HVT: A Comprehensive Vision Framework for Learnin…
|
|
2024-09-25
|
|
CAFormer-B36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
UniNet-B6
|
UniNet: Unified Architecture Search with Convolut…
|
|
2022-07-12
|
|
DiNAT_s-Large (384res; Pretrained on IN22K@224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
Swin-L
|
Swin Transformer: Hierarchical Vision Transformer…
|
|
2021-03-25
|
|
EfficientNetV2-XL (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
VOLO-D5+HAT
|
Improving Vision Transformers by Revisiting High-…
|
|
2022-04-03
|
|
EfficientNetV2 (PolyLoss)
|
PolyLoss: A Polynomial Expansion Perspective of C…
|
|
2022-04-26
|
|
ELSA-VOLO-D5 (512*512)
|
ELSA: Enhanced Local Self-Attention for Vision Tr…
|
|
2021-12-23
|
|
Swin-L@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
CoAtNet-2 (21k)
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
FixEfficientNet-B7
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
FAN-L-Hybrid++
|
Understanding The Robustness in Vision Transforme…
|
|
2022-04-26
|
|
ColorMAE-Green-ViTB-1600
|
ColorMAE: Exploring data-independent masking stra…
|
|
2024-07-17
|
|
SwinV2-B
|
Swin Transformer V2: Scaling Up Capacity and Reso…
|
|
2021-11-18
|
|
VOLO-D5
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
PatchConvNet-L120-21k-384
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
16-TokenLearner B/16 (21)
|
TokenLearner: What Can 8 Learned Tokens Do for Im…
|
|
2021-06-21
|
|
MAE+DAT (ViT-H)
|
Enhance the Visual Representation via Discrete Ad…
|
|
2022-09-16
|
|
VAN-B5 (22K, 384res)
|
Visual Attention Network
|
|
2022-02-20
|
|
UniNet-B5
|
UniNet: Unified Architecture Search with Convolut…
|
|
2022-07-12
|
|
ConvFormer-B36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
MAE (ViT-H)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
Hiera-H
|
Hiera: A Hierarchical Vision Transformer without …
|
|
2023-06-01
|
|
CAFormer-S36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-M36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
NoisyStudent (EfficientNet-B7)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
DaViT-B (ImageNet-22k)
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
VAN-B6 (22K)
|
Visual Attention Network
|
|
2022-02-20
|
|
MAWS (ViT-B)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
EfficientNetV2-L (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
CAIT-M36-448
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
VOLO-D4
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
NFNet-F5 w/ SAM w/ augmult=16
|
Drawing Multiple Augmentation Samples Per Image D…
|
|
2021-05-27
|
|
µ2Net (ViT-L/16)
|
An Evolutionary Approach to Dynamic Introduction …
|
|
2022-05-25
|
|
ViT-B @384 (DeiT III, 21k)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MaxViT-B (512res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
FixEfficientNet-B6
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
MOAT-3 1K only
|
MOAT: Alternating Mobile Convolution and Attentio…
|
|
2022-10-04
|
|
Heinsen Routing + BEiT-large 16 224
|
An Algorithm for Routing Vectors in Sequences
|
|
2022-11-20
|
|
CLCNet (S:ViT+D:EffNet-B7) (retrain)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
CAFormer-M36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
VAN-B4 (22K, 384res)
|
Visual Attention Network
|
|
2022-02-20
|
|
data2vec (ViT-H)
|
data2vec: A General Framework for Self-supervised…
|
|
2022-02-07
|
|
DiNAT_s-Large (224x224; Pretrained on ImageNet-22K @ 224x224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
MKD ViT-L
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
TinyViT-21M-512-distill (512 res, 21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
PatchConvNet-B60-21k-384
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
CaiT-M-48-448
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
NFNet-F6 w/ SAM
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
CLCNet (S:ViT+D:VOLO-D3) (retrain)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
CLCNet (S:ConvNeXt-L+D:EffNet-B7) (retrain)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
MViTv2-L (384 res)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
SReT-S (384 res, ImageNet-1K only)
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
MaxViT-L (384res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
UniRepLKNet-S++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
FixEfficientNet-B5
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
ConvFormer-S36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
NoisyStudent (EfficientNet-B6)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
Swin-B
|
Swin Transformer: Hierarchical Vision Transformer…
|
|
2021-03-25
|
|
CAFormer-B36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
LV-ViT-L
|
All Tokens Matter: Token Labeling for Training Be…
|
|
2021-04-22
|
|
FixResNeXt-101 32x48d
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
HCGNet-B
|
Gated Convolutional Networks with Hybrid Connecti…
|
|
2019-08-26
|
|
MaxViT-B (384res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
ViT-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
Our SP-ViT-L|384
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
VOLO-D3
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
BEiT-L (ViT; ImageNet 1k pretrain)
|
BEiT: BERT Pre-Training of Image Transformers
|
|
2021-06-15
|
|
VAN-B5 (22K)
|
Visual Attention Network
|
|
2022-02-20
|
|
UniFormer-L (384 res)
|
UniFormer: Unifying Convolution and Self-attentio…
|
|
2022-01-24
|
|
AdvProp (EfficientNet-B7)
|
Adversarial Examples Improve Image Recognition
|
|
2019-11-21
|
|
NFNet-F5 w/ SAM
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
Swin-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
TinyViT-21M-384-distill (384 res, 21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
EfficientNetV2-M (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
CAFormer-M36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
TransNeXt-Base (IN-1K supervised, 384)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
MIRL (ViT-B-48)
|
Masked Image Residual Learning for Scaling Deeper…
|
|
2023-09-25
|
|
MaxViT-S (512res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
CAIT-M-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
UniNet-B5
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
NoisyStudent (EfficientNet-B5)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
ConvFormer-M36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CAIT-M-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
GPaCo (ViT-L)
|
Generalized Parametric Contrastive Learning
|
|
2022-09-26
|
|
Omnivore (Swin-L)
|
Omnivore: A Single Model for Many Visual Modaliti…
|
|
2022-01-20
|
|
Our SP-ViT-M|384
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
TransNeXt-Small (IN-1K supervised, 384)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
VOLO-D2
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
EfficientViT-L2 (r384)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
XCiT-L24
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
SparK (ConvNeXt-Large, 384)
|
Designing BERT for Convolutional Networks: Sparse…
|
|
2023-01-09
|
|
NFNet-F5
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
MAE (ViT-L)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
FixEfficientNet-B4
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
DAT-B++ (384x384)
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
NFNet-F4
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
ConvNeXt-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
PiT-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
GTP-ViT-B-Patch8/P20
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
CAFormer-S36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
XCiT-M24
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
Fix-EfficientNet-B8 (MaxUp + CutMix)
|
MaxUp: A Simple Way to Improve Generalization of …
|
|
2020-02-20
|
|
KDforAA (EfficientNet-B8)
|
Circumventing Outliers of AutoAugment with Knowle…
|
|
2020-03-25
|
|
ViT-L
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
FasterViT-6
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
SEER (RG-10B)
|
Vision Models Are More Robust And Fair When Pretr…
|
|
2022-02-16
|
|
MaxViT-T (384res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
VAN-B4 (22K)
|
Visual Attention Network
|
|
2022-02-20
|
|
EfficientNetV2-L
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
FixEfficientNet-B8
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
UniFormer-L
|
UniFormer: Unifying Convolution and Self-attentio…
|
|
2022-01-24
|
|
SCARLET-C
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
ViT-B @224 (DeiT III, 21k)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
dBOT ViT-B (CLIP as Teacher)
|
Exploring Target Representations for Masked Autoe…
|
|
2022-09-08
|
|
CAFormer-S36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-B36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
NFNet-F3
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
ViT-H @224 (DeiT-III + AugSub)
|
Masking meets Supervision: A Strong Learning Alli…
|
|
2023-06-20
|
|
XCiT-S24
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
ConvFormer-M36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientViT-L2 (r288)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
ViT-L@384 (attn finetune)
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
Our SP-ViT-L
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
Mini-Swin-B@384
|
MiniViT: Compressing Vision Transformers with Wei…
|
|
2022-04-14
|
|
Wave-ViT-L
|
Wave-ViT: Unifying Wavelet and Transformers for V…
|
|
2022-07-11
|
|
KDforAA (EfficientNet-B7)
|
Circumventing Outliers of AutoAugment with Knowle…
|
|
2020-03-25
|
|
HaloNet4 (base 128, Conv-12)
|
Scaling Local Self-Attention for Parameter Effici…
|
|
2021-03-23
|
|
AdvProp (EfficientNet-B8)
|
Adversarial Examples Improve Image Recognition
|
|
2019-11-21
|
|
CAFormer-B36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvNeXt-L (384 res)
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
EfficientNet-B8 (RandAugment)
|
RandAugment: Practical automated data augmentatio…
|
|
2019-09-30
|
|
BiFormer-B* (IN1k ptretrain)
|
BiFormer: Vision Transformer with Bi-Level Routin…
|
|
2023-03-15
|
|
GTP-EVA-L/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
PatchConvNet-S60-21k-512
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
AlphaNet-A0
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
CAFormer-S18 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-S36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-S36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CAIT-S-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
FasterViT-4
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
ResNeXt-101 32x48d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
BiT-M (ResNet)
|
Big Transfer (BiT): General Visual Representation…
|
|
2019-12-24
|
|
ViT-L/16 Dosovitskiy et al. (2021)
|
MLP-Mixer: An all-MLP Architecture for Vision
|
|
2021-05-04
|
|
Omnivore (Swin-B)
|
Omnivore: A Single Model for Many Visual Modaliti…
|
|
2022-01-20
|
|
NFNet-F2
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
ScaleNet-50
|
Data-Driven Neuron Allocation for Scale Aggregati…
|
|
2019-04-20
|
|
NoisyStudent (EfficientNet-B4)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
CAIT-S-48
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ViT-L @224 (DeiT-III + AugSub)
|
Masking meets Supervision: A Strong Learning Alli…
|
|
2023-06-20
|
|
CLCNet (S:D1+D:D5)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
ViT-H @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
HyenaPixel-Bidirectional-Former-B36
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
VOLO-D1
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
CAFormer-M36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ResNeXt-101 32x32d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
DeiT-B 384
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
MaxViT-L (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
EfficientNetV2-M
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
MKD ViT-B
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
SP-ViT-S|384
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
XCiT-S12
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
CAIT-S-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ResNet200_vd_26w_4s_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
MixMIM-B
|
MixMAE: Mixed and Masked Autoencoder for Efficien…
|
|
2022-05-26
|
|
CAFormer-S18 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-S18 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientNet-B7 (RandAugment)
|
RandAugment: Practical automated data augmentatio…
|
|
2019-09-30
|
|
ViT-B @384 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MambaVision-L
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
MaxViT-B (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
CaiT-S24
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
CAIT-XS-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ViT-L @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
Our SP-ViT-M
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
FastViT-MA36
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
HyenaPixel-Former-B36
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
EfficientNetV2-S (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
CvT-21 (384 res, ImageNet-22k pretrain)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
DAT-B++ (224x224)
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
InternImage-B
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
FasterViT-3
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
TinyViT-21M-distill (21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
Wave-ViT-B
|
Wave-ViT: Unifying Wavelet and Transformers for V…
|
|
2022-07-11
|
|
SReT-B (384 res, ImageNet-1K only)
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
MViT-B-24
|
Multiscale Vision Transformers
|
|
2021-04-22
|
|
ActiveMLP-L
|
Active Token Mixer
|
|
2022-03-11
|
|
DAT-B (384 res, IN-1K only)
|
Vision Transformer with Deformable Attention
|
|
2022-01-03
|
|
MIRL(ViT-S-54)
|
Masked Image Residual Learning for Scaling Deeper…
|
|
2023-09-25
|
|
ConvFormer-B36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
RDNet-L
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
ResNeXt-101 32x16d (semi-weakly sup.)
|
Billion-scale semi-supervised learning for image …
|
|
2019-05-02
|
|
ELSA-VOLO-D1
|
ELSA: Enhanced Local Self-Attention for Vision Tr…
|
|
2021-12-23
|
|
TransNeXt-Small (IN-1K supervised, 224)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
Next-ViT-L @384
|
Next-ViT: Next Generation Vision Transformer for …
|
|
2022-07-12
|
|
VVT-L (384 res)
|
Vicinity Vision Transformer
|
|
2022-06-21
|
|
BoTNet T7
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
FixEfficientNetB4
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
MogaNet-L
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
LITv2-B|384
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
NFNet-F1
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
DAT-S++
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
Sequencer2D-L↑392
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
SE-CoTNetD-152
|
Contextual Transformer Networks for Visual Recogn…
|
|
2021-07-26
|
|
AMD(ViT-B/16)
|
Asymmetric Masked Distillation for Pre-Training S…
|
|
2023-11-06
|
|
DaViT-B
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
FastViT-SA36
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
ReXNet-R_3.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
CAFormer-S36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientViT-L1 (r224)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
ConvFormer-M36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
GC ViT-B
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
ResNeSt-269
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
CoAtNet-3
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
MaxViT-S (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
GPIPE
|
GPipe: Efficient Training of Giant Neural Network…
|
|
2018-11-16
|
|
DeBiFormer-B
|
DeBiFormer: Vision Transformer with Deformable Ag…
|
|
2024-10-11
|
|
ConvFormer-S18 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientNet-B7
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
RDNet-B
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
DiNAT-Base
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
ResNet-RS-50 (160 image res)
|
Revisiting ResNets: Improved Training and Scaling…
|
|
2021-03-13
|
|
ColorNet (RHYLH with Conv Layer)
|
ColorNet: Investigating the importance of color s…
|
|
2019-02-01
|
|
ViT-B@384 (attn finetune)
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
BiFormer-S* (IN1k ptretrain)
|
BiFormer: Vision Transformer with Bi-Level Routin…
|
|
2023-03-15
|
|
SReT-S (512 res, ImageNet-1K only)
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
LambdaResNet200
|
LambdaNetworks: Modeling Long-Range Interactions …
|
|
2021-02-17
|
|
Fix_ResNet50_vd_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
MogaNet-B
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
TResNet-XL
|
TResNet: High Performance GPU-Dedicated Architect…
|
|
2020-03-30
|
|
ResNeXt-101 32x8d (semi-weakly sup.)
|
Billion-scale semi-supervised learning for image …
|
|
2019-05-02
|
|
NAT-Base
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
Assemble-ResNet152
|
Compounding the Performance Improvements of Assem…
|
|
2020-01-17
|
|
BoTNet T7-320
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ViP-B|384
|
Visual Parser: Representing Part-whole Hierarchie…
|
|
2021-07-13
|
|
RegnetY16GF@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
EfficientViT-B3 (r288)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
InternImage-S
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
UniNet-B4
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
FasterViT-2
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
TransNeXt-Tiny (IN-1K supervised, 224)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
DeiT-B
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
ViT-B @224 (DeiT-III + AugSub)
|
Masking meets Supervision: A Strong Learning Alli…
|
|
2023-06-20
|
|
MambaVision-B
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
RevBiFPN-S6
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ResNeXt-101 32×16d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
FBNetV5-F-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
ViT-B-36x1
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
ViT-B-18x2
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
XCiT-M (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
DGMMC-S
|
Performance of Gaussian Mixture Model Classifiers…
|
|
2024-10-17
|
|
NoisyStudent (EfficientNet-B3)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
CAS-ViT-T
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
CAFormer-S18 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CAIT-XS-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ConvFormer-S36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
LV-ViT-M
|
All Tokens Matter: Token Labeling for Training Be…
|
|
2021-04-22
|
|
VVT-L (224 res)
|
Vicinity Vision Transformer
|
|
2022-06-21
|
|
CoAtNet-2
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
Conformer-B
|
Conformer: Local Features Coupling Global Represe…
|
|
2021-05-09
|
|
PatchConvNet-B120
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
GPaCo (Vit-B)
|
Generalized Parametric Contrastive Learning
|
|
2022-09-26
|
|
LambdaResNet152
|
LambdaNetworks: Modeling Long-Range Interactions …
|
|
2021-02-17
|
|
EfficientNet-B6
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
GC ViT-S
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
BoTNet T6
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
PiT-B
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
DeepMAD-89M
|
DeepMAD: Mathematical Architecture Design for Dee…
|
|
2023-03-05
|
|
EfficientNetV2-S
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
Our SP-ViT-S
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
UniRepLKNet-S
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
DeBiFormer-S
|
DeBiFormer: Vision Transformer with Deformable Ag…
|
|
2024-10-11
|
|
Wave-ViT-S
|
Wave-ViT: Unifying Wavelet and Transformers for V…
|
|
2022-07-11
|
|
TNT-B
|
Transformer in Transformer
|
|
2021-02-27
|
|
ResNeSt-200
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
AmoebaNet-A
|
Regularized Evolution for Image Classifier Archit…
|
|
2018-02-05
|
|
CLCNet (S:B4+D:B7)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
ResNet-RS-270 (256 image res)
|
Revisiting ResNets: Improved Training and Scaling…
|
|
2021-03-13
|
|
SENet-350
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ViT-B @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
DiNAT-Small
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
Transformer local-attention (NesT-B)
|
Nested Hierarchical Transformer: Towards Accurate…
|
|
2021-05-26
|
|
PVTv2-B4
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
CA-Swin-S (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
GTP-ViT-L/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
ConvFormer-S18 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
RDNet-S
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
DAT-S
|
Vision Transformer with Deformable Attention
|
|
2022-01-03
|
|
NAT-Small
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
QnA-ViT-Base
|
Learned Queries for Efficient Local Attention
|
|
2021-12-21
|
|
RevBiFPN-S5
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
Pyramid ViG-B
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
Container Container
|
Container: Context Aggregation Network
|
|
2021-06-02
|
|
UniNet-B2
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
Twins-SVT-L
|
Twins: Revisiting the Design of Spatial Attention…
|
|
2021-04-28
|
|
TransBoost-ViT-S
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
XCiT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
MaxViT-T (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
Wave-ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
LITv2-B
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
MultiGrain PNASNet (500px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
MAE (ViT-L)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
PAT-B
|
Pattern Attention Transformer with Doughnut Kernel
|
|
2022-11-30
|
|
HyenaPixel-Attention-Former-S18
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
FixEfficientNet-B2
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
CAFormer-S18 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
IPT-B
|
IncepFormer: Efficient Inception Transformer with…
|
|
2022-12-06
|
|
ViTAE-B-Stage
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
ResT-Large
|
ResT: An Efficient Transformer for Visual Recogni…
|
|
2021-05-28
|
|
AutoFormer-base
|
AutoFormer: Searching Transformers for Visual Rec…
|
|
2021-07-01
|
|
NFNet-F0
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
SE-ResNeXt-101, 64x4d, S=2(320px)
|
Towards Better Accuracy-efficiency Trade-offs: Di…
|
|
2020-11-30
|
|
ResMLP-B24/8
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
EfficientViT-B3 (r224)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
BoTNet T5
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
HyenaPixel-Bidirectional-Former-S18
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
PatchConvNet-B60
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
SigLIP B/16 + PrefixedIter Decoder
|
Unconstrained Open Vocabulary Image Classificatio…
|
|
2024-07-15
|
|
ViT-B (hMLP + BeiT)
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
MultiGrain R50-AA-500
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
MNv4-Hybrid-L
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
UniFormer-S
|
UniFormer: Unifying Convolution and Self-attentio…
|
|
2022-01-24
|
|
ViT-S @384 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MogaNet-S
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
GC ViT-T
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
ResNeXt-101 32x4d (semi-weakly sup.)
|
Billion-scale semi-supervised learning for image …
|
|
2019-05-02
|
|
Sequencer2D-L
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
sMLPNet-B (ImageNet-1k)
|
Sparse MLP for Image Recognition: Is Self-Attenti…
|
|
2021-09-12
|
|
SE-ResNeXt-101, 64x4d, S=2(416px)
|
Towards Better Accuracy-efficiency Trade-offs: Di…
|
|
2020-11-30
|
|
ResNet-50 (Adversarial Autoaugment)
|
Adversarial AutoAugment
|
|
2019-12-24
|
|
TinyViT-5M
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
CvT-21 (384 res)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
T2T-ViT-14|384
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
CeiT-S (384 finetune res)
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
LV-ViT-S
|
All Tokens Matter: Token Labeling for Training Be…
|
|
2021-04-22
|
|
MOAT-0 1K only
|
MOAT: Alternating Mobile Convolution and Attentio…
|
|
2022-10-04
|
|
EfficientNet-B5
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
Transformer local-attention (NesT-S)
|
Nested Hierarchical Transformer: Towards Accurate…
|
|
2021-05-26
|
|
ViL-Medium-D
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
SE-CoTNetD-101
|
Contextual Transformer Networks for Visual Recogn…
|
|
2021-07-26
|
|
Next-ViT-B
|
Next-ViT: Next Generation Vision Transformer for …
|
|
2022-07-12
|
|
CoAtNet-1
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
LITv2-M
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
MambaVision-S
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
Shift-B
|
When Shift Operation Meets Vision Transformer: An…
|
|
2022-01-26
|
|
MultiGrain PNASNet (450px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Meta Pseudo Labels (ResNet-50)
|
Meta Pseudo Labels
|
|
2020-03-23
|
|
UniRepLKNet-T
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
HyenaPixel-Former-S18
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
TinyViT-11M-distill (21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
ReXNet-R_2.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
QnA-ViT-Small
|
Learned Queries for Efficient Local Attention
|
|
2021-12-21
|
|
NAT-Tiny
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
PVTv2-B3
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
PatchConvNet-S120
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
FasterViT-1
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
ViL-Base-D
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
CycleMLP-B5
|
CycleMLP: A MLP-like Architecture for Dense Predi…
|
|
2021-07-21
|
|
MultiGrain SENet154 (450px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
DeepVit-L* (DeiT training recipe)
|
DeepViT: Towards Deeper Vision Transformer
|
|
2021-03-22
|
|
ViT-S @224 (DeiT III, 21k)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MKD ViT-S
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
ViT-S@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
PAT-S
|
Pattern Attention Transformer with Doughnut Kernel
|
|
2022-11-30
|
|
TinyViT-21M
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
sMLPNet-S (ImageNet-1k)
|
Sparse MLP for Image Recognition: Is Self-Attenti…
|
|
2021-09-12
|
|
RevBiFPN-S4
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ZenNAS (0.8ms)
|
Zen-NAS: A Zero-Shot NAS for High-Performance Dee…
|
|
2021-02-01
|
|
Pyramid ViG-M
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
SwinV2-Ti
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
gSwin-S
|
gSwin: Gated MLP Vision Model with Hierarchical S…
|
|
2022-08-24
|
|
MultiGrain SENet154 (400px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Swin-S + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
CAS-ViT-M
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
CvT-13 (384 res)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
ResNet50_vd_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
ConvFormer-S18 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
MViT-B-16
|
Multiscale Vision Transformers
|
|
2021-04-22
|
|
ResNeSt-101
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
DeiT-B (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
MNv4-Conv-L
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
IPT-S
|
IncepFormer: Efficient Inception Transformer with…
|
|
2022-12-06
|
|
ViL-Medium-W
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
GFNet-H-B
|
Global Filter Networks for Image Classification
|
|
2021-07-01
|
|
Oct-ResNet-152 (SE)
|
Drop an Octave: Reducing Spatial Redundancy in Co…
|
|
2019-04-10
|
|
Harm-SE-RNX-101 64x4d (320x320, Mean-Max Pooling)
|
Harmonic Convolutional Networks based on Discrete…
|
|
2020-01-18
|
|
GTP-LV-ViT-M/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
FunMatch - T384+224 (ResNet-50)
|
Knowledge distillation: A good teacher is patient…
|
|
2021-06-09
|
|
CA-Swin-T (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
CaiT-S + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
RDNet-T
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
MultiGrain SENet154 (500px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
VAN-B2
|
Visual Attention Network
|
|
2022-02-20
|
|
DaViT-T
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
ReXNet_3.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
Sequencer2D-M
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
CrossViT-18+
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
Shift-S
|
When Shift Operation Meets Vision Transformer: An…
|
|
2022-01-26
|
|
HRFormer-B
|
HRFormer: High-Resolution Transformer for Dense P…
|
|
2021-10-18
|
|
BoTNet T4
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
PVT-M (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
EfficientViT-B2 (r256)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
DiNAT-Tiny
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
ELSA-Swin-T
|
ELSA: Enhanced Local Self-Attention for Vision Tr…
|
|
2021-12-23
|
|
MambaVision-T2
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
NASNET-A(6)
|
Learning Transferable Architectures for Scalable …
|
|
2017-07-21
|
|
RVT-B*
|
Towards Robust Vision Transformer
|
|
2021-05-17
|
|
CMA(ViT-B/16)
|
Enhanced OoD Detection through Cross-Modal Alignm…
|
|
2025-03-24
|
|
FBNetV5-C-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
MultiGrain PNASNet (400px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
ViT-S-24x2
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
FastViT-SA24
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
FixEfficientNet-B1
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
EfficientNet-B4
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
MViTv2-T
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
DeiT-B
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
T2T-ViTt-24
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
CvT-21
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
TransNeXt-Micro (IN-1K supervised, 224)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
FixResNet-50 Billion-scale@224
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
Next-ViT-S
|
Next-ViT: Next Generation Vision Transformer for …
|
|
2022-07-12
|
|
SCARLET-A4
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
Sequencer2D-S
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
LeViT-384
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
CrossViT-18
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
MetaFormer PoolFormer-M48
|
MetaFormer Is Actually What You Need for Vision
|
|
2021-11-22
|
|
ConViT-B+
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
TransBoost-ConvNext-T
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
M2D-T
|
Mamba2D: A Natively Multi-Dimensional State-Space…
|
|
2024-12-20
|
|
NoisyStudent (EfficientNet-B2)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
ResNet-152 (A2 + reg)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
ConViT-B
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
DeiT-B with iRPE-K
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
Mega
|
Mega: Moving Average Equipped Gated Attention
|
|
2022-09-21
|
|
ViT-B/16-224+HTM
|
TokenMixup: Efficient Attention-guided Token-leve…
|
|
2022-10-14
|
|
ColorNet
|
ColorNet: Investigating the importance of color s…
|
|
2019-02-01
|
|
T2T-ViT-24
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
ViT-S-48x1
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
Visformer-S
|
Visformer: The Vision-friendly Transformer
|
|
2021-04-26
|
|
MobileViTv3-S
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
CrossViT-15+
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
MambaVision-T
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
GLiT-Bases
|
GLiT: Neural Architecture Search for Global and L…
|
|
2021-07-07
|
|
EViT (delete)
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
STViT-Swin-Ti
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
BossNet-T1
|
BossNAS: Exploring Hybrid CNN-transformers with B…
|
|
2021-03-23
|
|
CAIT-XXS-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
CvT-13-NAS
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
ViTAE-S-Stage
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
T2T-ViTt-19
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
Evo-LeViT-384*
|
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vi…
|
|
2021-08-03
|
|
ConViT-S+
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
DeepVit-L
|
DeepViT: Towards Deeper Vision Transformer
|
|
2021-03-22
|
|
SENet-152
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ResNeXt-101 32x8d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
TransBoost-Swin-T
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ResNeXt-101, 64x4d, S=2(224px)
|
Towards Better Accuracy-efficiency Trade-offs: Di…
|
|
2020-11-30
|
|
ToMe-ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
AMD(ViT-S/16)
|
Asymmetric Masked Distillation for Pre-Training S…
|
|
2023-11-06
|
|
PatchConvNet-S60
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
AlphaNet-A6
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
Pyramid ViG-S
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
ConvNeXt-T
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
FasterViT-0
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
CeiT-S
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
NEXcepTion-S
|
From Xception to NEXcepTion: New Design Decisions…
|
|
2022-12-16
|
|
GC ViT-XT
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
Container-Light
|
Container: Context Aggregation Network
|
|
2021-06-02
|
|
ViL-Small
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
PVTv2-B2
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
ActiveMLP-T
|
Active Token Mixer
|
|
2022-03-11
|
|
LITv2-S
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
DAT-T
|
Vision Transformer with Deformable Attention
|
|
2022-01-03
|
|
EViT (fuse)
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
GTP-LV-ViT-S/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
Diffusion Classifier
|
Your Diffusion Model is Secretly a Zero-Shot Clas…
|
|
2023-03-28
|
|
T2T-ViT-19
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
ResNet-101 (224 res, Fast Knowledge Distillation)
|
A Fast Knowledge Distillation Framework for Visua…
|
|
2021-12-02
|
|
Discrete Adversarial Distillation (ViT-B, 224)
|
Distilling Out-of-Distribution Robustness from Vi…
|
|
2023-11-02
|
|
DeBiFormer-T
|
DeBiFormer: Vision Transformer with Deformable Ag…
|
|
2024-10-11
|
|
RVT-S*
|
Towards Robust Vision Transformer
|
|
2021-05-17
|
|
PiT-S
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
sMLPNet-T (ImageNet-1k)
|
Sparse MLP for Image Recognition: Is Self-Attenti…
|
|
2021-09-12
|
|
ViL-Base-W
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
Swin-T+SSA
|
The Information Pathways Hypothesis: Transformers…
|
|
2023-06-02
|
|
AOGNet-40M-AN
|
Attentive Normalization
|
|
2019-08-04
|
|
FBNetV5
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
ResNet-200
|
Parametric Contrastive Learning
|
|
2021-07-26
|
|
RepMLPNet-L256
|
RepMLPNet: Hierarchical Vision MLP with Re-parame…
|
|
2021-12-21
|
|
NEXcepTion-TP
|
From Xception to NEXcepTion: New Design Decisions…
|
|
2022-12-16
|
|
NAT-Mini
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
DiNAT-Mini
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
ResNet-152 (A2)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
MEAL V2 (ResNet-50) (380 res)
|
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1…
|
|
2020-09-17
|
|
gSwin-T
|
gSwin: Gated MLP Vision Model with Hierarchical S…
|
|
2022-08-24
|
|
FBNetV5-A-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
T2T-ViT-14
|
Beyond Self-attention: External Attention using T…
|
|
2021-05-05
|
|
QnA-ViT-Tiny
|
Learned Queries for Efficient Local Attention
|
|
2021-12-21
|
|
AutoFormer-small
|
AutoFormer: Searching Transformers for Visual Rec…
|
|
2021-07-01
|
|
Shift-T
|
When Shift Operation Meets Vision Transformer: An…
|
|
2022-01-26
|
|
BoTNet T3
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
FastViT-T12
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
CvT-13
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
ResNet-152 (SAM)
|
Sharpness-Aware Minimization for Efficiently Impr…
|
|
2020-10-03
|
|
UniRepLKNet-N
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
CloFormer-S
|
Rethinking Local Perception in Lightweight Vision…
|
|
2023-03-31
|
|
LeViT-256
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
ReXNet_2.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
SE-CoTNetD-50
|
Contextual Transformer Networks for Visual Recogn…
|
|
2021-07-26
|
|
CoAtNet-0
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
gMLP-B
|
Pay Attention to MLPs
|
|
2021-05-17
|
|
CoE-Large + CondConv
|
Collaboration of Experts: Achieving 80% Top-1 Acc…
|
|
2021-07-08
|
|
GTP-DeiT-B/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
NEXcepTion-T
|
From Xception to NEXcepTion: New Design Decisions…
|
|
2022-12-16
|
|
DeiT-S-24 + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
NoisyStudent (EfficientNet-B1)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
TinyViT-11M
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
Transformer local-attention (NesT-T)
|
Nested Hierarchical Transformer: Towards Accurate…
|
|
2021-05-26
|
|
ResNet-200 (Supervised Contrastive)
|
Supervised Contrastive Learning
|
|
2020-04-23
|
|
T2T-ViT-14
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
CrossViT-15
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
PyConvResNet-101
|
Pyramidal Convolution: Rethinking Convolutional N…
|
|
2020-06-20
|
|
ViT-B/16 (RPE w/ GAB)
|
Understanding Gaussian Attention Bias of Vision T…
|
|
2023-05-08
|
|
MobileOne-S4 (distill)
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
DeiT-S with iRPE-QKV
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
ViT-S @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
BiFormer-T (IN1k pretrain)
|
BiFormer: Vision Transformer with Bi-Level Routin…
|
|
2023-03-15
|
|
UniNet-B0
|
UniNet: Unified Architecture Search with Convolut…
|
|
2022-07-12
|
|
LocalViT-S
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
SENet-101
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
GFNet-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
ResNet-200 (Adversarial Autoaugment)
|
Adversarial AutoAugment
|
|
2019-12-24
|
|
MultiGrain PNASNet (300px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
ResNet-152
|
Parametric Contrastive Learning
|
|
2021-07-26
|
|
ConViT-S
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
Swin-T
|
Swin Transformer: Hierarchical Vision Transformer…
|
|
2021-03-25
|
|
Res2Net-101
|
Res2Net: A New Multi-scale Backbone Architecture
|
|
2019-04-02
|
|
PVT-S (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
TransBoost-ResNet50-StrikesBack
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ResNeSt-50
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
DeiT-S with iRPE-QK
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
DeiT-S-12 + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
CAS-ViT-S
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
EfficientNet-B3
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
VAN-B1
|
Visual Attention Network
|
|
2022-02-20
|
|
RevBiFPN-S3
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ResNet-152x2-SAM
|
When Vision Transformers Outperform ResNets witho…
|
|
2021-06-03
|
|
DynamicViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
ResNet-101 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
ViTAE-13M
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
ResNet-101 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
ResNet-101
|
Parametric Contrastive Learning
|
|
2021-07-26
|
|
CaiT-XXS-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
DeiT-S with iRPE-K
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
CentroidViT-S (arXiv, 2021-02)
|
Centroid Transformers: Learning to Abstract with …
|
|
2021-02-17
|
|
ResMLP-S24
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
MNv4-Hybrid-M
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
TinyViT-5M-distill (21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
CoE-Large
|
Collaboration of Experts: Achieving 80% Top-1 Acc…
|
|
2021-07-08
|
|
MEAL V2 (ResNet-50) (224 res)
|
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1…
|
|
2020-09-17
|
|
TokenLearner-ViT-8
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
ResNeSt-50-fast
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
TransBoost-ResNet152
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ResNet-200 (Fast AA)
|
Fast AutoAugment
|
|
2019-05-01
|
|
CaiT-XXS (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
FastViT-SA12
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
AlphaNet-A5
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
ResNet-50+AutoDropout+RandAugment
|
AutoDropout: Learning Dropout Patterns to Regular…
|
|
2021-01-05
|
|
FairNAS-B
|
FairNAS: Rethinking Evaluation Fairness of Weight…
|
|
2019-07-03
|
|
ResNeXt-101 (CutMix)
|
CutMix: Regularization Strategy to Train Strong C…
|
|
2019-05-13
|
|
Attention-92
|
Residual Attention Network for Image Classificati…
|
|
2017-04-23
|
|
NAT-M4
|
Neural Architecture Transfer
|
|
2020-05-12
|
|
IPT-T
|
IncepFormer: Efficient Inception Transformer with…
|
|
2022-12-06
|
|
GLiT-Smalls
|
GLiT: Neural Architecture Search for Global and L…
|
|
2021-07-07
|
|
HCGNet-C
|
Gated Convolutional Networks with Hybrid Connecti…
|
|
2019-08-26
|
|
DVT (T2T-ViT-12)
|
Not All Images are Worth 16x16 Words: Dynamic Tra…
|
|
2021-05-31
|
|
UniNet-B1
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
DeiT-S (T2)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
ResNet50 (A1)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
gSwin-VT
|
gSwin: Gated MLP Vision Model with Hierarchical S…
|
|
2022-08-24
|
|
ResNet-34 (X-volution, stage3)
|
X-volution: On the unification of convolution and…
|
|
2021-06-04
|
|
ReXNet_1.5
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
iAFF-ResNeXt-50-32x4d
|
Attentional Feature Fusion
|
|
2020-09-29
|
|
UniRepLKNet-P
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
FixEfficientNet-B0
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
ConvMLP-L
|
ConvMLP: Hierarchical Convolutional MLPs for Visi…
|
|
2021-09-09
|
|
ResNet-50 (224 res, Fast Knowledge Distillation)
|
A Fast Knowledge Distillation Framework for Visua…
|
|
2021-12-02
|
|
HVT Base
|
HVT: A Comprehensive Vision Framework for Learnin…
|
|
2024-09-25
|
|
Inception ResNet V2
|
Inception-v4, Inception-ResNet and the Impact of …
|
|
2016-02-23
|
|
RandWire-WS
|
Exploring Randomly Wired Neural Networks for Imag…
|
|
2019-04-02
|
|
WideNet-H
|
Go Wider Instead of Deeper
|
|
2021-07-25
|
|
CoE-Small + CondConv + PWLU
|
Collaboration of Experts: Achieving 80% Top-1 Acc…
|
|
2021-07-08
|
|
BasisNet-MV3
|
BasisNet: Two-stage Model Synthesis for Efficient…
|
|
2021-05-07
|
|
AlphaNet-A4
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
MogaNet-T (256res)
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
LeViT-192
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
CloFormer-XS
|
Rethinking Local Perception in Lightweight Vision…
|
|
2023-03-31
|
|
ResNet-101
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ResNet-200
|
Identity Mappings in Deep Residual Networks
|
|
2016-03-16
|
|
MNv4-Conv-M
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
RegNetY-8.0GF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
ViT-B/16-SAM
|
When Vision Transformers Outperform ResNets witho…
|
|
2021-06-03
|
|
TransBoost-ResNet101
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
SKNet-101
|
Selective Kernel Networks
|
|
2019-03-15
|
|
FixResNet-50 CutMix
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
CSPResNeXt-50 + Mish
|
Mish: A Self Regularized Non-Monotonic Activation…
|
|
2019-08-23
|
|
kNN-CLIP
|
Revisiting a kNN-based Image Classification Syste…
|
|
2022-04-03
|
|
FastViT-S12
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
GC ViT-XXT
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
CSPResNeXt-50 (Mish+Aug)
|
CSPNet: A New Backbone that can Enhance Learning …
|
|
2019-11-27
|
|
DVT (T2T-ViT-10)
|
Not All Images are Worth 16x16 Words: Dynamic Tra…
|
|
2021-05-31
|
|
GPaCo (ResNet-50)
|
Generalized Parametric Contrastive Learning
|
|
2022-09-26
|
|
ResMLP-36
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
Grafit (ResNet-50)
|
Grafit: Learning fine-grained image representatio…
|
|
2020-11-25
|
|
LeViT-128
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
ResT-Small
|
ResT: An Efficient Transformer for Visual Recogni…
|
|
2021-05-28
|
|
GTP-DeiT-S/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
ReXNet_1.3
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
WideNet-L
|
Go Wider Instead of Deeper
|
|
2021-07-25
|
|
ResNet-50 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
AlphaNet-A3
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
ResMLP-24
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
EdgeNeXt-S
|
EdgeNeXt: Efficiently Amalgamated CNN-Transformer…
|
|
2022-06-21
|
|
TinyNet (GhostNet-A)
|
Model Rubik's Cube: Twisting Resolution, Depth an…
|
|
2020-10-28
|
|
MobileOne-S4
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
RegNetY-4.0GF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
SENet-50
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ScaleNet-152
|
Data-Driven Neuron Allocation for Scale Aggregati…
|
|
2019-04-20
|
|
LIP-ResNet-101
|
LIP: Local Importance-based Pooling
|
|
2019-08-12
|
|
RedNet-152
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
ResNet-50 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
PS-KD (ResNet-152 + CutMix)
|
Self-Knowledge Distillation with Progressive Refi…
|
|
2020-06-22
|
|
ResNet-101 (JFT-300M Finetuning)
|
Revisiting Unreasonable Effectiveness of Data in …
|
|
2017-07-10
|
|
RVT-Ti*
|
Towards Robust Vision Transformer
|
|
2021-05-17
|
|
Multiscale DEQ (MDEQ-XL)
|
Multiscale Deep Equilibrium Models
|
|
2020-06-15
|
|
DenseNet-169 (H4*)
|
How to Use Dropout Correctly on Residual Networks…
|
|
2023-02-13
|
|
AlphaNet-A2
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
AA-ResNet-152
|
Attention Augmented Convolutional Networks
|
|
2019-04-22
|
|
FixResNet-50
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
MobileOne-S2 (distill)
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
PiT-XS
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
UniNet-B0
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
RedNet-101
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
ResNet-50 (UDA)
|
Unsupervised Data Augmentation for Consistency Tr…
|
|
2019-04-29
|
|
ScaleNet-101
|
Data-Driven Neuron Allocation for Scale Aggregati…
|
|
2019-04-20
|
|
TransBoost-ResNet50
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
Co-ResNet-152
|
Contextual Convolutional Neural Networks
|
|
2021-08-17
|
|
MobileNetV3_large_x1_0_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
RevBiFPN-S2
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ConvMLP-M
|
ConvMLP: Hierarchical Convolutional MLPs for Visi…
|
|
2021-09-09
|
|
Xception
|
Xception: Deep Learning with Depthwise Separable …
|
|
2016-10-07
|
|
MixNet-L
|
MixConv: Mixed Depthwise Convolutional Kernels
|
|
2019-07-22
|
|
SpineNet-143
|
SpineNet: Learning Scale-Permuted Backbone for Re…
|
|
2019-12-10
|
|
Mixer-B/8-SAM
|
When Vision Transformers Outperform ResNets witho…
|
|
2021-06-03
|
|
InceptionV3 (FRN layer)
|
Filter Response Normalization Layer: Eliminating …
|
|
2019-11-21
|
|
ResNet-152 + SWA
|
Averaging Weights Leads to Wider Optima and Bette…
|
|
2018-03-14
|
|
ECA-Net (ResNet-152)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
AlphaNet-A1
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
CeiT-T (384 finetune res)
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
CeiT-T
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
NoisyStudent (EfficientNet-B0)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
EfficientNet-B1
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
ResNet-50
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
SGE-ResNet101
|
Spatial Group-wise Enhance: Improving Semantic Fe…
|
|
2019-05-23
|
|
RepVGG-B2
|
RepVGG: Making VGG-style ConvNets Great Again
|
|
2021-01-11
|
|
ResNet-50
|
Puzzle Mix: Exploiting Saliency and Local Statist…
|
|
2020-09-15
|
|
ResNet-50
|
AutoDropout: Learning Dropout Patterns to Regular…
|
|
2021-01-05
|
|
CAS-ViT-XS
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
SReT-LT (Fast Knowledge Distillation)
|
A Fast Knowledge Distillation Framework for Visua…
|
|
2021-12-02
|
|
PVTv2-B1
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
ResNet-50-DW (Deformable Kernels)
|
Deformable Kernels: Adapting Effective Receptive …
|
|
2019-10-07
|
|
ECA-Net (ResNet-101)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
MobileViTv3-1.0
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
EdgeFormer-S
|
ParC-Net: Position Aware Circular Convolution wit…
|
|
2022-03-08
|
|
UniRepLKNet-F
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
TransBoost-EfficientNetB0
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
Visformer-Ti
|
Visformer: The Vision-friendly Transformer
|
|
2021-04-26
|
|
ResMLP-12 (distilled, class-MLP)
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
RepMLP-Res50
|
RepMLP: Re-parameterizing Convolutions into Fully…
|
|
2021-05-05
|
|
Res2Net-50-299
|
Res2Net: A New Multi-scale Backbone Architecture
|
|
2019-04-02
|
|
ResNet-152
|
Deep Residual Learning for Image Recognition
|
|
2015-12-10
|
|
HRFormer-T
|
HRFormer: High-Resolution Transformer for Dense P…
|
|
2021-10-18
|
|
RepVGG-B2g4
|
RepVGG: Making VGG-style ConvNets Great Again
|
|
2021-01-11
|
|
DVT (T2T-ViT-7)
|
Not All Images are Worth 16x16 Words: Dynamic Tra…
|
|
2021-05-31
|
|
SRM-ResNet-101
|
SRM : A Style-based Recalibration Module for Conv…
|
|
2019-03-26
|
|
DenseNet-161 + SWA
|
Averaging Weights Leads to Wider Optima and Bette…
|
|
2018-03-14
|
|
CoaT-Ti
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
FBNetV5-AC-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
ResNet-50 (CutMix)
|
CutMix: Regularization Strategy to Train Strong C…
|
|
2019-05-13
|
|
ReXNet_1.0-relabel
|
Re-labeling ImageNet: from Single to Multi-Labels…
|
|
2021-01-13
|
|
MobileViT-S
|
MobileViT: Light-weight, General-purpose, and Mob…
|
|
2021-10-05
|
|
ResNet-34 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
EfficientNet-B0
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
RedNet-50
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
ResNet-50 + DropBlock (0.9 kp, 0.1 label smoothing)
|
DropBlock: A regularization method for convolutio…
|
|
2018-10-30
|
|
Poly-SA-ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
EfficientNet-B0 (CondConv)
|
CondConv: Conditionally Parameterized Convolution…
|
|
2019-04-10
|
|
ResNet-101
|
Deep Residual Learning for Image Recognition
|
|
2015-12-10
|
|
MultiGrain R50-AA-224
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Pyramid ViG-Ti
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
LocalViT-PVT
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
ResNet-50 (LIP Bottleneck-256)
|
LIP: Local Importance-based Pooling
|
|
2019-08-12
|
|
WRN-50-2-bottleneck
|
Wide Residual Networks
|
|
2016-05-23
|
|
MobileViTv2-1.0
|
Separable Self-attention for Mobile Vision Transf…
|
|
2022-06-06
|
|
MobileOne-S3
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
ResNet50 (A3)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
ZenNet-400M-SE
|
Zen-NAS: A Zero-Shot NAS for High-Performance Dee…
|
|
2021-02-01
|
|
RegNetY-1.6GF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
HVT-S-1
|
Scalable Vision Transformers with Hierarchical Po…
|
|
2021-03-19
|
|
Perceiver (FF)
|
Perceiver: General Perception with Iterative Atte…
|
|
2021-03-04
|
|
ReXNet_1.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
ViTAE-6M
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
DenseNet-264
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
ResMLP-S12
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
TinyNet-A + RA
|
Model Rubik's Cube: Twisting Resolution, Depth an…
|
|
2020-10-28
|
|
ResNet-50 (Fast AA)
|
Fast AutoAugment
|
|
2019-05-01
|
|
SReT-T
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
RedNet-38
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
SGE-ResNet50
|
Spatial Group-wise Enhance: Improving Semantic Fe…
|
|
2019-05-23
|
|
WideNet-B
|
Go Wider Instead of Deeper
|
|
2021-07-25
|
|
EfficientNet-B0
|
AutoDropout: Learning Dropout Patterns to Regular…
|
|
2021-01-05
|
|
ACNet (ResNet-50)
|
Adaptively Connected Neural Networks
|
|
2019-04-07
|
|
RegNetY-800MF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
ECA-Net (ResNet-50)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
DenseNet-201
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
MobileOne-S2
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
R-Mix (ResNet-50)
|
Expeditious Saliency-guided Mix-up through Random…
|
|
2022-12-09
|
|
ResNetV2-50 (FRN layer)
|
Filter Response Normalization Layer: Eliminating …
|
|
2019-11-21
|
|
FBNetV5-AR-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
MogaNet-XT (256res)
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
ReXNet_0.9
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
Prodpoly
|
Deep Polynomial Neural Networks
|
|
2020-06-20
|
|
SCARLET-B
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
GLiT-Tinys
|
GLiT: Neural Architecture Search for Global and L…
|
|
2021-07-07
|
|
DenseNet-169
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
GreedyNAS-C
|
GreedyNAS: Towards Fast One-Shot NAS with Greedy …
|
|
2020-03-25
|
|
ResNet-50-D
|
Bag of Tricks for Image Classification with Convo…
|
|
2018-12-04
|
|
Inception v3
|
What do Deep Networks Like to See?
|
|
2018-03-22
|
|
MKD ViT-T
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
GreedyNAS-A
|
GreedyNAS: Towards Fast One-Shot NAS with Greedy …
|
|
2020-03-25
|
|
SkipblockNet-L
|
Bias Loss for Mobile Neural Networks
|
|
2021-07-23
|
|
CI2P-ViT
|
Compress image to patches for Vision Transformer
|
|
2025-02-14
|
|
SSAL-Resnet50
|
Contextual Classification Using Self-Supervised A…
|
|
2021-01-07
|
|
UniRepLKNet-A
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
CloFormer-XXS
|
Rethinking Local Perception in Lightweight Vision…
|
|
2023-03-31
|
|
MixNet-M
|
MixConv: Mixed Depthwise Convolutional Kernels
|
|
2019-07-22
|
|
ResNet50 (FSGDM)
|
On the Performance Analysis of Momentum Method: A…
|
|
2024-11-29
|
|
SCARLET-A
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
TransBoost-MobileNetV3-L
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ViTAE-T-Stage
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
GreedyNAS-B
|
GreedyNAS: Towards Fast One-Shot NAS with Greedy …
|
|
2020-03-25
|
|
Perona Malik (Perona and Malik, 1990)
|
Learning Visual Representations for Transfer Lear…
|
|
2020-11-03
|
|
PVT-T (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
MobileViTv3-XS
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
MnasNet-A3
|
MnasNet: Platform-Aware Neural Architecture Searc…
|
|
2018-07-31
|
|
ViL-Tiny-RPB
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
ConViT-Ti+
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
TransBoost-ResNet34
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
LIP-DenseNet-BC-121
|
LIP: Local Importance-based Pooling
|
|
2019-08-12
|
|
ResNet-50 (X-volution, stage3)
|
X-volution: On the unification of convolution and…
|
|
2021-06-04
|
|
MUXNet-l
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
DeiT-B
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
MobileViTv3-0.75
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
Mixer-B/16
|
MLP-Mixer: An all-MLP Architecture for Vision
|
|
2021-05-04
|
|
Perceiver
|
Perceiver: General Perception with Iterative Atte…
|
|
2021-03-04
|
|
SkipblockNet-M
|
Bias Loss for Mobile Neural Networks
|
|
2021-07-23
|
|
ResNet-34 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
ResNet-50 MLPerf v0.7 - 2512 steps
|
A Large Batch Optimizer Reality Check: Traditiona…
|
|
2021-02-12
|
|
DenseNAS-A
|
Densely Connected Search Space for More Flexible …
|
|
2019-06-23
|
|
MobileOne-S1
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
MoGA-A
|
MoGA: Searching Beyond MobileNetV3
|
|
2019-08-04
|
|
RevBiFPN-S1
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
LocalViT-TNT
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
SALG-ST
|
Semantic-Aware Local-Global Vision Transformer
|
|
2022-11-27
|
|
RedNet-26
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
FractalNet-34
|
FractalNet: Ultra-Deep Neural Networks without Re…
|
|
2016-05-24
|
|
MixNet-S
|
MixConv: Mixed Depthwise Convolutional Kernels
|
|
2019-07-22
|
|
CoordConv ResNet-50
|
An Intriguing Failing of Convolutional Neural Net…
|
|
2018-07-09
|
|
LeViT-128S
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
GhostNet ×1.3
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
LR-Net-26
|
Local Relation Networks for Image Recognition
|
|
2019-04-25
|
|
FastViT-T8
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
MobileViTv2-0.75
|
Separable Self-attention for Mobile Vision Transf…
|
|
2022-06-06
|
|
MnasNet-A2
|
MnasNet: Platform-Aware Neural Architecture Searc…
|
|
2018-07-31
|
|
PAWS (ResNet-50, 10% labels)
|
Semi-Supervised Learning of Visual Features by No…
|
|
2021-04-28
|
|
RegNetY-600MF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
ShuffleNet V2
|
ShuffleNet V2: Practical Guidelines for Efficient…
|
|
2018-07-30
|
|
VAN-B0
|
Visual Attention Network
|
|
2022-02-20
|
|
AsymmNet-Large ×1.0
|
AsymmNet: Towards ultralight convolution neural n…
|
|
2021-04-15
|
|
FairNAS-A
|
FairNAS: Rethinking Evaluation Fairness of Weight…
|
|
2019-07-03
|
|
ViTAE-T
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
MUXNet-m
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
ResNet-50
|
Deep Residual Learning for Image Recognition
|
|
2015-12-10
|
|
MnasNet-A1
|
MnasNet: Platform-Aware Neural Architecture Searc…
|
|
2018-07-31
|
|
MobileNet V3-Large 1.0
|
Searching for MobileNetV3
|
|
2019-05-06
|
|
DiCENet
|
DiCENet: Dimension-wise Convolutions for Efficien…
|
|
2019-06-08
|
|
MultiGrain NASNet-A-Mobile (350px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Ghost-ResNet-50 (s=2)
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
DenseNet-121
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
Single-Path NAS
|
Single-Path NAS: Designing Hardware-Efficient Con…
|
|
2019-04-05
|
|
WaveMix-192/16 (level 3)
|
WaveMix: A Resource-efficient Neural Network for …
|
|
2022-05-28
|
|
FBNet-C
|
FBNet: Hardware-Aware Efficient ConvNet Design vi…
|
|
2018-12-09
|
|
ESPNetv2
|
ESPNetv2: A Light-weight, Power Efficient, and Ge…
|
|
2018-11-28
|
|
MobileViT-XS
|
MobileViT: Light-weight, General-purpose, and Mob…
|
|
2021-10-05
|
|
LocalViT-T
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
RandWire-WS (small)
|
Exploring Randomly Wired Neural Networks for Imag…
|
|
2019-04-02
|
|
AutoFormer-tiny
|
AutoFormer: Searching Transformers for Visual Rec…
|
|
2021-07-01
|
|
MobileNetV2 (1.4)
|
MobileNetV2: Inverted Residuals and Linear Bottle…
|
|
2018-01-13
|
|
FairNAS-C
|
FairNAS: Rethinking Evaluation Fairness of Weight…
|
|
2019-07-03
|
|
ReXNet_0.6
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
Proxyless
|
ProxylessNAS: Direct Neural Architecture Search o…
|
|
2018-12-02
|
|
PiT-Ti
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
DY-MobileNetV2 ×1.0
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
RegNetY-400MF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
PDC
|
Augmenting Deep Classifiers with Polynomial Neura…
|
|
2021-04-16
|
|
Ghost-ResNet-50 (s=4)
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
SReT-ExT
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
GhostNet ×1.0
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
DeiT-T (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
MNv4-Conv-S
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
DeiT-Ti with iRPE-K
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
TransBoost-ResNet18
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
Wide ResNet-50 (edge-popup)
|
What's Hidden in a Randomly Weighted Neural Netwo…
|
|
2019-11-29
|
|
ResNet-18 (MEAL V2)
|
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1…
|
|
2020-09-17
|
|
ConViT-Ti
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
RevBiFPN-S0
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
DY-MobileNetV2 ×0.75
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
ResNet-18 (FT w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
EfficientFormer-V2-S0
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
DY-ResNet-18
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
ECA-Net (MobileNetV2)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
MobileNet-224 (CGD)
|
Compact Global Descriptor for Neural Networks
|
|
2019-07-23
|
|
MobileOne-S0 (distill)
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
LocalViT-T2T
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
MobileViTv3-0.5
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
ResNet-18 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
ResNet-50
|
On the adequacy of untuned warmup for adaptive op…
|
|
2019-10-09
|
|
ResNet-18 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
MobileNetV2
|
MobileNetV2: Inverted Residuals and Linear Bottle…
|
|
2018-01-13
|
|
Ours
|
QuantNet: Learning to Quantize by Learning within…
|
|
2020-09-10
|
|
ResNet-18 (PAD-L2 w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
MUXNet-s
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
MobileOne-S0
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
ResNet-18 (KD w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
EdgeNeXt-XXS
|
EdgeNeXt: Efficiently Amalgamated CNN-Transformer…
|
|
2022-06-21
|
|
ResNet-18 (L2 w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
MobileViTv3-XXS
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
ResNet-18 (CRD w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
ShuffleNet
|
ShuffleNet: An Extremely Efficient Convolutional …
|
|
2017-07-04
|
|
MobileNet-224 ×1.25
|
MobileNets: Efficient Convolutional Neural Networ…
|
|
2017-04-17
|
|
ResNet-18 (tf-KD w/ ResNet-18 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
PVTv2-B0
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
MobileViTv2-0.5
|
Separable Self-attention for Mobile Vision Transf…
|
|
2022-06-06
|
|
ResNet-18 (SSKD w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
DY-MobileNetV3-Small
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
HVT-Ti-1
|
Scalable Vision Transformers with Hierarchical Po…
|
|
2021-03-19
|
|
DY-MobileNetV2 ×0.5
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
AsymmNet-Large ×0.5
|
AsymmNet: Towards ultralight convolution neural n…
|
|
2021-04-15
|
|
Heteroscedastic (InceptionResNet-v2)
|
Correlated Input-Dependent Label Noise in Large-S…
|
|
2021-05-19
|
|
AsymmNet-Small ×1.0
|
AsymmNet: Towards ultralight convolution neural n…
|
|
2021-04-15
|
|
FireCaffe (GoogLeNet)
|
FireCaffe: near-linear acceleration of deep neura…
|
|
2015-10-31
|
|
Graph-RISE (40M)
|
Graph-RISE: Graph-Regularized Image Semantic Embe…
|
|
2019-02-14
|
|
ReActNet-A (BN-Free)
|
"BNN - BN = ?": Training Binary Neural Networks w…
|
|
2021-04-16
|
|
ResNet34 (FSGDM)
|
On the Performance Analysis of Momentum Method: A…
|
|
2024-11-29
|
|
DY-ResNet-10
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
MUXNet-xs
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
PAWS (ResNet-50, 1% labels)
|
Semi-Supervised Learning of Visual Features by No…
|
|
2021-04-28
|
|
GhostNet ×0.5
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
OTTT
|
Online Training Through Time for Spiking Neural N…
|
|
2022-10-09
|
|
DY-MobileNetV2 ×0.35
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
BBG (ResNet-34)
|
Balanced Binary Neural Networks with Gated Residu…
|
|
2019-09-26
|
|
BBG (ResNet-18)
|
Balanced Binary Neural Networks with Gated Residu…
|
|
2019-09-26
|
|
FireCaffe (AlexNet)
|
FireCaffe: near-linear acceleration of deep neura…
|
|
2015-10-31
|
|
HMAX
|
0/1 Deep Neural Networks via Block Coordinate Des…
|
|
2022-06-19
|
|
ViT-Large
|
An Image is Worth 16x16 Words: Transformers for I…
|
|
2020-10-22
|
|
CCT-14/7x2
|
Escaping the Big Data Paradigm with Compact Trans…
|
|
2021-04-12
|
|
MambaVision-L2
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
ONE-PEACE
|
ONE-PEACE: Exploring One General Representation M…
|
|
2023-05-18
|
|
AIMv2-2B
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
InternImage-DCNv3-G (M3I Pre-training)
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|