| PLIP-RN50 | PLIP: Language-Image Pre-training for Person Repr… | 64.25 | 2023-05-15 |
| VGSG (ViT-Base) | VGSG: Vision-Guided Semantic-Group Network for Te… | 63.05 | 2023-11-13 |
| SSAN | Semantically Self-Aligned Network for Text-to-Ima… | 54.23 | 2021-07-27 |
| MARS | MARS: Paying more attention to visual attributes … | 44.93 | 2024-07-05 |
| Filtering-WoRA(Small) | From Data Deluge to Data Curation: A Filtering-Wo… | 42.60 | 2024-04-16 |
| RaSa | RaSa: Relation and Sensitivity Aware Representati… | 41.29 | 2023-05-23 |
| APTM | Towards Unified Text-based Person Retrieval: A La… | 41.22 | 2023-06-05 |
| RDE | Noisy-Correspondence Learning for Text-to-Image P… | 40.06 | 2023-08-19 |
| CADA | Cross-Modal Adaptive Dual Association for Text-to… | 39.85 | 2023-12-04 |
| TBPS-CLIP (ViT-B/16) | An Empirical Study of CLIP for Text-based Person … | 39.83 | 2023-08-19 |
| IRRA | Cross-Modal Implicit Relation Reasoning and Align… | 38.06 | 2023-03-22 |