The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects (e.g., fish, persimmon, room) and attributes (e.g., mossy, deflated, dirty). On average, each object instance is modified by one of the 9 attributes it affords.
Source: Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
Image Source: http://web.mit.edu/phillipi/Public/states_and_transformations/index.html
Variants: MIT-States; MIT-States, generalized split
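To put the counts in the description in perspective, the short sketch below compares the full attribute-object grid with a rough estimate of the pairs that are actually plausible. It uses only the figures quoted above; the variable names are illustrative, and treating the average of 9 afforded attributes as a per-object constant is a simplifying assumption, so the second number is an estimate rather than the dataset's official pair count.

```python
# Counts quoted in the dataset description.
NUM_OBJECTS = 245
NUM_ATTRIBUTES = 115
AVG_AFFORDED_ATTRIBUTES = 9  # average number of attributes an object affords

# Every attribute crossed with every object; most of these pairs are implausible.
all_pairs = NUM_OBJECTS * NUM_ATTRIBUTES

# Rough estimate of plausible pairs, ASSUMING each object affords exactly
# the average of 9 attributes (a simplification, not an official statistic).
plausible_pairs = NUM_OBJECTS * AVG_AFFORDED_ATTRIBUTES

# Fraction of the full grid that the plausible pairs would cover.
coverage = plausible_pairs / all_pairs

print(f"all attribute-object pairs: {all_pairs}")              # 28175
print(f"estimated plausible pairs:  {plausible_pairs}")        # 2205
print(f"share of the full grid:     {coverage:.1%}")
```

Under that assumption the plausible compositions cover only a small share of the grid, which is why the dataset is commonly used to study compositions that were not seen during training.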
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Zero-Shot Learning | CZSL | LOCL: Learning Object-Attribute Composition using … | 2022-10-07 |
Image Retrieval with Multi-Modal Query | ComposeAE | Compositional Learning of Image-Text Query … | 2020-06-19 |
Image Retrieval with Multi-Modal Query | TIRG | Composing Text and Image for … | 2018-12-18 |
Image Retrieval with Multi-Modal Query | Attribute as Operator | Attributes as Operators: Factorizing Unseen … | 2018-03-27 |
Image Retrieval with Multi-Modal Query | FiLM | FiLM: Visual Reasoning with a … | 2017-09-22 |
Image Retrieval with Multi-Modal Query | Show and Tell | Show and Tell: A Neural … | 2014-11-17 |
Recent papers with results on this dataset: