Composing Actions from Language and Vision
CALVIN (Composing Actions from Language and Vision) is an open-source simulated benchmark for learning long-horizon, language-conditioned robot manipulation tasks.
Variants: CALVIN
This dataset is used in 2 benchmarks.