How2

Name: How2
Published: 2018-01-01
License: CC BY-SA 4.0

Dataset Information

Modalities

Videos, Texts, Audio

Languages

English, Portuguese

Introduced

2018

License

CC BY-SA 4.0

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The How2 dataset contains 13,500 videos, totaling 300 hours of speech, and is split into 185,187 training, 2,022 development (dev), and 2,361 test utterances. It provides English subtitles and crowdsourced Portuguese translations.

Source: exploring multiview correlations in open-domain videos

Variants: How2, How2 300h
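
As a quick sanity check on the figures quoted in the overview, the sketch below computes the total utterance count, the share of each split, and the implied mean utterance length. The names SPLITS, TOTAL_HOURS, split_fractions, and mean_utterance_seconds are illustrative and not part of any How2 tooling; the mean length assumes the 300 hours are spread across all three splits.

```python
# Split sizes and total duration as quoted in the overview.
SPLITS = {"train": 185_187, "dev": 2_022, "test": 2_361}
TOTAL_HOURS = 300


def split_fractions(splits):
    """Return each split's share of the total utterance count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


def mean_utterance_seconds(splits, hours):
    """Mean utterance length in seconds, assuming `hours` covers all splits."""
    return hours * 3600 / sum(splits.values())


if __name__ == "__main__":
    print("total utterances:", sum(SPLITS.values()))
    for name, share in split_fractions(SPLITS).items():
        print(f"{name}: {share:.2%}")
    print(f"mean utterance length: {mean_utterance_seconds(SPLITS, TOTAL_HOURS):.2f} s")
```

By these numbers the training split holds roughly 98% of the utterances, and an average utterance lasts a little under six seconds.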

Associated Benchmarks

This dataset is used in 1 benchmark:

Text Summarization - Metrics: Content F1, ROUGE-L, ROUGE-1
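
To give a rough idea of what the ROUGE-1 metric listed above measures, here is a minimal sketch of a unigram-overlap F1 score. It is a simplification, not the official ROUGE implementation: it assumes lowercased whitespace tokenization and applies no stemming or stopword handling, and the function name rouge1_f1 is hypothetical. Scores reported on the benchmark may have been computed differently.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    # Assumption: lowercased whitespace tokens, no stemming.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Multiset intersection clips each token's count to the smaller side.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "how to tie a bow tie"
    candidate = "learn how to tie a tie"
    print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.3f}")
```

ROUGE-L follows the same precision/recall pattern but replaces the unigram overlap with the length of the longest common subsequence.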

Recent Benchmark Submissions

Task	Model	Paper	Date
Text Summarization	BertSum	Abstractive Summarization of Spoken and …	2020-08-21
Text Summarization	Ground-truth transcript + Action with Hierarchical Attn	Multimodal Abstractive Summarization for How2 …	2019-06-19

Research Papers

Recent papers with results on this dataset:

External Links:

How2
