WebSRC

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Dataset Information
Modalities: Images, Texts, Tables
Languages: English
Introduced: 2021

Overview

WebSRC is a novel Web-based Structural Reading Comprehension dataset. It consists of 0.44M question-answer pairs collected from 6.5K web pages, each provided with its HTML source code, a screenshot, and metadata. Every question in WebSRC requires some understanding of the web page's structure to answer, and each answer is either a text span on the web page or yes/no.

Source: WebSRC Homepage
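The two answer types described above can be modeled as a small tagged union. The sketch below is illustrative only: the class names (`SpanAnswer`, `YesNoAnswer`, `QAPair`), the `page_id` field, and the helper `answer_is_grounded` are hypothetical and are not the official WebSRC file format or loader. It shows one way to represent a question-answer pair and to check that a span answer really occurs in the page's visible text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SpanAnswer:
    """Answer given as a text span copied from the web page."""
    text: str


@dataclass(frozen=True)
class YesNoAnswer:
    """Answer to a yes/no question."""
    value: bool


# Each answer is one of these two kinds (names are hypothetical).
Answer = Union[SpanAnswer, YesNoAnswer]


@dataclass(frozen=True)
class QAPair:
    page_id: str  # hypothetical identifier for the source web page
    question: str
    answer: Answer


def answer_is_grounded(qa: QAPair, page_text: str) -> bool:
    """Return True if a span answer occurs verbatim in the page text.

    Yes/no answers are not spans, so they are always accepted here.
    """
    if isinstance(qa.answer, SpanAnswer):
        return qa.answer.text in page_text
    return True


if __name__ == "__main__":
    # Toy page text standing in for the visible text of one web page.
    page = "Model: X200  Price: $499  In stock: yes"
    q1 = QAPair("page-001", "What is the price of the X200?", SpanAnswer("$499"))
    q2 = QAPair("page-001", "Is the X200 in stock?", YesNoAnswer(True))
    print(answer_is_grounded(q1, page), answer_is_grounded(q2, page))
```

Using a frozen dataclass per answer kind makes the span/yes-no distinction explicit, so code that consumes the pairs can branch on the answer type with `isinstance` instead of inspecting ad hoc string values.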

Variants: WebSRC

Associated Benchmarks

This dataset is used in 2 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | ChatGPT 3.5 SpatialFormat | LAPDoc: Layout-Aware Prompting for Documents | 2024-02-15
Visual Question Answering (VQA) | DUBLIN | DUBLIN -- Document Understanding By … | 2023-05-23

Research Papers

Recent papers with results on this dataset: