WebSRC: A Dataset for Web-Based Structural Reading Comprehension
WebSRC is a novel Web-based Structural Reading Comprehension dataset. It consists of 0.44M question-answer pairs collected from 6.5K web pages, each provided with its HTML source code, screenshot, and metadata. Answering any question in WebSRC requires some understanding of the web page's structure, and each answer is either a text span on the web page or yes/no.
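The two answer kinds described above (a text span on the page, or yes/no) map naturally onto a tagged union. The sketch below is illustrative only. The class names, the `page_id` field, and the substring-based validity check are hypothetical assumptions, not the official WebSRC file format or tooling.

```python
from dataclasses import dataclass
from typing import Literal, Union

# Hypothetical record layout for illustration only; NOT the official WebSRC
# schema. It models the two answer kinds: a text span or yes/no.

@dataclass(frozen=True)
class SpanAnswer:
    text: str  # a text span that appears on the web page

@dataclass(frozen=True)
class YesNoAnswer:
    value: Literal["yes", "no"]  # not enforced at runtime; see is_valid

Answer = Union[SpanAnswer, YesNoAnswer]

@dataclass(frozen=True)
class QAExample:
    page_id: str  # hypothetical identifier of the source web page
    question: str
    answer: Answer

def answer_kind(example: QAExample) -> str:
    """Return 'span' or 'yes/no' depending on the answer type."""
    if isinstance(example.answer, SpanAnswer):
        return "span"
    return "yes/no"

def is_valid(example: QAExample, page_text: str) -> bool:
    """Simplified check (an assumption): a span answer must occur verbatim
    in the page text; a yes/no answer must be exactly 'yes' or 'no'."""
    if isinstance(example.answer, SpanAnswer):
        return example.answer.text in page_text
    return example.answer.value in ("yes", "no")

# Toy page and questions, invented for this example.
page_text = "Model: X100 Price: $299 In stock: yes"
ex_span = QAExample("page-0001", "What is the listed price?", SpanAnswer("$299"))
ex_bool = QAExample("page-0001", "Is the item in stock?", YesNoAnswer("yes"))

print(answer_kind(ex_span), is_valid(ex_span, page_text))  # span True
print(answer_kind(ex_bool), is_valid(ex_bool, page_text))  # yes/no True
```

A plain substring check ignores where the span sits in the page's structure; it only illustrates the "span on the web page" constraint, not how WebSRC itself locates or scores answers.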
Source: WebSRC Homepage
Variants: WebSRC
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Question Answering | ChatGPT 3.5 SpatialFormat | LAPDoc: Layout-Aware Prompting for Documents | 2024-02-15 |
Visual Question Answering (VQA) | DUBLIN | DUBLIN -- Document Understanding By … | 2023-05-23 |