EPHOIE

Dataset Information
Modalities: Images
Languages: Chinese
Introduced: 2021

Overview

EPHOIE is a fully annotated dataset and the first Chinese benchmark for both text spotting and visual information extraction. It consists of 1,494 images of examination paper heads with complex layouts and backgrounds, containing a total of 15,771 Chinese handwritten or printed text instances.

Source: Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Image source: https://github.com/HCIILAB/EPHOIE

Variants: EPHOIE

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task                        Model       Paper                                       Date
Document AI                 LayoutLMv3  LayoutLMv3: Pre-training for Document AI …  2022-04-18
Key Information Extraction  LayoutLMv3  LayoutLMv3: Pre-training for Document AI …  2022-04-18

Research Papers

Recent papers with results on this dataset: