FUNSD

Form Understanding in Noisy Scanned Documents

Dataset Information
Modalities: Images, Texts

Overview

Form Understanding in Noisy Scanned Documents (FUNSD) comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking.
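As a rough illustration of the entity labeling and linking tasks mentioned above, the sketch below reads one annotation and lists its labeled entities and linked pairs. It assumes the JSON layout commonly described for FUNSD annotation files: a top-level `form` list whose entries carry `id`, `text`, `label`, `box`, `words`, and `linking`. That layout is not stated on this page, so treat it as an assumption. The sample record and the `summarize` helper are hypothetical and written only for this example.

```python
import json
from collections import Counter

# Hypothetical, hand-written sample following the assumed annotation layout.
# The values are invented for illustration and do not come from the dataset.
SAMPLE = """
{
  "form": [
    {"id": 0, "text": "Date:", "label": "question",
     "box": [50, 40, 95, 55], "linking": [[0, 1]],
     "words": [{"text": "Date:", "box": [50, 40, 95, 55]}]},
    {"id": 1, "text": "03/12/98", "label": "answer",
     "box": [100, 40, 170, 55], "linking": [[0, 1]],
     "words": [{"text": "03/12/98", "box": [100, 40, 170, 55]}]},
    {"id": 2, "text": "FAX", "label": "header",
     "box": [60, 10, 110, 30], "linking": [],
     "words": [{"text": "FAX", "box": [60, 10, 110, 30]}]}
  ]
}
"""


def summarize(annotation):
    """Return (label counts, linked text pairs) for one annotation dict."""
    entities = {entity["id"]: entity for entity in annotation["form"]}

    # Entity labeling: count how many entities carry each label.
    label_counts = Counter(entity["label"] for entity in entities.values())

    # Entity linking: in the assumed layout each link is repeated on both
    # endpoints, so collect the pairs in a set to remove duplicates.
    links = set()
    for entity in entities.values():
        for source_id, target_id in entity["linking"]:
            links.add((source_id, target_id))

    pairs = [(entities[s]["text"], entities[t]["text"]) for s, t in sorted(links)]
    return label_counts, pairs


label_counts, pairs = summarize(json.loads(SAMPLE))
print(dict(label_counts))
print(pairs)
```

To run this on real data, replace `json.loads(SAMPLE)` with a call that loads one of the dataset's annotation files, and check that the field names match the files you downloaded.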

Source: FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Image source: https://guillaumejaume.github.io/FUNSD/

Variants: FUNSD

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Entity Linking | RORE (GeoLayoutLM) | Modeling Layout Reading Order as … | 2024-09-29
Relation Extraction | RORE (GeoLayoutLM) | Modeling Layout Reading Order as … | 2024-09-29
Semantic entity labeling | RORE (GeoLayoutLM) | Modeling Layout Reading Order as … | 2024-09-29
Relation Extraction | LayoutLMv3 large EM + BBO + RSF | A LayoutLMv3-Based Model for Enhanced … | 2024-04-16
Semantic entity labeling | TPP (LayoutMask) | Reading Order Matters: Information Extraction … | 2023-10-17
Entity Linking | TPP (LayoutMask) | Reading Order Matters: Information Extraction … | 2023-10-17
Relation Extraction | TPP (LayoutMask) | Reading Order Matters: Information Extraction … | 2023-10-17
Semantic entity labeling | DocTr | DocTr: Document Transformer for Structured … | 2023-07-16
Entity Linking | DocTr | DocTr: Document Transformer for Structured … | 2023-07-16
Semantic entity labeling | LayoutMask (large) | LayoutMask: Enhance Text-Layout Interaction in … | 2023-05-30
Semantic entity labeling | LayoutMask (base) | LayoutMask: Enhance Text-Layout Interaction in … | 2023-05-30
Entity Linking | GeoLayoutLM | GeoLayoutLM: Geometric Pre-training for Visual … | 2023-04-21
Semantic entity labeling | GeoLayoutLM | GeoLayoutLM: Geometric Pre-training for Visual … | 2023-04-21
Relation Extraction | LayoutLMv3 large | GeoLayoutLM: Geometric Pre-training for Visual … | 2023-04-21
Relation Extraction | GeoLayoutLM | GeoLayoutLM: Geometric Pre-training for Visual … | 2023-04-21
Semantic entity labeling | StrucTexTv2 (large) | StrucTexTv2: Masked Visual-Textual Prediction for … | 2023-03-01
Semantic entity labeling | StrucTexTv2 (small) | StrucTexTv2: Masked Visual-Textual Prediction for … | 2023-03-01
Semantic entity labeling | ERNIE-Layout large | ERNIE-Layout: Layout Knowledge Enhanced Pre-training … | 2022-10-12
Semantic entity labeling | XDoc 1M | XDoc: Unified Pre-training for Cross-Format … | 2022-10-06
Entity Linking | Doc2Graph | Doc2Graph: a Task Agnostic Document … | 2022-08-23

Research Papers

Recent papers with results on this dataset: