FUNSD

Name: FUNSD
License: Custom

Form Understanding in Noisy Scanned Documents

Dataset Information

Modalities

Images, Texts

Homepage

Official Website: https://guillaumejaume.github.io/FUNSD/

Overview

Form Understanding in Noisy Scanned Documents (FUNSD) comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, which makes form understanding (FoUn) challenging. The dataset supports several tasks: text detection, optical character recognition, spatial layout analysis, and entity labeling and linking.

Source: FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Image source: https://guillaumejaume.github.io/FUNSD/

Variants: FUNSD
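
To make the entity labeling and linking tasks from the overview concrete, here is a minimal sketch of reading one form annotation. It assumes a JSON layout with a top-level "form" list whose entries carry "id", "text", "box", "label", and "linking" fields. That layout and the sample record are assumptions written for illustration, not data copied from the dataset, and summarize is a hypothetical helper.

```python
import json
from collections import Counter

# Hypothetical, hand-written sample that mimics the ASSUMED annotation layout.
SAMPLE = """
{"form": [
  {"id": 0, "text": "Date:", "box": [10, 10, 60, 25],
   "label": "question", "linking": [[0, 1]]},
  {"id": 1, "text": "3/4/98", "box": [70, 10, 130, 25],
   "label": "answer", "linking": [[0, 1]]},
  {"id": 2, "text": "FAX COVER SHEET", "box": [10, 40, 200, 60],
   "label": "header", "linking": []}
]}
"""


def summarize(annotation):
    """Return (label counts, unique links as (from_text, to_text) pairs)."""
    entities = {e["id"]: e for e in annotation["form"]}
    label_counts = Counter(e["label"] for e in entities.values())
    # A link is assumed to be listed on both of its endpoints, so deduplicate.
    links = {tuple(pair) for e in entities.values() for pair in e["linking"]}
    linked_text = [
        (entities[a]["text"], entities[b]["text"]) for a, b in sorted(links)
    ]
    return label_counts, linked_text


label_counts, linked_text = summarize(json.loads(SAMPLE))
print(dict(label_counts))
print(linked_text)
```

Entity labeling corresponds to predicting the "label" field of each entity, and entity linking to predicting the pairs in "linking".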

Associated Benchmarks

This dataset is used in 3 benchmarks:

Relation Extraction - Metrics: F1
Entity Linking - Metrics: F1
Semantic entity labeling - Metrics: F1
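
All three benchmarks report F1. As a rough illustration of the metric only, the sketch below computes F1 between a set of predicted items and a set of gold items, such as entity links given as (source id, target id) pairs. The function name and the toy data are assumptions, and individual benchmark submissions may apply their own matching and averaging rules.

```python
def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or an empty gold set
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: links are (source entity id, target entity id) pairs.
gold_links = {(0, 1), (2, 3), (4, 5)}
pred_links = {(0, 1), (2, 3), (6, 7), (8, 9)}

score = f1_score(pred_links, gold_links)
print(round(score, 4))
```

In this toy case two of four predictions are correct (precision 1/2) and two of three gold links are recovered (recall 2/3), so F1 is 4/7.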

Recent Benchmark Submissions

Task	Model	Paper	Date
Entity Linking	RORE (GeoLayoutLM)	Modeling Layout Reading Order as …	2024-09-29
Relation Extraction	RORE (GeoLayoutLM)	Modeling Layout Reading Order as …	2024-09-29
Semantic entity labeling	RORE (GeoLayoutLM)	Modeling Layout Reading Order as …	2024-09-29
Relation Extraction	LayoutLMv3 large EM + BBO + RSF	A LayoutLMv3-Based Model for Enhanced …	2024-04-16
Semantic entity labeling	TPP (LayoutMask)	Reading Order Matters: Information Extraction …	2023-10-17
Entity Linking	TPP (LayoutMask)	Reading Order Matters: Information Extraction …	2023-10-17
Relation Extraction	TPP (LayoutMask)	Reading Order Matters: Information Extraction …	2023-10-17
Semantic entity labeling	DocTr	DocTr: Document Transformer for Structured …	2023-07-16
Entity Linking	DocTr	DocTr: Document Transformer for Structured …	2023-07-16
Semantic entity labeling	LayoutMask (large)	LayoutMask: Enhance Text-Layout Interaction in …	2023-05-30
Semantic entity labeling	LayoutMask (base)	LayoutMask: Enhance Text-Layout Interaction in …	2023-05-30
Entity Linking	GeoLayoutLM	GeoLayoutLM: Geometric Pre-training for Visual …	2023-04-21
Semantic entity labeling	GeoLayoutLM	GeoLayoutLM: Geometric Pre-training for Visual …	2023-04-21
Relation Extraction	LayoutLMv3 large	GeoLayoutLM: Geometric Pre-training for Visual …	2023-04-21
Relation Extraction	GeoLayoutLM	GeoLayoutLM: Geometric Pre-training for Visual …	2023-04-21
Semantic entity labeling	StrucTexTv2 (large)	StrucTexTv2: Masked Visual-Textual Prediction for …	2023-03-01
Semantic entity labeling	StrucTexTv2 (small)	StrucTexTv2: Masked Visual-Textual Prediction for …	2023-03-01
Semantic entity labeling	ERNIE-Layout (large)	ERNIE-Layout: Layout Knowledge Enhanced Pre-training …	2022-10-12
Semantic entity labeling	XDoc1M	XDoc: Unified Pre-training for Cross-Format …	2022-10-06
Entity Linking	Doc2Graph	Doc2Graph: a Task Agnostic Document …	2022-08-23

Research Papers

Recent papers with results on this dataset:
