SUT

Name: SUT
License: Unknown

SUT: a new multi-purpose synthetic dataset for Farsi document image analysis

Dataset Information

Languages

Persian

License

Unknown

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

SUT is a large-scale synthetic dataset of Farsi document images. It was introduced to address the difficulty of obtaining diverse and substantial ground-truth data for supervised models in document image analysis (DIA) tasks such as document image classification, text detection and recognition, and information retrieval. The dataset comprises 62,453 images in 21 distinct classes, including identity documents with synthetically generated personal information superimposed on various backgrounds. Each image has corresponding labeling information: the ground truth is organized in CSV files that contain the image file paths and associated information about the embedded data.
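Because the ground truth is described as CSV files pairing image file paths with label information, a short loading sketch may help. The column names (image_path, class_label), the file paths, and the class names below are hypothetical placeholders; the actual SUT files may use a different layout, so treat this as an illustration under those assumptions rather than the dataset's documented schema.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV layout: the real SUT column names, paths, and class
# names are not given in this page and may differ.
SAMPLE_CSV = """image_path,class_label
images/000001.jpg,class_a
images/000002.jpg,class_a
images/000003.jpg,class_b
"""


def load_ground_truth(csv_file):
    """Read ground-truth rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def count_per_class(rows, label_column="class_label"):
    """Count how many images fall into each class label."""
    return Counter(row[label_column] for row in rows)


# An in-memory file keeps the sketch self-contained; with real data,
# pass a file opened with open(path, newline="", encoding="utf-8").
rows = load_ground_truth(io.StringIO(SAMPLE_CSV))
class_counts = count_per_class(rows)
```

Counting rows per class is a quick sanity check that a loaded split covers the expected classes before training a classifier.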

Variants: SUT

Associated Benchmarks

This dataset is used in 2 benchmarks:

Optical Character Recognition (OCR) - Metrics: Character Error Rate (CER)
Document Image Classification - Metrics: Accuracy
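The two metrics listed above can be sketched in a few lines. This is a minimal illustration that assumes the common definitions: CER as the Levenshtein edit distance between the recognized text and the reference, divided by the reference length, and accuracy as the fraction of correctly predicted labels. The function names are illustrative, and the benchmarks may apply extra text normalization that is not shown here.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, one row at a time."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

Python strings are compared per Unicode code point here, so for Persian text the result can depend on how the reference and the recognized text are normalized before scoring.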

Recent Benchmark Submissions

No recent benchmark submissions available for this dataset.

Research Papers

No papers with results on this dataset found.

External Links:

Papers with Code Entry
