The D4LA dataset is a diverse benchmark for document layout analysis (DLA) derived from the RVL-CDIP dataset. It covers 12 document types with rich layouts, each represented by approximately 1,000 manually annotated images; noisy, handwritten, artistic, and text-scarce images are filtered out. The dataset defines 27 detailed layout categories, including DocTitle, ListText, Header, Table, Equation, and Footer, chosen to reflect real-world applications.
Variants: D4LA
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Document Layout Analysis | DoPTA | DoPTA: Improving Document Layout Analysis … | 2024-12-17 |
| Document Layout Analysis | DocLayout-YOLO | DocLayout-YOLO: Enhancing Document Layout Analysis … | 2024-10-16 |
| Document Layout Analysis | VGT | Vision Grid Transformer for Document … | 2023-08-29 |