Oracle-MNIST

Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine Learning Algorithms

Dataset Information
Modalities: Images
Languages: English
Introduced: 2022
License: Unknown
Homepage: https://github.com/wm-bupt/oracle-mnist

Overview

We introduce the Oracle-MNIST dataset, comprising 30,222 grayscale images (28×28 pixels) of ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges in image noise and distortion. The training set consists of 27,222 images, and the test set contains 300 images per class (3,000 in total). Oracle-MNIST shares the same data format as the original MNIST dataset, allowing direct compatibility with existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from 1) extremely severe and unique noise caused by three thousand years of burial and aging and 2) dramatically varying writing styles among ancient Chinese writers, both of which make them realistic for machine learning research. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
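Because Oracle-MNIST is described as sharing MNIST's data format, its files can presumably be read with a standard IDX parser. The following is a minimal sketch, assuming the data ships as gzip-compressed, unsigned-byte IDX files like MNIST's (the file names and the helper names `parse_idx`, `load_idx_gz`, and `get_image` are hypothetical; check the repository for the actual layout). It uses only the Python standard library.

```python
import gzip
import math
import struct

# IDX type code 0x08 means unsigned bytes (used by MNIST-style image and label files).
UBYTE = 0x08


def parse_idx(raw: bytes) -> tuple[tuple[int, ...], bytes]:
    """Split an uncompressed IDX blob into (dimensions, raw payload bytes)."""
    # Header: two zero bytes, a one-byte type code, a one-byte dimension count.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != UBYTE:
        raise ValueError("not an unsigned-byte IDX file")
    header_len = 4 + 4 * ndim
    # Each dimension size is a big-endian 32-bit unsigned integer.
    dims = struct.unpack(f">{ndim}I", raw[4:header_len])
    payload = raw[header_len:]
    if len(payload) != math.prod(dims):
        raise ValueError("payload size does not match header dimensions")
    return dims, payload


def load_idx_gz(path: str) -> tuple[tuple[int, ...], bytes]:
    """Read a gzip-compressed IDX file from disk (hypothetical file path)."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


def get_image(dims: tuple[int, ...], payload: bytes, index: int) -> list[list[int]]:
    """Return image `index` of an (N, rows, cols) file as rows of 0-255 pixels."""
    _, rows, cols = dims
    start = index * rows * cols
    return [list(payload[start + r * cols : start + (r + 1) * cols]) for r in range(rows)]
```

Under these assumptions, a training image file would yield dimensions `(27222, 28, 28)` and the matching label file `(27222,)`, with each label byte in the range 0 to 9. Returning raw bytes keeps the sketch dependency-free; in practice one would typically wrap the payload in a NumPy array for training.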

Variants: Oracle-MNIST

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Image Classification | ResNet-18 + Vision Eagle Attention | Vision Eagle Attention: a new … | 2024-11-15
Image Classification | ResNet-18 | Vision Eagle Attention: a new … | 2024-11-15
Image Classification | R-ExplaiNet-26 | Learning local discrete features in … | 2024-10-31
Image Classification | LR-Net | LR-Net: A Block-based Convolutional Neural … | 2022-07-19

Research Papers

Recent papers with results on this dataset: