AISHELL-1

Speech Recognition Benchmark

📊 Showing 18 results | 📏 Metric: Word Error Rate (WER)

Top Performing Models

Rank	Model	Paper	Word Error Rate (WER)	Date	Code
1	Att	End-to-end Speech Recognition with Adaptive Computation Steps	18.70	2018-08-30	-
2	CTC/Att	A Comparative Study on Transformer vs RNN in Speech Applications	6.70	2019-09-13	📦 espnet/espnet 📦 MindSpore-scientific-2/code-11
3	BRA-E	Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition	6.63	2023-03-23	-
4	CTC-CRF 4gram-LM	CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency	6.34	2020-05-27	📦 thu-spmi/cat
5	BAT	BAT: Boundary aware transducer for memory-efficient and low-latency ASR	4.97	2023-05-19	📦 alibaba-damo-academy/FunASR
6	Paraformer	FunASR: A Fundamental End-to-End Speech Recognition Toolkit	4.95	2023-05-18	📦 alibaba-damo-academy/FunASR
7	U2	Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition	4.72	2020-12-10	📦 PaddlePaddle/PaddleSpeech 📦 TeaPoly/Conformer-Athena 📦 xianchao-wu/wenet-deep-sparse-conformer 📦 joseewei/wenet 📦 Vill-Lab/2023-TMM-Grad-SAS
8	UMA	Unimodal Aggregation for CTC-based Speech Recognition	4.70	2023-09-15	📦 Audio-WestlakeU/UMA-ASR
9	Lightweight Transducer	Lightweight Transducer Based on Frame-Level Criterion	4.31	2024-09-05	📦 wangmengzhi/Lightweight-Transducer
10	SE-WSBO With LM	Improving Mandarin Speech Recogntion with Block-augmented Transformer	4.10	2022-07-24	📦 LeonWlw/asr_blockformer 📦 mininglamp-technology/asr-blockformer

All Papers (18)

Paper	Year	Model	Code
End-to-end Speech Recognition with Adaptive Computation Steps	2018	Att	-
A Comparative Study on Transformer vs RNN in Speech Applications	2019	CTC/Att	espnet/espnet MindSpore-scientific-2/code-11
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition	2023	BRA-E	-
CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency	2020	CTC-CRF 4gram-LM	thu-spmi/cat
BAT: Boundary aware transducer for memory-efficient and low-latency ASR	2023	BAT	alibaba-damo-academy/FunASR
FunASR: A Fundamental End-to-End Speech Recognition Toolkit	2023	Paraformer	alibaba-damo-academy/FunASR
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition	2020	U2	PaddlePaddle/PaddleSpeech TeaPoly/Conformer-Athena
Unimodal Aggregation for CTC-based Speech Recognition	2023	UMA	Audio-WestlakeU/UMA-ASR
Lightweight Transducer Based on Frame-Level Criterion	2024	Lightweight Transducer	wangmengzhi/Lightweight-Transducer
Improving Mandarin Speech Recogntion with Block-augmented Transformer	2022	SE-WSBO With LM	LeonWlw/asr_blockformer mininglamp-technology/asr-blockformer
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation	2023	CIF-HKD With LM	MingLunHan/CIF-PyTorch minglunhan/cif-hieradist
Lightweight Transducer Based on Frame-Level Criterion	2024	Lightweight Transducer With LM	wangmengzhi/Lightweight-Transducer
CR-CTC: Consistency regularization on CTC for improved speech recognition	2024	Zipformer+CR-CTC (no external language model)	k2-fsa/icefall
FunASR: A Fundamental End-to-End Speech Recognition Toolkit	2023	Paraformer-large	alibaba-damo-academy/FunASR
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition	2022	MMSpeech With LM	ofa-sys/ofa
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models	2023	Qwen-Audio	alibaba-damo-academy/FunASR qwenlm/qwen-audio
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition	2024	Seed-ASR	-
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration	2025	FireRedASR-AED	fireredteam/fireredasr

Model	Paper	Word Error Rate (WER)	Date
Att	End-to-end Speech Recognition with Adaptive Compu…	18.70	2018-08-30
CTC/Att	A Comparative Study on Transformer vs RNN in Spee…	6.70	2019-09-13
BRA-E	Beyond Universal Transformer: block reusing with …	6.63	2023-03-23
CTC-CRF 4gram-LM	CAT: A CTC-CRF based ASR Toolkit Bridging the Hyb…	6.34	2020-05-27
BAT	BAT: Boundary aware transducer for memory-efficie…	4.97	2023-05-19
Paraformer	FunASR: A Fundamental End-to-End Speech Recogniti…	4.95	2023-05-18
U2	Unified Streaming and Non-streaming Two-pass End-…	4.72	2020-12-10
UMA	Unimodal Aggregation for CTC-based Speech Recogni…	4.70	2023-09-15
Lightweight Transducer	Lightweight Transducer Based on Frame-Level Crite…	4.31	2024-09-05
SE-WSBO With LM	Improving Mandarin Speech Recogntion with Block-a…	4.10	2022-07-24
CIF-HKD With LM	Knowledge Transfer from Pre-trained Language Mode…	4.10	2023-01-30
Lightweight Transducer With LM	Lightweight Transducer Based on Frame-Level Crite…	4.03	2024-09-05
Zipformer+CR-CTC (no external language model)	CR-CTC: Consistency regularization on CTC for imp…	4.02	2024-10-07
Paraformer-large	FunASR: A Fundamental End-to-End Speech Recogniti…	1.95	2023-05-18
MMSpeech With LM	MMSpeech: Multi-modal Multi-task Encoder-Decoder …	1.90	2022-11-29
Qwen-Audio	Qwen-Audio: Advancing Universal Audio Understandi…	1.29	2023-11-14
Seed-ASR	Seed-ASR: Understanding Diverse Speech and Contex…	0.68	2024-07-05
FireRedASR-AED	FireRedASR: Open-Source Industrial-Grade Mandarin…	0.55	2025-01-24
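
The error rates in the table above are percentages, where lower is better. This page does not say how each paper scored its output, so the following is only a minimal sketch of the usual edit-distance definition: substitutions, deletions, and insertions divided by the number of reference tokens. The function names `edit_distance` and `error_rate` and the `char_level` flag are illustrative, not taken from any listed toolkit. Treating each character as a token, as is common for Mandarin corpora, is an assumption here rather than something the page states.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref token missing from hyp
                curr[j - 1] + 1,     # insertion: extra token in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def tokenize(text: str, char_level: bool) -> List[str]:
    """Split on whitespace, or into single characters if char_level is set."""
    if char_level:
        return list(text.replace(" ", ""))
    return text.split()


def error_rate(refs: Sequence[str], hyps: Sequence[str],
               char_level: bool = False) -> float:
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    total_errors = 0
    total_tokens = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref, char_level)
        hyp_tokens = tokenize(hyp, char_level)
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_tokens += len(ref_tokens)
    if total_tokens == 0:
        raise ValueError("reference transcripts contain no tokens")
    return 100.0 * total_errors / total_tokens


if __name__ == "__main__":
    # One substitution and one deletion against six reference words.
    print(error_rate(["the cat sat on the mat"], ["the cat sit on mat"]))
    # One substituted character against six reference characters.
    print(error_rate(["今天天气很好"], ["今天天汽很好"], char_level=True))
```

Summing edits and reference lengths over the whole test set before dividing, rather than averaging per-utterance rates, keeps long utterances from being under-weighted. Published numbers may also depend on text normalization steps that this sketch leaves out.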