TAT

Taiwanese Across Taiwan

Dataset Information
Modalities: Speech

Overview

The Taiwanese Across Taiwan (TAT) corpus is a large-scale database of native Taiwanese read speech (articles read aloud) collected across Taiwan. It covers a variety of accents from different parts of the island. The corpus is annotated twice for use in voice recognition research. It contains recordings from 100 native speakers, each 30 minutes long, for a reported total of 100 hours of speech data.
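The size figures in the overview can be checked with a little arithmetic. The sketch below is only a sanity check under one assumption that the text does not confirm: that every speaker contributes exactly 30 minutes of audio and that each recording is counted once. The names `total_hours`, `NUM_SPEAKERS`, `MINUTES_PER_SPEAKER`, and `STATED_TOTAL_HOURS` are illustrative and not part of any TAT tooling.

```python
# Figures as stated in the overview. Treating the per-speaker length as exact
# and counting each recording once are assumptions, not facts from the source.
NUM_SPEAKERS = 100
MINUTES_PER_SPEAKER = 30
STATED_TOTAL_HOURS = 100


def total_hours(num_speakers: int, minutes_per_speaker: float) -> float:
    """Total duration in hours when every speaker records the same length."""
    return num_speakers * minutes_per_speaker / 60


# Total implied by the speaker count and the per-speaker length.
computed_hours = total_hours(NUM_SPEAKERS, MINUTES_PER_SPEAKER)

# Per-speaker length that would be needed to reach the stated total.
minutes_needed = STATED_TOTAL_HOURS * 60 / NUM_SPEAKERS

print(f"computed total: {computed_hours} h")
print(f"minutes per speaker needed for the stated total: {minutes_needed}")
```

Under this assumption the per-speaker figure and the stated total do not agree, so at least one of them is probably approximate or counted differently. The original corpus page linked as the source is the place to confirm the exact numbers.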

Source: https://sites.google.com/speech.ntut.edu.tw/fsw/home/tat-corpus?authuser=0

Variants: TAT

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task                         | Model                             | Paper                                           | Date
Speech-to-Speech Translation | Hokkien→En (Two-pass decoding)    | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | Hokkien→En (Two-stage)            | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | Hokkien→En (Three-stage)          | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | Hokkien→En (Single-pass decoding) | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | En→Hokkien (Two-pass decoding)    | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | En→Hokkien (Three-stage)          | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | En→Hokkien (Two-stage)            | Speech-to-Speech Translation For A Real-world … | 2022-10-19
Speech-to-Speech Translation | En→Hokkien (Single-pass decoding) | Speech-to-Speech Translation For A Real-world … | 2022-10-19

Research Papers

Recent papers with results on this dataset: