Bala-Copa

Balanced-COPA

Dataset Information
License: Unknown

Overview

The Balanced Choice of Plausible Alternatives (Balanced COPA) dataset is a benchmark for training machine learning models that are robust to superficial cues (spurious correlations). It extends the COPA dataset (Roemmele et al. 2011) with mirrored instances that mitigate token-level superficial cues in the original COPA answers. These cues result from an unbalanced token distribution between the correct and incorrect answer choices: some tokens appear more often in the correct choices than in the incorrect ones. Balanced COPA equalizes the token distribution by adding mirrored instances with identical answer choices but different labels. Details on the creation of Balanced COPA and the implementation of the baselines are available in the paper.
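The balancing idea can be illustrated with a minimal Python sketch. The instance below is toy data written for this example, not a record from the dataset, and the field names (`premise`, `choices`, `label`) and the helper `cue_imbalance` are hypothetical. The sketch also assumes that a mirrored instance flips the label by pairing the same answer choices with a different premise; the description above states only that the choices are identical and the labels differ.

```python
from collections import Counter

# Toy COPA-style instance (illustrative only, not taken from the dataset).
original = [
    {
        "premise": "The woman hummed to herself.",
        "choices": ["She was in a good mood.", "She was nervous."],
        "label": 0,  # index of the correct choice
    },
]

# Mirrored instance: identical answer choices, different label.
# Assumption: the premise is changed so that the other choice becomes correct.
mirrored = [
    {
        "premise": "The woman trembled.",
        "choices": ["She was in a good mood.", "She was nervous."],
        "label": 1,
    },
]


def tokenize(text):
    """Very rough tokenizer: lowercase, drop the final period, split on spaces."""
    return text.lower().rstrip(".").split()


def cue_imbalance(instances):
    """Return tokens whose counts differ between correct and incorrect choices.

    The value is (count in correct choices) - (count in incorrect choices),
    counting each token at most once per choice.
    """
    correct, incorrect = Counter(), Counter()
    for inst in instances:
        for i, choice in enumerate(inst["choices"]):
            target = correct if i == inst["label"] else incorrect
            target.update(set(tokenize(choice)))
    tokens = set(correct) | set(incorrect)
    return {t: correct[t] - incorrect[t] for t in tokens if correct[t] != incorrect[t]}


# On the original instance alone, some tokens occur only in the correct choice.
print(cue_imbalance(original))
# After adding the mirrored instance, every token is balanced.
print(cue_imbalance(original + mirrored))
```

With the original instance alone, tokens such as "good" appear only in the correct choice, so a model could exploit them without reading the premise. Once the mirrored instance is added, each token appears equally often in correct and incorrect choices, and the imbalance dictionary is empty.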

Variants: Bala-Copa

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                 Model                     Paper                                 Date
Text Classification  Qwen2.5-32B + CAPO        CAPO: Cost-Aware Prompt Optimization  2025-04-22
Text Classification  Llama-3.3-70B + CAPO      CAPO: Cost-Aware Prompt Optimization  2025-04-22
Text Classification  Mistral-Small-24B + CAPO  CAPO: Cost-Aware Prompt Optimization  2025-04-22

Research Papers

Recent papers with results on this dataset: