Bala-Copa

Name: Bala-Copa
License: Unknown

Balanced-COPA


Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The Balanced Choice of Plausible Alternatives (Balanced COPA) dataset is a benchmark for training machine learning models that are robust to superficial cues (spurious correlations). It extends the COPA dataset (Roemmele et al. 2011) with mirrored instances that mitigate token-level superficial cues in the original COPA answers. These cues arise from an unbalanced token distribution between the correct and incorrect answer choices: some tokens appear more often in the correct choices than in the incorrect ones. Balanced COPA equalizes the token distribution by adding mirrored instances that have identical answer choices but different labels. Details about the creation of Balanced COPA and the implementation of the baselines are available in the paper.
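The balancing idea can be illustrated with a small sketch. It is not the dataset's release code: the `CopaInstance` record, the `mirror` and `token_skew` helpers and the naive whitespace tokenizer are all hypothetical, and both instances are toy examples written for illustration (the mirrored premise is invented). `token_skew` counts how often each token appears in correct versus incorrect choices. A single instance leaves some tokens skewed toward one side. Adding a mirror that reuses the same two choices under a new premise, with the label flipped, cancels every difference.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CopaInstance:
    """Hypothetical COPA-style record (field names are an assumption)."""
    premise: str
    question: str  # "cause" or "effect"
    choice1: str
    choice2: str
    label: int     # 0 if choice1 is correct, 1 if choice2 is correct


def tokenize(text: str) -> list[str]:
    # Naive tokenizer for illustration only: lowercase, drop the final period,
    # split on whitespace.
    return text.lower().rstrip(".").split()


def token_skew(instances: list[CopaInstance]) -> dict[str, int]:
    """For each token, (count in correct choices) - (count in incorrect choices).

    Only tokens with a nonzero difference are returned, so an empty dict
    means the token distribution is balanced across correct/incorrect choices.
    """
    correct: Counter = Counter()
    incorrect: Counter = Counter()
    for inst in instances:
        choices = (inst.choice1, inst.choice2)
        correct.update(tokenize(choices[inst.label]))
        incorrect.update(tokenize(choices[1 - inst.label]))
    tokens = set(correct) | set(incorrect)
    return {t: correct[t] - incorrect[t] for t in tokens if correct[t] != incorrect[t]}


def mirror(inst: CopaInstance, new_premise: str) -> CopaInstance:
    # Hypothetical helper: same answer choices, new premise, flipped label.
    return replace(inst, premise=new_premise, label=1 - inst.label)


# Toy COPA-style instance (illustrative, not quoted from the dataset files).
original = CopaInstance(
    premise="The man broke his toe.",
    question="cause",
    choice1="He dropped a hammer on his foot.",
    choice2="He got a hole in his sock.",
    label=0,
)
# Invented premise for which choice2 becomes the correct cause.
mirrored = mirror(original, "The man threw away his sock.")

print(sorted(token_skew([original]).items()))
# → [('dropped', 1), ('foot', 1), ('got', -1), ('hammer', 1), ('hole', -1), ('in', -1), ('on', 1), ('sock', -1)]
print(token_skew([original, mirrored]))  # → {}
```

With only the original instance, a model could favor a choice simply because it contains "hammer" or "foot". After mirroring, every token occurs equally often among correct and incorrect answers, so the premise has to be used to pick the right one.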

Variants: Bala-Copa

Associated Benchmarks

This dataset is used in 1 benchmark:

Text Classification - Metrics: Accuracy

Recent Benchmark Submissions

Task	Model	Paper	Date
Text Classification	Qwen2.5-32B + CAPO	CAPO: Cost-Aware Prompt Optimization	2025-04-22
Text Classification	Llama-3.3-70B + CAPO	CAPO: Cost-Aware Prompt Optimization	2025-04-22
Text Classification	Mistral-Small-24B + CAPO	CAPO: Cost-Aware Prompt Optimization	2025-04-22

Research Papers

Recent papers with results on this dataset:

CAPO: Cost-Aware Prompt Optimization (2025)

External Links:

Papers with Code Entry
