ADORE

A benchmark dataset for machine learning in ecotoxicology

Dataset Information
Modalities
Biology, Environment
Languages
English
Introduced
2023
License
Unknown

Overview

ADORE is a benchmark dataset for machine learning in ecotoxicology, covering acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species, as well as chemical properties and molecular representations. Apart from challenging other researchers to achieve the best model performance across the whole dataset, we propose specific relevant challenges on subsets of the data. We include datasets and splittings corresponding to each of these challenges, along with an in-depth characterization and discussion of train-test splitting approaches.
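Because the same chemical can appear in many experiments, how the data is split into training and test sets strongly affects the measured performance. The following is a minimal sketch of one group-aware split, in which all records for a given chemical end up on the same side. It is not the splitting code shipped with ADORE; the function name `split_by_group`, the field names, and the toy records are all hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group (e.g. one chemical) spans both sets."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    group_ids = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(group_ids)

    # Hold out at least one whole group for testing.
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Toy records: field names and values are invented, not taken from ADORE.
records = [
    {"chemical": "chem_A", "species": "fish_1", "log10_lc50": -1.2},
    {"chemical": "chem_A", "species": "fish_2", "log10_lc50": -0.9},
    {"chemical": "chem_B", "species": "fish_1", "log10_lc50": 0.4},
    {"chemical": "chem_C", "species": "algae_1", "log10_lc50": 1.1},
    {"chemical": "chem_C", "species": "fish_1", "log10_lc50": 0.8},
    {"chemical": "chem_D", "species": "crustacean_1", "log10_lc50": -0.3},
]

train, test = split_by_group(records, "chemical", test_fraction=0.25, seed=42)
train_chemicals = {r["chemical"] for r in train}
test_chemicals = {r["chemical"] for r in test}
```

A purely random split would instead let the same chemical appear in both sets, which tends to make a model look better than it would be on chemicals it has never seen.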

The dataset contains acute toxicity data, expressed as the median lethal concentration (LC50) or the median effective concentration (EC50), for 2,408 chemicals in 203 different species of algae, crustaceans, and fish. This amounts to a total of 33K data points, 26K of which are on fish (140 species).

The task is to predict the ecotoxicological outcome (LC50 or EC50) of a chemical for a given species, based on historical ecotoxicity data.
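As a rough illustration of how such a regression task can be scored, the sketch below evaluates a trivial mean-value baseline with the root mean squared error (RMSE). It assumes that concentrations are log10-transformed before modelling, which is a common convention but is stated here as an assumption, and all concentration values are invented toy numbers rather than ADORE data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy LC50 values in mg/L (invented for illustration only).
train_lc50 = [0.1, 1.0, 10.0]
test_lc50 = [1.0, 100.0]

# Assumption: model the log10 of the concentration, not the raw value.
train_y = [math.log10(c) for c in train_lc50]
test_y = [math.log10(c) for c in test_lc50]

# Baseline: always predict the mean of the training targets.
baseline = sum(train_y) / len(train_y)
predictions = [baseline] * len(test_y)

score = rmse(test_y, predictions)
print(f"baseline RMSE (log10 units): {score:.3f}")
```

Any learned model should beat this baseline on the same split; otherwise it has not extracted useful signal from the chemical or species features.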

ADORE was originally published in the journal Scientific Data:
Schür, Christoph, Lilian Gasser, Fernando Perez-Cruz, Kristin Schirmer, and Marco Baity-Jesi. 2023. “A Benchmark Dataset for Machine Learning in Ecotoxicology.” Scientific Data 10 (1): 718. https://doi.org/10.1038/s41597-023-02612-2.

Variants: ADORE

Associated Benchmarks

This dataset is used in 1 benchmark:

  • regression

Recent Benchmark Submissions

No recent benchmark submissions available for this dataset.

Research Papers

No papers with results on this dataset found.