Tox21

Dataset Information
License
Unknown

Overview

The Tox21 data set comprises 12,060 training samples and 647 test samples, each representing a chemical compound. There are 801 "dense features" that represent chemical descriptors, such as molecular weight, solubility, or surface area, and 272,776 "sparse features" that represent chemical substructures (ECFP10, DFS6, DFS8; stored in Matrix Market format). Machine learning methods can use the sparse features, the dense features, or a combination of both. For each sample there are 12 binary labels that represent the outcome (active/inactive) of 12 different toxicological experiments. Note that the label matrix contains many missing values (NAs). The original data source is the Tox21 challenge site: https://tripod.nih.gov/tox21/challenge/.
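Because the label matrix has many missing values, one common way to train a single model on all 12 tasks is to compute the loss only over the observed labels. The sketch below assumes the labels have already been loaded into a NumPy array of shape (n_samples, 12), with 1.0 for active, 0.0 for inactive, and NaN for a missing label; the loading step and the model that produces the logits are assumptions and are not shown. It is an illustration of masked binary cross-entropy, not the method of any particular benchmark submission.

```python
import numpy as np

N_TASKS = 12  # Tox21 provides 12 binary toxicity labels per compound


def observed_per_task(labels: np.ndarray) -> np.ndarray:
    """Count the non-missing (non-NaN) labels in each task column."""
    return (~np.isnan(labels)).sum(axis=0)


def masked_bce(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean binary cross-entropy over observed labels only.

    logits: (n_samples, n_tasks) raw scores from a hypothetical model.
    labels: (n_samples, n_tasks); 1.0 = active, 0.0 = inactive, NaN = missing.
    """
    observed = ~np.isnan(labels)          # True where a label exists
    y = np.where(observed, labels, 0.0)   # placeholder 0 so the math stays finite
    # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
    per_entry = (np.maximum(logits, 0.0) - logits * y
                 + np.log1p(np.exp(-np.abs(logits))))
    per_entry = np.where(observed, per_entry, 0.0)  # zero out missing entries
    # Average over observed entries only; guard against an all-missing batch
    return float(per_entry.sum() / max(observed.sum(), 1))


if __name__ == "__main__":
    # Toy label matrix: 3 compounds, N_TASKS tasks, some labels missing
    rng = np.random.default_rng(0)
    toy = rng.integers(0, 2, size=(3, N_TASKS)).astype(float)
    toy[0, 1] = np.nan
    toy[2, 5] = np.nan
    print(observed_per_task(toy))
    print(masked_bce(np.zeros_like(toy), toy))  # log(2) when all logits are 0
```

Dividing by the number of observed entries, rather than by the full matrix size, keeps the loss scale independent of how many labels happen to be missing in a batch. If a model uses the sparse features, the Matrix Market files can be read separately, for example with scipy.io.mmread.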

Source: Tox21 Machine Learning Data Set
Image Source: https://www.frontiersin.org/articles/10.3389/fenvs.2015.00080/full

Variants: Tox21

Associated Benchmarks

This dataset is used in 3 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Molecular Property Prediction | S-CGIB | Pre-training Graph Neural Networks on … | 2025-02-20
Drug Discovery | TrimNet + Perforated Backpropagation | Perforated Backpropagation: A Neuroscience Inspired … | 2025-01-29
Graph Classification | G-Tuning | Fine-tuning Graph Neural Networks by … | 2023-12-21
Drug Discovery | elEmBERT-V1 | Structure to Property: Chemical Element … | 2023-09-17
Drug Discovery | GIT-Mol(G+S) | GIT-Mol: A Multi-modal Large Language … | 2023-08-14
Molecular Property Prediction | MolXPT | MolXPT: Wrapping Molecules with Text … | 2023-05-18
Molecular Property Prediction | GAL 125M | Galactica: A Large Language Model … | 2022-11-16
Molecular Property Prediction | Uni-Mol | Galactica: A Large Language Model … | 2022-11-16
Molecular Property Prediction | GAL 120B | Galactica: A Large Language Model … | 2022-11-16
Molecular Property Prediction | GAL 30B | Galactica: A Large Language Model … | 2022-11-16
Molecular Property Prediction | GAL 6.7B | Galactica: A Large Language Model … | 2022-11-16
Molecular Property Prediction | GAL 1.3B | Galactica: A Large Language Model … | 2022-11-16
Graph Classification | GTOT-Tuning | Fine-Tuning Graph Neural Networks via … | 2022-03-20
Molecular Property Prediction | ChemRL-GEM | ChemRL-GEM: Geometry Enhanced Molecular Representation … | 2021-06-11
Graph Classification | GMT | Accurate Learning of Graph Representations … | 2021-02-23
Molecular Property Prediction | GROVER (base) | Self-Supervised Graph Transformer on Large-Scale … | 2020-06-18
Molecular Property Prediction | GROVER (large) | Self-Supervised Graph Transformer on Large-Scale … | 2020-06-18
Molecular Property Prediction | Autogluon | AutoGluon-Tabular: Robust and Accurate AutoML … | 2020-03-13
Drug Discovery | SSVAE with multiple SMILES | All SMILES Variational Autoencoder | 2019-05-30
Molecular Property Prediction | PretrainGNN | Strategies for Pre-training Graph Neural … | 2019-05-29

Research Papers

Recent papers with results on this dataset: