TOMG-Bench

Name: TOMG-Bench
Published: 2024-12-19
License: Unknown

Text-based Open Molecule Generation Benchmark

Dataset Information

Modalities

Texts, Graphs

Languages

English

Introduced

2024

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

In this paper, we propose Text-based Open Molecule Generation Benchmark (TOMG-Bench), the first benchmark to evaluate the open-domain molecule generation capability of LLMs. TOMG-Bench encompasses a dataset of three major tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom). Each task further contains three subtasks, with each subtask comprising 5,000 test samples. Given the inherent complexity of open molecule generation, we have also developed an automated evaluation system that helps measure both the quality and the accuracy of the generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the current limitations and potential areas for improvement in text-guided molecule discovery. Furthermore, with the assistance of OpenMolIns, a specialized instruction tuning dataset proposed for solving challenges raised by TOMG-Bench, Llama3.1-8B could outperform all the open-source general LLMs, even surpassing GPT-3.5-turbo by 46.5% on TOMG-Bench.
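The task structure described above implies a total test-set size that is easy to work out. The following is a minimal sketch of that arithmetic, not the benchmark's actual data loader: the three task names come from the overview, while the subtask names (`subtask_1`, and so on) and the helper functions are hypothetical placeholders, since the overview does not name the subtasks.

```python
# Task names are taken from the overview. Subtask names are NOT given there,
# so generic placeholders are used (assumption for illustration only).
TASKS = ("MolEdit", "MolOpt", "MolCustom")
SUBTASKS_PER_TASK = 3
SAMPLES_PER_SUBTASK = 5_000


def benchmark_layout():
    """Map each task to its placeholder subtasks and their sample counts."""
    return {
        task: {
            f"subtask_{i}": SAMPLES_PER_SUBTASK
            for i in range(1, SUBTASKS_PER_TASK + 1)
        }
        for task in TASKS
    }


def total_samples(layout):
    """Sum the sample counts over every subtask of every task."""
    return sum(sum(subtasks.values()) for subtasks in layout.values())


layout = benchmark_layout()
print(total_samples(layout))  # 3 tasks x 3 subtasks x 5,000 samples = 45000
```

Under these figures the benchmark has 9 subtasks and 45,000 test samples in total, 15,000 per task.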

Variants: TOMG-Bench

Associated Benchmarks

This dataset is used in 1 benchmark:

Description-guided molecule generation - Metrics: wAcc

Recent Benchmark Submissions

Task	Model	Paper	Date
Description-guided molecule generation	GPT-4-turbo	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	GPT-4o	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Claude-3	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Llama-3.1-8B (OpenMolIns-large)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Galactica-125M (OpenMolIns-xlarge)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Llama3-70B-Instruct (INT4)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Galactica-125M (OpenMolIns-large)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Galactica-125M (OpenMolIns-medium)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	GPT-3.5-turbo	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Galactica-125M (OpenMolIns-small)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Llama3.1-8B-Instruct	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Claude-3.5	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Gemini-1.5-pro	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Llama3-8B-Instruct	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	chatglm-9B	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Galactica-125M (OpenMolIns-light)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Llama3.2-1B (OpenMolIns-large)	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	yi-1.5-9B	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	Mistral-7B-Instruct-v0.2	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19
Description-guided molecule generation	BioT5-base	TOMG-Bench: Evaluating LLMs on Text-based …	2024-12-19

Research Papers

Recent papers with results on this dataset:

TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation (2024)

External Links:

TOMG-Bench
