
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

(2024)

Paper Information

arXiv ID: 2402.10373
Venue: Annual Meeting of the Association for Computational Linguistics

Abstract

Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central. We conduct a comprehensive evaluation of BioMistral on a benchmark comprising 10 established medical question-answering (QA) tasks in English. We also explore lightweight models obtained through quantization and model merging approaches. Our results demonstrate BioMistral's superior performance compared to existing open-source medical models and its competitive edge against proprietary counterparts. Finally, to address the limited availability of data beyond English and to assess the multilingual generalization of medical LLMs, we automatically translated and evaluated this benchmark into 7 other languages. This marks the first large-scale multilingual evaluation of LLMs in the medical domain. Datasets, multilingual evaluation benchmarks, scripts, and all the models obtained during our experiments are freely released.

Summary

The paper introduces BioMistral, an open-source large language model (LLM) specifically designed for the biomedical domain. Built upon the Mistral 7B Instruct model and further pre-trained on PubMed Central, BioMistral demonstrates superior performance across a benchmark of 10 established medical question-answering tasks in English. The study also explores lightweight model variants obtained via quantization and model merging, showing that BioMistral remains competitive against both open-source and proprietary medical models. Additionally, the paper addresses the challenge of multilingual evaluation by automatically translating the benchmark into 7 additional languages and evaluating the models on them, the first large-scale multilingual evaluation of LLMs in the medical domain.
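
As a usage illustration (not taken from the paper), the released checkpoint can be loaded with Hugging Face `transformers`. The repository id `BioMistral/BioMistral-7B` and the 4-bit configuration below are assumptions reflecting the quantization approaches the paper mentions, not its exact setup.

```python
# Minimal sketch: load BioMistral 7B, optionally in 4-bit, and run a medical QA prompt.
# The repo id "BioMistral/BioMistral-7B" is assumed; adjust if the released checkpoint
# uses a different name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "BioMistral/BioMistral-7B"  # assumed Hugging Face repo id

# Optional 4-bit quantization, in the spirit of the lightweight variants the paper explores.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # drop this argument for full-precision loading
    device_map="auto",
)

prompt = "Question: What is the first-line treatment for type 2 diabetes?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```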

Methods

This paper employs the following methods:

  • Mistral 7B Instruct
  • Few-shot learning
  • Supervised fine-tuning
  • Model merging (a minimal sketch follows this list)
  • Quantization
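
The paper merges BioMistral with the original Mistral 7B Instruct weights using dedicated strategies (e.g., SLERP, TIES, DARE). The snippet below is only a minimal linear-interpolation sketch of the general idea of parameter-space merging, not a reimplementation of those methods; the repo ids and the output directory are assumptions.

```python
# Minimal sketch of parameter-space model merging: linearly interpolate the weights of two
# models sharing the same architecture (a base model and a domain-adapted model). This is
# the simplest possible variant, not the paper's SLERP/TIES/DARE strategies.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16
)
domain = AutoModelForCausalLM.from_pretrained(
    "BioMistral/BioMistral-7B", torch_dtype=torch.bfloat16
)

alpha = 0.5  # interpolation weight between base and domain-adapted parameters
domain_state = domain.state_dict()
merged_state = {
    name: (1 - alpha) * param + alpha * domain_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged_state)
base.save_pretrained("mistral-biomistral-merged")  # hypothetical output directory
```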

Models Used

  • BioMistral 7B
  • Mistral 7B Instruct
  • MedPaLM-2
  • GPT-4
  • BioGPT
  • ClinicalGPT
  • MedAlpaca
  • PMC-LLaMA
  • MediTron-7B

Datasets

The following datasets were used in this research:

  • PubMed Central
  • MedQA
  • MMLU

Evaluation Metrics

  • Accuracy
  • Expected Calibration Error (ECE)
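
Expected Calibration Error measures the gap between a model's confidence and its accuracy. The function below is a standard equal-width-bin ECE computation, a generic sketch rather than the paper's exact evaluation code.

```python
# Generic sketch of Expected Calibration Error (ECE) with equal-width confidence bins:
# ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)|, where B_b is the b-th confidence bin.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    confidences = np.asarray(confidences)  # probability assigned to the predicted answer
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            bin_weight = in_bin.mean()  # |B_b| / N
            ece += bin_weight * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Example: three multiple-choice predictions with their confidences.
print(expected_calibration_error([0.9, 0.6, 0.8], ["A", "C", "B"], ["A", "B", "B"]))
```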

Results

  • BioMistral 7B shows superior performance on medical QA tasks compared to existing open-source biomedical models.
  • Introduces a benchmark for multilingual evaluation of LLMs in the medical domain.
  • BioMistral demonstrates improvements in few-shot learning and fine-tuning scenarios.

Limitations

The authors identified the following limitations:

  • Significant computational resources required for model training and evaluation.
  • Performance degradation in non-English contexts due to limited data availability.
  • Difficulty generalizing to medical terminology not present in the training dataset.

Technical Requirements

  • Number of GPUs: 32
  • GPU Type: NVIDIA A100 80GB
  • Compute Requirements: pre-training for 1.5 epochs within the 20-hour job limit of the Jean Zay HPC cluster, with a batch size of 16 per GPU and a total batch size of 1,024
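
For reference, the reported numbers are mutually consistent if gradient accumulation accounts for the gap between the per-GPU batch and the total batch. The split below is an inference, not a figure stated in this summary.

```python
# Sanity check of the reported batch configuration (assuming "batch size of 16" is per GPU
# and the remaining factor comes from gradient accumulation; this split is an assumption).
per_gpu_batch = 16
num_gpus = 32
grad_accumulation_steps = 2  # inferred: 1024 / (16 * 32)
total_batch = per_gpu_batch * num_gpus * grad_accumulation_steps
assert total_batch == 1024
```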
