Domain
artificial intelligence, natural language processing, multimodal learning
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench, respectively). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and performs on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
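Because phi-3-mini is aligned to a chat format and is small enough to run locally, a minimal inference sketch is shown below. The Hugging Face checkpoint id, dtype, and generation settings are illustrative assumptions and are not taken from the report.

```python
# Minimal sketch (assumed public checkpoint id and default settings) of querying
# the chat-aligned phi-3-mini model with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed released checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# The model is aligned to a chat format, so prompts are built with the chat template.
messages = [{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```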
The Phi-3 Technical Report presents phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens that performs comparably to much larger models such as Mixtral 8x7B and GPT-3.5. It also describes the phi-3.5 series, which extends the family with multilingual, multimodal, and long-context capabilities. The methodology centers on a training dataset of heavily filtered, publicly available web data and synthetic data, aimed at getting strong performance out of small models while aligning them for robustness and safety. The report outlines the training and evaluation processes, including specific metrics achieved on various benchmarks, and reflects on how model size affects task performance and factual knowledge retention.
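The abstract notes that phi-3.5-MoE pairs 16 experts of 3.8B parameters each with only 6.6 billion active parameters per token. The toy layer below is a sketch of top-2 expert routing (the routing scheme and all hyperparameters are illustrative assumptions, not the paper's implementation); it shows why the per-token "active" parameter count is much smaller than the total parameter count.

```python
# Toy sketch of top-k expert routing: each token is dispatched to only top_k of
# num_experts expert FFNs, so the parameters "active" for a token are a small
# fraction of the total. Not the phi-3.5-MoE implementation.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                 # x: (num_tokens, d_model)
        scores = self.router(x)                           # (num_tokens, num_experts)
        weights, expert_idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)                 # mix only the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in expert_idx[:, k].unique().tolist():  # run each chosen expert once
                mask = expert_idx[:, k] == e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 64]); each token touched only 2 of 16 experts
```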
This paper introduces the following models:
- phi-3-mini
- phi-3-small
- phi-3-medium
- phi-3.5-mini
- phi-3.5-MoE
- phi-3.5-Vision
It compares them against the following baseline models:
- Mixtral 8x7B
- GPT-3.5
- Llama 3.1
- Gemini-1.5-Flash
- GPT-4o-mini
The paper reports the following benchmark results and comparative findings (a minimal accuracy-scoring sketch follows the list):
- phi-3-mini achieves 69% on MMLU
- phi-3-mini achieves 8.38 on MT-bench
- phi-3-small achieves 75% on MMLU
- phi-3-medium achieves 78% on MMLU
- phi-3.5-MoE outperforms open-source models of similar scale
- phi-3.5-Vision excels in reasoning tasks
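The MMLU figures above are multiple-choice accuracies. As a minimal sketch (not the report's evaluation harness), accuracy is simply the percentage of questions where the model's chosen option matches the answer key:

```python
# Minimal sketch of multiple-choice accuracy scoring (not the report's harness):
# the score is the percentage of questions whose predicted option matches the key.
def multiple_choice_accuracy(predictions: list[str], answers: list[str]) -> float:
    assert len(predictions) == len(answers)
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# Hypothetical example: 3 of 4 questions answered correctly -> 75.0
print(multiple_choice_accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))
```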
The authors identified the following limitations:
- The model's small size limits how much factual knowledge it can store, which affects performance on knowledge-intensive tasks.
Compute resources:
- Number of GPUs: None specified
- GPU Type: None specified
The paper covers the following topics:
- transformer models
- chat models
- multilingual models
- long-context modeling
- vision-language models