
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang (Zhejiang University; Microsoft Research Asia), 2023

Paper Information

arXiv ID

2303.17580

Venue

Neural Information Processing Systems

Domain

Artificial Intelligence, Machine Learning, Natural Language Processing, Computer Vision, Speech

SOTA Claim

Yes

Reproducibility

8/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are numerous AI models available for various domains and modalities, they cannot handle complicated AI tasks autonomously. Considering large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, an LLM-powered agent that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in Hugging Face, HuggingGPT can tackle a wide range of sophisticated AI tasks spanning different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards the realization of artificial general intelligence.

Summary

This paper presents HuggingGPT, an agent that uses a large language model (LLM), in particular ChatGPT, as a controller for expert models hosted in machine learning communities such as Hugging Face. The controller works in four stages: task planning, model selection, task execution, and response generation. Routing each subtask to a suitable expert model lets the system handle requests that span language, vision, and speech. The authors discuss the challenges an LLM faces when coordinating multiple expert models and evaluate HuggingGPT with experiments on tasks across these domains.
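The four stages named above can be sketched as a small offline pipeline. This is an illustrative sketch only, not the authors' implementation: the `Task` and `Model` classes, the `<result-N>` placeholder, the keyword-based planner, and the fake models in `REGISTRY` are all assumptions made so that the example runs without an LLM or network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """One planned subtask (hypothetical structure, not the paper's schema)."""
    task_id: int
    kind: str
    args: Dict[str, str]
    deps: List[int] = field(default_factory=list)


@dataclass
class Model:
    """A stand-in expert model: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[Dict[str, str]], str]


def plan_tasks(request: str) -> List[Task]:
    """Stage 1 (task planning). A real controller would prompt an LLM;
    this stub uses keyword rules so the example runs offline."""
    tasks: List[Task] = []
    if "image" in request:
        tasks.append(Task(0, "image-to-text", {"image": "example.jpg"}))
    if "read" in request:
        deps = [t.task_id for t in tasks]
        text = f"<result-{deps[-1]}>" if deps else request
        tasks.append(Task(len(tasks), "text-to-speech", {"text": text}, deps))
    return tasks


def select_model(task: Task, registry: Dict[str, List[Model]]) -> Model:
    """Stage 2 (model selection). Stub: take the first candidate for the task
    kind; the paper instead has the LLM choose from model descriptions."""
    return registry[task.kind][0]


def execute(tasks: List[Task], registry: Dict[str, List[Model]]) -> Dict[int, str]:
    """Stage 3 (task execution). Runs tasks in id order, assuming the planner
    lists dependencies first, and fills in <result-N> placeholders."""
    results: Dict[int, str] = {}
    for task in sorted(tasks, key=lambda t: t.task_id):
        args: Dict[str, str] = {}
        for key, value in task.args.items():
            for dep in task.deps:
                value = value.replace(f"<result-{dep}>", results[dep])
            args[key] = value
        results[task.task_id] = select_model(task, registry).run(args)
    return results


def summarize(request: str, results: Dict[int, str]) -> str:
    """Stage 4 (response generation). Stub: concatenate the outputs; a real
    controller would ask the LLM to write the final answer."""
    lines = [f"task {tid}: {out}" for tid, out in sorted(results.items())]
    return f"Request: {request}\n" + "\n".join(lines)


# Toy registry with fake models so the sketch needs no network access.
REGISTRY = {
    "image-to-text": [Model("fake-captioner", "captions an image",
                            lambda a: f"a caption for {a['image']}")],
    "text-to-speech": [Model("fake-tts", "speaks text",
                             lambda a: f"audio({a['text']})")],
}

request = "describe this image and read it aloud"
plan = plan_tasks(request)
outputs = execute(plan, REGISTRY)
print(summarize(request, outputs))
```

The second task depends on the first, so its `text` argument is filled with the caption produced by the first model before the fake text-to-speech model runs. In a real system each stub would be replaced by an LLM prompt or a call to a hosted model.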

Methods

This paper employs the following methods:

HuggingGPT

Models Used

ChatGPT
gpt-3.5-turbo
text-davinci-003
gpt-4

Datasets

The following datasets were used in this research:

None specified

Evaluation Metrics

F1
Accuracy
GPT-4 Score
Normalized Edit Distance
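Two of the metrics listed above can be sketched for comparing a predicted task plan with a reference plan. This page does not give the paper's exact definitions, so the code assumes common ones: F1 over sets of task names, and Levenshtein distance divided by the length of the longer sequence. The function names and example plans are illustrative.

```python
from typing import Sequence, Set


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical plans: the prediction misses the middle "translation" step.
gold_plan = ["image-to-text", "translation", "text-to-speech"]
pred_plan = ["image-to-text", "text-to-speech"]

print(normalized_edit_distance(pred_plan, gold_plan))
print(f1_score(set(pred_plan), set(gold_plan)))
```

Edit distance is sensitive to the order of planned tasks, while set-based F1 only checks which tasks were predicted, so the two capture different kinds of planning error.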

Results

HuggingGPT demonstrates the ability to manage and execute complex AI tasks effectively through integration of LLMs with expert models.
Extensive experiments indicate significant potential in HuggingGPT for multitasking across language, vision, and speech domains.

Limitations

The authors identified the following limitations:

Planning relies heavily on the LLM's capabilities, so the feasibility and optimality of a plan cannot always be guaranteed.
Efficiency suffers because each request requires multiple LLM interactions, which increases response time.
The LLM's maximum token length limits how many models can be connected.
LLMs can be unstable and occasionally fail to conform to instructions.
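One common way to guard against the last limitation is to validate the LLM's output before acting on it and to retry or fall back when it is malformed. The sketch below assumes the plan is requested as a JSON list of objects; the required keys in `REQUIRED_KEYS` and the `parse_plan` helper are assumptions for illustration, not something this page specifies.

```python
import json
from typing import Any, Dict, List, Optional

# Assumed schema for illustration: each planned task carries these keys.
REQUIRED_KEYS = {"task", "id", "dep", "args"}


def parse_plan(raw: str) -> Optional[List[Dict[str, Any]]]:
    """Return the parsed plan, or None if the text is not a valid plan."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model answered with prose or broken JSON
    if not isinstance(data, list):
        return None
    for item in data:
        # Every entry must be an object containing all required keys.
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            return None
    return data


good = '[{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "a.jpg"}}]'
bad = "Sure! Here is the plan you asked for."

print(parse_plan(good) is not None)
print(parse_plan(bad) is None)
```

A caller can re-prompt the LLM when `parse_plan` returns `None`, at the cost of an extra interaction, which trades some of the efficiency noted above for stability.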

Technical Requirements

Number of GPUs: None specified
GPU Type: None specified

Keywords

Large Language Models, Hugging Face, ChatGPT, AI Task Automation, Multimodal AI, Autonomous Agents

Papers Using Similar Methods

Multi-task Prompt Words Learning for Social Media Content Generation (2024)

External Resources

Funding: Not specified
References: 44
Influential Citations: 63
