Minigpt-4

Enhancing Vision-language Understanding with Advanced Large Language Models.

Cost / License

  • Free
  • Open Source

Platforms

  • Online
  • Self-Hosted

Features

  1.  AI-Powered

Minigpt-4 information

  • Developed by

    Vision CAIR Research Group
  • Licensing

    Open Source (BSD-3-Clause) and Free product.
  • Alternatives

    31 alternatives listed
  • Supported Languages

    • English

AlternativeTo Category

AI Tools & Services

GitHub repository

  •  25,765 Stars
  •  2,928 Forks
  •  376 Open Issues
Minigpt-4 was added to AlternativeTo by Paul.

What is Minigpt-4?

Enhancing Vision-language Understanding with Advanced Large Language Models. We are currently preparing a lighter model that runs on a single 3090 GPU, so you will be able to run it on your own machine. Check our GitHub page for updates.

  • MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer.
  • We train MiniGPT-4 in two stages. The first, a traditional pretraining stage, uses roughly 5 million aligned image-text pairs and takes 10 hours on 4 A100s. After this stage, Vicuna is able to understand the image, but its generation ability is heavily impacted.
  • To address this issue and improve usability, we propose a novel way to create high-quality image-text pairs using the model itself together with ChatGPT. Based on this, we then create a small (3,500 pairs in total) yet high-quality dataset.
  • The second stage finetunes the model on this dataset using a conversation template, which significantly improves its generation reliability and overall usability. To our surprise, this stage is computationally efficient and takes only around 7 minutes on a single A100.
  • MiniGPT-4 yields many emergent vision-language capabilities similar to those demonstrated in GPT-4.
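
The single-projection-layer idea from the first bullet can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the helper names (`make_projection`, `project`, `build_llm_input`), the toy dimensions, and the choice to place the projected visual tokens in front of the text embeddings are all assumptions made for illustration. The frozen visual encoder and frozen LLM are represented only by stand-in lists of numbers.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Create weights and bias for one linear layer (hypothetical helper)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(visual_tokens, weight, bias):
    """Map each visual token from the encoder's width to the LLM's embedding width."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in visual_tokens
    ]


def build_llm_input(visual_tokens, text_embeddings, weight, bias):
    """Place projected visual tokens in front of the text embeddings (an assumption)."""
    return project(visual_tokens, weight, bias) + text_embeddings


# Toy sizes chosen only for illustration; the real models are far wider.
VISUAL_DIM, LLM_DIM = 4, 6
weight, bias = make_projection(VISUAL_DIM, LLM_DIM)

# Stand-ins for the outputs of the frozen visual encoder and the frozen LLM's
# token embeddings. Neither would be updated during training.
visual_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]

llm_input = build_llm_input(visual_tokens, text_embeddings, weight, bias)

# Only the projection's weights and bias would be trainable in this setup.
trainable_params = LLM_DIM * VISUAL_DIM + LLM_DIM
print(len(llm_input), len(llm_input[0]), trainable_params)
```

Because both large components stay frozen, the only parameters that change in this sketch are the projection's weight matrix and bias, which is consistent with the short training times described above.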
