MiniGPT-4
Enhancing Vision-language Understanding with Advanced Large Language Models.
Cost / License
- Free
- Open Source
Platforms
- Online
- Self-Hosted
Features
- AI-Powered
What is MiniGPT-4?
We are currently preparing a lighter model that runs on a single 3090 GPU, so you will be able to run it on your own machine. For updates, please visit our GitHub page.
- MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer.
- We train MiniGPT-4 in two stages. The first, a traditional pretraining stage, uses roughly 5 million aligned image-text pairs and takes 10 hours on 4 A100s. After this stage, Vicuna is able to understand the image, but its generation ability is heavily impacted.
- To address this issue and improve usability, we propose a novel way to create high-quality image-text pairs using the model itself together with ChatGPT. Based on this, we create a small (3500 pairs in total) yet high-quality dataset.
- The second stage finetunes the model on this dataset using a conversation template, which significantly improves its generation reliability and overall usability. To our surprise, this stage is computationally efficient and takes only around 7 minutes on a single A100.
- MiniGPT-4 yields many emerging vision-language capabilities similar to those demonstrated in GPT-4.
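
The alignment idea in the list above (a frozen visual encoder, a frozen LLM, and one trainable projection layer between them) can be illustrated with a toy sketch. This is not the project's code: the stand-in encoder, the stand-in embedding lookup, the names `ProjectionLayer` and `build_llm_input`, and the tiny dimensions are all hypothetical, chosen only to show where the single trainable component sits and how few parameters it holds.

```python
import random

VIS_DIM = 8    # toy visual feature width (hypothetical, not the real model's size)
LLM_DIM = 16   # toy LLM embedding width (hypothetical, not the real model's size)


def frozen_visual_encoder(image_id, num_tokens=4):
    """Stand-in for the frozen visual encoder: one feature vector per visual token."""
    rng = random.Random(image_id)
    return [[rng.uniform(-1.0, 1.0) for _ in range(VIS_DIM)] for _ in range(num_tokens)]


def frozen_text_embedding(prompt):
    """Stand-in for the frozen LLM's embedding table: one vector per whitespace token."""
    vectors = []
    for word in prompt.split():
        rng = random.Random(word)
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(LLM_DIM)])
    return vectors


class ProjectionLayer:
    """The only trainable part in this sketch: a linear map from VIS_DIM to LLM_DIM."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def num_parameters(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def __call__(self, tokens):
        # Apply y = W x + b to every visual token independently.
        return [
            [sum(w * x for w, x in zip(row, tok)) + b
             for row, b in zip(self.weight, self.bias)]
            for tok in tokens
        ]


def build_llm_input(image_id, prompt, projection):
    """Project visual tokens into the LLM's embedding space, then prepend them to the text."""
    visual = projection(frozen_visual_encoder(image_id))
    text = frozen_text_embedding(prompt)
    return visual + text


projection = ProjectionLayer(VIS_DIM, LLM_DIM)
sequence = build_llm_input(image_id=42, prompt="Describe this image", projection=projection)
print(len(sequence), len(sequence[0]), projection.num_parameters())
```

In this sketch, training would update only `projection.weight` and `projection.bias`; the two stand-in functions never change, mirroring the frozen encoder and frozen LLM described above. Placing the visual tokens before the text tokens is an assumption of this example rather than something the description specifies.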



