
Meta unveils Llama 3.2: A multimodal AI model for image and text processing
Meta has introduced Llama 3.2, its first open-source AI model that can process both images and text. This multimodal capability opens up possibilities for advanced AI applications such as augmented reality tools, visual search engines, and document analysis systems. Llama 3.2 also enables real-time video understanding and content-based image sorting.
The release includes two vision models, with 11 billion and 90 billion parameters, and two lightweight text-only models, with 1 billion and 3 billion parameters, designed for mobile and lower-power hardware. Llama 3.2 supports Meta's hardware projects, including the Ray-Ban Meta glasses, where vision capabilities are essential. The company also plans to make Llama 3.2 run on Arm-based mobile platforms, expanding where the models can be deployed.
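For readers who want to try the vision models, the sketch below shows one possible way to query the 11-billion-parameter variant. It assumes the Hugging Face transformers integration (version 4.45 or later), access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint, and an example image URL; none of these details come from Meta's announcement itself.

```python
# Minimal sketch: describing an image with Llama 3.2 11B Vision via Hugging Face transformers.
# Assumes transformers >= 4.45, access to the gated checkpoint, and a GPU with enough memory.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # gated; requires accepting Meta's license

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Example image URL is a placeholder; substitute any accessible image.
image_url = "https://example.com/sample.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# Build a chat-style prompt that interleaves an image with a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

# Generate a short description of the image.
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```

The lightweight 1B and 3B text-only models can be loaded the same way with the standard text-generation classes, which is what makes them practical for the mobile and Arm-based targets Meta describes.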
The previous version, Llama 3.1, remains significant for its powerful 405-billion-parameter text generation model.
