Autoregressive models that unify multimodal tasks, surpassing specialized models with visual path decoupling, autoregressive integration, and flexible design.










Pipecat is a framework for building voice (and multimodal) conversational agents, such as personal coaches, meeting assistants, story-telling toys for kids, customer support bots, intake flows, and snarky social companions.




Amazon Nova is a new generation of foundation models with frontier intelligence and industry-leading price performance. Generate text, code, and images with natural language prompts.





An extensible multi-person interactive agent framework powered by LLM code generation. Supports QQ, Discord, Minecraft, Bilibili Live, SSE (SDK) ...




Marqo is more than a vector database; it's an end-to-end vector search engine. Vector generation, storage, and retrieval are handled out of the box through a single API. No need to bring your own embeddings.




LLMII uses a local AI to label metadata and index images. It does not rely on a cloud service or database.




Gai is a beginner-friendly AI toolkit with no ads, no registration, and no permissions required other than Internet access.



LLM Hub is an open-source Android app for on-device LLM chat and image generation. It's optimized for mobile usage (CPU/GPU/NPU acceleration) and supports multiple model formats so you can run powerful models locally and privately.




The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.




Fenn is a powerful, AI-driven desktop search engine for macOS that makes your files instantly searchable—including videos, audio, PDFs, Word documents, Excel sheets, and images. Just type or upload an image to find exactly where any object, person, or concept appears.

Cognigy.AI is the Conversational AI Platform focused on the needs of large enterprises to develop, deploy and run Conversational AIs on any conversational channel.




OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to intricate anime characters.



Run and fine-tune generative AI models with easy-to-use APIs and highly scalable infrastructure. Train and deploy models at scale on our AI Acceleration Cloud and scalable GPU clusters. Optimize performance and cost.

Store, search, and query multi-modal data with fine-grained access control and built-in security. Build AI applications with confidence and speed.

DataChain builds a suite of tools for data preprocessing and management, experiment tracking, ML model versioning, and pipeline automation.

Open-source multimodal model with 7B active parameters for tasks like text-to-image, image editing, visual manipulation, multiview synthesis, and world navigation.

Anus (Autonomous Networked Utility System) is a powerful, flexible, and accessible open-source AI agent framework designed to revolutionize task automation. Built with modern AI technologies and best practices, Anus represents the next generation of AI agent frameworks, offering...


Reka.ai is a multimodal AI platform that builds advanced models from scratch, enabling agents that can see, hear, and reason across text, images, audio, and video—deployable anywhere from lightweight devices to enterprise systems.




The TEN Framework is an open-source framework that enables developers to quickly build real-time multimodal agents (voice, video, data stream, image, and text), making it easy to experiment, integrate large language models, and create reusable extensions.