
LM Studio 0.4 adds parallel model requests, server-native daemon and new stateful REST API
LM Studio 0.4 has been released, expanding the platform for discovering and running local open large language models such as gpt-oss, Llama, Gemma, Qwen, and DeepSeek. The update brings several significant backend and user-experience enhancements.
Among the core improvements, requests sent to the same model can now be processed in parallel, rather than queued. This change enables higher throughput scenarios, both through the API and in chat sessions that use split view. As a result, workloads that require concurrent responses see immediate performance benefits.
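To illustrate, the following Python sketch fires three requests at the same loaded model concurrently through LM Studio's OpenAI-compatible local server (by default on port 1234). The model identifier is a placeholder and should match whatever model is actually loaded; under 0.4, requests like these can be serviced in parallel rather than one after another.

```python
# Minimal sketch: send several requests to the same model concurrently.
# Assumes LM Studio's local server is running on its default port (1234)
# and exposes the OpenAI-compatible /v1/chat/completions endpoint.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://localhost:1234/v1/chat/completions"
MODEL = "qwen2.5-7b-instruct"  # placeholder; use a model you have loaded

def ask(prompt: str) -> str:
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = Request(URL, data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

prompts = ["Summarize TCP in one line.",
           "Summarize UDP in one line.",
           "Summarize QUIC in one line."]

# With 0.4, these requests can be processed in parallel by the same
# loaded model instead of waiting in a queue.
with ThreadPoolExecutor(max_workers=3) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)
```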
In addition, LM Studio introduces llmster, a standalone server-native engine built on the desktop application's core. The llmster engine can run as an independent daemon, enabling deployment on Linux systems, cloud servers, GPU-equipped machines, or environments such as Google Colab, without requiring the graphical application.
Beyond these backend changes, the 0.4 release features a comprehensive user-interface redesign, offering a more cohesive and visually polished experience. Command-line users gain a new CLI workflow centered on the lms chat command, which enables direct, interactive chats with models and model downloads from the terminal.
Additionally, a new stateful /v1/chat REST API endpoint lets external applications interface with local models, and administrators can now generate permission tokens to manage which clients may access the server.
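The exact request schema for the new endpoint is not detailed here; the sketch below is a hypothetical call that assumes a JSON body with model and input fields and a permission token sent as a standard bearer token. Field names are illustrative, so consult the LM Studio documentation for the actual contract.

```python
# Hypothetical sketch of calling the new stateful /v1/chat endpoint.
# The request and response field names are assumptions for illustration;
# the permission token is assumed to travel as a standard bearer token.
import json
from urllib.request import Request, urlopen

TOKEN = "lms-..."  # a permission token generated in LM Studio (placeholder)

req = Request(
    "http://localhost:1234/v1/chat",
    data=json.dumps({
        "model": "qwen2.5-7b-instruct",  # placeholder model id
        "input": "What ports does the server listen on?",  # assumed field
    }).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {TOKEN}",  # assumed auth scheme
    },
)
with urlopen(req) as resp:
    print(json.load(resp))
```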

Comments
Emmanuel hits the main fact: 0.4.0 was just awful, but 0.4.1 came really quickly, and I haven't seen any real bugs since. The new GUI is ... new; it takes a little time to get used to. HATED it in the beginning, probably because of the bugs. Actually, some of the changes are an improvement for me.
This version seems very buggy and crashes quickly. Fortunately, version 0.4.1 was released very quickly as well. By the way, I prefer the previous GUI.