Tabby uses a modular model architecture that lets users freely replace the default StarCoder-1B and Qwen2-1.5B-Instruct models. This design allows developers to pick models better suited to their needs, such as lighter models that save computational resources in embedded development, or models fine-tuned for specialized domains like scientific computing.
In terms of technical implementation, model selection is controlled by startup parameters passed to the Docker container (--model / --chat-model), and mainstream open-source models from the HuggingFace platform are supported. Test data shows that on a machine with 16GB of RAM and 4GB of video memory, Tabby can stably run models at the 1B-parameter scale. For scenarios demanding higher throughput, users can adjust concurrency with the --parallelism parameter.
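As a sketch of what such a launch command might look like, the following assumes a CUDA-capable host and the official tabbyml/tabby image; the exact model names and the local data path (~/.tabby) are illustrative and should be adapted to your setup:

```shell
# Hypothetical example: start Tabby in Docker with explicit model choices.
# --model selects the code-completion model, --chat-model the chat model,
# and --parallelism controls how many requests are processed concurrently.
docker run -it --gpus all \
  -p 8080:8080 \
  -v ~/.tabby:/data \
  tabbyml/tabby serve \
  --model StarCoder-1B \
  --chat-model Qwen2-1.5B-Instruct \
  --device cuda \
  --parallelism 4
```

Swapping in a different HuggingFace model is then a matter of changing the --model or --chat-model value and restarting the container.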
Source: the article "Tabby: a native self-hosted AI programming assistant that integrates into VSCode".