MoE Architecture Features and Open Source Significance of dots.llm1
The core design of dots.llm1, Xiaohongshu's first open-source large language model, is its Mixture-of-Experts (MoE) architecture. The model contains 128 routed experts and 2 shared experts; for each input token, the 2 shared experts are always active and 6 routed experts are dynamically selected to process it. As a result, the model activates only 14 billion of its 142 billion total parameters during inference, cutting computational cost by more than 80%.
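The routing pattern described above can be illustrated with a minimal sketch (not the released dots.llm1 code): 128 routed experts plus 2 always-on shared experts, with the top 6 routed experts selected per token, so only a small fraction of the parameters run for any given input. The expert sizes, gating details, and the naive per-token dispatch loop are illustrative assumptions for clarity.

```python
import torch
import torch.nn as nn

N_ROUTED, N_SHARED, TOP_K = 128, 2, 6   # figures from the article
D_MODEL, D_EXPERT = 512, 1024           # assumed toy dimensions


class ToyMoELayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Each expert is a small feed-forward block; only TOP_K of the routed
        # experts run for any given token, so most parameters stay inactive.
        self.routed = nn.ModuleList(
            nn.Sequential(nn.Linear(D_MODEL, D_EXPERT), nn.SiLU(), nn.Linear(D_EXPERT, D_MODEL))
            for _ in range(N_ROUTED)
        )
        self.shared = nn.ModuleList(
            nn.Sequential(nn.Linear(D_MODEL, D_EXPERT), nn.SiLU(), nn.Linear(D_EXPERT, D_MODEL))
            for _ in range(N_SHARED)
        )
        self.router = nn.Linear(D_MODEL, N_ROUTED, bias=False)

    def forward(self, x):                                 # x: (tokens, D_MODEL)
        scores = self.router(x).softmax(dim=-1)           # routing probabilities
        weights, idx = scores.topk(TOP_K, dim=-1)         # top-6 routed experts per token
        weights = weights / weights.sum(-1, keepdim=True)

        out = sum(e(x) for e in self.shared)              # shared experts see every token
        for t in range(x.size(0)):                        # naive per-token dispatch for clarity
            for w, i in zip(weights[t], idx[t]):
                out[t] += w * self.routed[int(i)](x[t])
        return out


tokens = torch.randn(4, D_MODEL)
print(ToyMoELayer()(tokens).shape)   # torch.Size([4, 512])
```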
- Architecture details: a unidirectional, decoder-only Transformer structure, using the SwiGLU activation function to better capture patterns in the data
- Core technology: the attention layers combine a multi-head attention mechanism with RMSNorm normalization to improve numerical stability
- Load balancing: expert utilization is balanced with dynamic per-expert bias terms, avoiding expert load imbalance (a sketch combining these components appears after this list)
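The sketch below ties the three bullets together: a SwiGLU feed-forward block, RMSNorm applied to layer inputs, and a router whose per-expert bias is nudged after each batch so under-used experts attract more tokens. The module names, sizes, and the bias update rate are illustrative assumptions, not the released dots.llm1 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D_MODEL, D_HIDDEN, N_EXPERTS, TOP_K = 512, 1024, 8, 2   # toy sizes


class RMSNorm(nn.Module):
    """Root-mean-square normalization: scale by 1/rms(x), no mean subtraction."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x W_gate) * (x W_up), then project back."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Linear(D_MODEL, D_HIDDEN, bias=False)
        self.up = nn.Linear(D_MODEL, D_HIDDEN, bias=False)
        self.down = nn.Linear(D_HIDDEN, D_MODEL, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


class BiasBalancedRouter(nn.Module):
    """Top-k router with a non-learned per-expert bias used only for expert selection.

    After each batch the bias is nudged up for under-loaded experts and down for
    over-loaded ones, steering future tokens toward a more even load.
    """
    def __init__(self, update_rate=1e-2):
        super().__init__()
        self.proj = nn.Linear(D_MODEL, N_EXPERTS, bias=False)
        self.register_buffer("bias", torch.zeros(N_EXPERTS))
        self.update_rate = update_rate

    def forward(self, x):                                    # x: (tokens, D_MODEL)
        scores = self.proj(x).softmax(dim=-1)
        _, idx = (scores + self.bias).topk(TOP_K, dim=-1)    # bias affects the choice only
        weights = scores.gather(-1, idx)                     # weights come from raw scores
        weights = weights / weights.sum(-1, keepdim=True)

        if self.training:
            # Dynamic bias update: compare each expert's load to the ideal share.
            load = torch.bincount(idx.flatten(), minlength=N_EXPERTS).float()
            target = idx.numel() / N_EXPERTS
            self.bias += self.update_rate * torch.sign(target - load)
        return weights, idx


norm, router = RMSNorm(D_MODEL), BiasBalancedRouter()
experts = nn.ModuleList(SwiGLU() for _ in range(N_EXPERTS))
x = norm(torch.randn(16, D_MODEL))
w, idx = router(x)
print(w.shape, idx.shape)   # torch.Size([16, 2]) torch.Size([16, 2])
```

Keeping the bias out of the final mixing weights means load balancing only redirects which experts are chosen, without distorting how their outputs are combined.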
The open-source strategy makes dots.llm1 the first commercially usable MoE model released by a Chinese social platform, filling a gap in open-source Chinese MoE large models.
This answer comes from the article "dots.llm1: the first MoE large language model open-sourced by Little Red Book".