Grok-2: xAI's Open Source Mixture-of-Experts Large Language Model
Grok-2 is a second-generation large language model developed by Elon Musk's xAI in 2024. A key feature of the model is its Mixture-of-Experts (MoE) architecture, which is designed to process information more efficiently. Simply put, the model contains multiple "experts"...
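To make the routing idea concrete, here is a minimal sketch of top-k expert routing in PyTorch. This is illustrative only, not Grok-2's actual implementation; all names and dimensions (TopKMoELayer, n_experts, top_k, etc.) are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative MoE layer: a gating network scores all experts per
    token, and only the top-k experts are actually run for that token."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # the "router"
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoELayer()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

The point of the sparsity is visible in the forward pass: each token only ever flows through `top_k` of the experts, so compute per token stays far below what the total parameter count suggests.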
Seed-OSS: Open Source Large Language Model for Long Context Reasoning and Versatile Applications
Seed-OSS is a series of open source large language models developed by the Seed team at ByteDance, focused on long-context processing, reasoning capabilities, and agent-task optimization. The models have 36 billion parameters, were trained on only 12 trillion tokens, perform strongly on multiple mainstream benchmarks, and support ...
DeepSeek-V3.1-Base: a large-scale language model for efficiently processing complex tasks
DeepSeek-V3.1-Base is an open source large language model developed by DeepSeek and released on the Hugging Face platform, designed for natural language processing tasks. It has 685 billion parameters, supports multiple data types (BF16, F8_E4M3, F32), and can...
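As a rough sketch of how such a Hugging Face checkpoint is typically loaded with one of its published dtypes pinned (BF16 here), using the standard transformers API. The repository id follows the article, but a 685-billion-parameter model needs a multi-GPU cluster, so this is illustrative rather than a deployment recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only: the repo id matches the article, but a
# 685B-parameter checkpoint cannot fit on a single ordinary GPU.
repo = "deepseek-ai/DeepSeek-V3.1-Base"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # pin one of the published dtypes (BF16 here)
    device_map="auto",           # shard weights across available devices
    trust_remote_code=True,      # DeepSeek repos may ship custom model code
)
```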
GPT-OSS: OpenAI's Open Source Large Models for Efficient Reasoning
GPT-OSS is a family of open source language models from OpenAI, comprising gpt-oss-120b and gpt-oss-20b, with 117 billion and 21 billion parameters, respectively. Both are licensed under Apache 2.0, which allows developers to download, modify, and deploy them free of charge. gpt-oss...
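Since Apache 2.0 permits free download and deployment, a minimal sketch of pulling the smaller checkpoint locally with the huggingface_hub library might look like the following; the repository id openai/gpt-oss-20b is assumed to follow the usual Hugging Face naming convention.

```python
from huggingface_hub import snapshot_download

# Pull the full gpt-oss-20b checkpoint into a local folder. The repo id is
# assumed; adjust local_dir to taste.
local_path = snapshot_download(
    repo_id="openai/gpt-oss-20b",
    local_dir="./gpt-oss-20b",
)
print(f"Model files downloaded to {local_path}")
```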
GLM-4.5: Open Source Multimodal Large Model Supporting Intelligent Reasoning and Code Generation
GLM-4.5 is an open source multimodal large language model developed by zai-org, designed for intelligent reasoning, code generation, and agent tasks. The family includes GLM-4.5 (355 billion parameters, 32 billion active parameters), GLM-4.5-Air (106 billion parameters, 12 billion active parameters), and several other...
Qwen3-235B-A22B-Thinking-2507: A large-scale language model to support complex reasoning
Qwen3-235B-A22B-Thinking-2507 is a large-scale language model developed by the Alibaba Cloud Qwen team, released on July 25, 2025 and hosted on the Hugging Face platform. It specializes in complex reasoning tasks and supports up to 256K (262,144) tokens...
dots.llm1: the first MoE large language model open-sourced by Xiaohongshu (RedNote)
rednote-hilab/dots.llm1.base is the base checkpoint of dots.llm1, the first large language model open-sourced by Xiaohongshu (RedNote), hosted on the Hugging Face platform. The model adopts a Mixture-of-Experts (MoE) architecture with 142 billion total parameters, of which only 14 billion are activated during inference, balancing high performance with low cost. d...
Jan-nano: a lightweight and efficient model for text generation
Jan-nano is a 4-billion-parameter language model based on the Qwen3 architecture, developed by Menlo Research and hosted on the Hugging Face platform. It is designed for efficient text generation, combining a small footprint with long-context processing for local or embedded environments. The model supports...
NextCoder-32B: An Open Source Large Model for Code Editing and Optimization
NextCoder-32B is an open source code-editing large model developed by Microsoft and released on the Hugging Face platform. It is based on the Qwen2.5 model, optimized with Selective Knowledge Transfer (SeleKT) technology, and designed for code generation,...
DeepSeek-TNG-R1T2-Chimera: Enhanced version of DeepSeek released by TNG, Germany
DeepSeek-TNG-R1T2-Chimera is an open source large language model developed by TNG Technology Consulting GmbH and hosted on the Hugging Face platform. The model was released on July 2, 2025, and is d...
ERNIE 4.5
ERNIE 4.5 is an open source large model series developed by Baidu on the PaddlePaddle framework, covering models from 0.3B to 424B parameters and supporting text processing, image generation, and multimodal tasks. The project is hosted on GitHub, with models also provided through Hugging Face ...
Hunyuan-A13B: An Efficient Open Source Large Language Model with Ultra-Long Context and Intelligent Reasoning
Hunyuan-A13B is an open source large language model developed by Tencent's Hunyuan team, built on a Mixture-of-Experts (MoE) architecture. The model has 80 billion total parameters, of which 13 billion are active, balancing high performance with low computational cost. Hunyuan-A13B supports 256K ultra-long context processing, suitable for...
Qwen3 Released: A New Generation of Large Language Models That Think Deeply and Respond Fast
The field of large language models has a new member. The Qwen family of large language models has recently released its latest version, Qwen3. According to the development team, its flagship model, Qwen3-235B-A22B, performs comparably to DeepSeek-R1, o1, and o3 on benchmarks of coding, math, and general-purpose...