CosyVoice: Alibaba's open source multilingual voice cloning and speech generation tool
CosyVoice is an open source multilingual speech generation model focused on high-quality text-to-speech (TTS). It supports speech synthesis in multiple languages and provides features such as zero-shot voice generation, cross-lingual voice cloning, and fine-grained emotion control. Compared with the previous version, CosyVoice 2.0 significantly...
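As a rough illustration of the zero-shot cloning workflow, the sketch below follows the Python usage documented in the FunAudioLLM/CosyVoice repository as I recall it; the checkpoint path, reference audio file, and prompt texts are placeholders, and the exact interface may differ between releases.

```python
# Minimal zero-shot voice-cloning sketch, following the usage shown in the
# FunAudioLLM/CosyVoice README; paths and prompt text below are placeholders.
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Load a CosyVoice 2.0 checkpoint downloaded separately (e.g. CosyVoice2-0.5B).
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B',
                       load_jit=False, load_trt=False, fp16=False)

# A few seconds of 16 kHz reference audio plus its transcript define the target voice.
prompt_speech_16k = load_wav('./reference_voice.wav', 16000)

# Clone the reference voice onto new text; the generator yields audio chunks.
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to be spoken in the cloned voice.',
        'Transcript of the reference audio.',
        prompt_speech_16k, stream=False)):
    torchaudio.save(f'zero_shot_{i}.wav', out['tts_speech'], cosyvoice.sample_rate)
```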
Qwen-TTS: Speech Synthesis Tool with Chinese Dialect and Bilingual Support
Qwen-TTS is a text-to-speech (TTS) tool developed by the Alibaba Cloud Qwen team and offered through the Qwen API. Trained on a large-scale speech dataset, it produces natural, expressive voice output and automatically adjusts intonation, speaking rate, and emotion. Qwen-TTS supports Mandarin, English...
Kyutai: Real-time speech-to-text conversion tool
Kyutai Labs' delayed-streams-modeling project is an open source speech framework with Delayed Streams Modeling (DSM) at its core. It supports both real-time speech-to-text (STT) and text-to-speech (TTS), making it suitable for building efficient voice interaction applications. The project provides p...
DeepSeek-TNG-R1T2-Chimera: Enhanced DeepSeek variant released by Germany's TNG
DeepSeek-TNG-R1T2-Chimera is an open source large language model developed by TNG Technology Consulting GmbH and hosted on the Hugging Face platform. The model was released on July 2, 2025, and is d...
Index-AniSora: Bilibili's open source anime video generation tool
Index-AniSora is an anime video generation model developed and open-sourced by Bilibili and hosted on GitHub. It uses CogVideoX-5B and Wan2.1-14B as base models and supports the generation of diverse anime-style videos, including anime episodes, Chinese original animation, manga adaptations, VTube...
GLM-4.1V-Thinking: an open source visual reasoning model for complex multimodal tasks
GLM-4.1V-Thinking is an open source vision-language model developed by the KEG Lab at Tsinghua University (THUDM), focusing on multimodal reasoning capabilities. Built on the GLM-4-9B-0414 base model, GLM-4.1V-Thinking uses reinforcement learning and a chain-of-thought reasoning mechanism to...
ERNIE 4.5
ERNIE 4.5 is an open source large model series developed by Baidu on the PaddlePaddle framework, covering models from 0.3B to 424B parameters and supporting text processing, image generation, and multimodal tasks. The project is hosted on GitHub and works with Hugging Face to provide models ...
Hunyuan-A13B: Efficient Open Source Large Language Model with Ultra-Long Context and Intelligent Reasoning Support
Hunyuan-A13B is an open source large language model developed by Tencent's Hunyuan team and built on a Mixture-of-Experts (MoE) architecture. The model has about 80 billion total parameters, of which 13 billion are active, balancing high performance with low computational cost. Hunyuan-A13B supports 256K ultra-long context processing, suitable for...
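A minimal sketch of loading the instruct checkpoint with Hugging Face transformers is shown below; the repository id and the need for trust_remote_code are assumptions based on the announcement, and long-context settings are left at their defaults.

```python
# Sketch: loading Hunyuan-A13B with Hugging Face transformers.
# The repo id and trust_remote_code requirement are assumptions; the MoE model
# activates only ~13B of its ~80B parameters per token but still needs ample GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize the trade-offs of MoE models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```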
Launch of FLUX.1 Kontext and BFL Playground
Today, we are proud to release FLUX.1 Kontext, a set of generative flow matching models for image generation and editing. Unlike existing text-to-image generation models, the FLUX.1 Kontext family supports context-sensitive...
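For the openly distributed FLUX.1 Kontext [dev] weights, context-based editing can be sketched roughly as follows with the diffusers FluxKontextPipeline; the repository id, prompt, and guidance value are assumptions for illustration, and the hosted variants are accessed through the BFL API and Playground instead.

```python
# Rough sketch of context-based image editing with open FLUX.1 Kontext weights
# via diffusers; repo id, prompt, and guidance value are placeholders.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",  # assumed Hugging Face repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# The input image provides the context; the prompt describes the edit to apply.
source = load_image("./input.png")
edited = pipe(
    image=source,
    prompt="Turn the daytime scene into a rainy night, keep the composition unchanged",
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```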
PartCrafter: Generating Editable 3D Part Models from a Single Image
PartCrafter is an innovative open source project focused on generating editable 3D part models from a single RGB image. Using an advanced structured 3D generation technique, it produces multiple semantically meaningful 3D parts simultaneously from one image, applicable to game development, product design, and other fields. The project is based on pre-training...
Seedance 1.0
Seedance 1.0 is an AI video generation tool developed by the Seed team at ByteDance, focused on converting text or images into high-quality video content. Users only need to enter a text description or upload an image, and Seedance can generate videos at resolutions up to 1080p, suitable for creative content creation...
Gemma 3n
Google is expanding its push for accessible AI with the release of Gemma 3 and Gemma 3 QAT, open source models that run on a single cloud or desktop accelerator. If Gemma 3 brought powerful cloud and desktop capabilities to developers, this May 20, 2025 release...
MoviiGen 1.1
MoviiGen 1.1 is an open source AI tool developed by ZuluVision that focuses on generating high-quality videos from text. It supports 720P and 1080P resolutions and is especially suited to professional video production that requires cinematic visual effects. Users can generate videos from simple text descriptions with natural dynamic...
HiDream-I1
HiDream-I1 is an open source image generation base model with 17 billion parameters that quickly generates high-quality images. Users only need to enter a text description, and the model can produce images in a variety of styles, including realistic, cartoon, and artistic. Developed by the HiDream.ai team and hosted on GitHub, the project picks...
Imagen 4
Google DeepMind's recently launched Imagen 4 model, the latest iteration of its image generation technology, is quickly becoming an industry focal point. The model has made significant progress in improving the richness, accuracy of detail, and speed of image generation, working to bring the user's imagination to life in ways never before...
BAGEL
BAGEL is an open source multimodal base model developed by the ByteDance Seed team and hosted on GitHub. It integrates text comprehension, image generation, and editing capabilities to support cross-modal tasks. The model has 7B active parameters (14B parameters in total) and uses a Mixture-of-Tra...
MiniMax Speech 02
With the continued evolution of AI technology, personalized and highly natural voice interaction has become a key requirement for many intelligent applications. However, existing text-to-speech (TTS) technology still falls short on large-scale personalized voices, multilingual coverage, and highly realistic emotional expression. To address these...
Windsurf SWE-1
SWE-1: A New Generation of Cutting-Edge Models for Software Engineering. Recently, the much-anticipated SWE-1 family of models was released. Designed to optimize the entire software engineering process, this family goes far beyond the traditional task of writing code. Currently, the SWE-1 family consists of three well-positioned models:...
Qwen3 Released: A New Generation of Large Language Models That Think Deeply and Respond Fast
The field of large language models has a new member. Recently, the Qwen family of large language models released its latest version, Qwen3. According to the development team, its flagship model, Qwen3-235B-A22B, performs comparably to DeepSeek-R1, o1, and o3 in benchmarks of coding, math, and general-purpose...
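Qwen3's headline feature is a hybrid mode that can either "think" step by step or answer directly. The sketch below uses a smaller sibling checkpoint with Hugging Face transformers; the repository id and the enable_thinking chat-template flag follow the Qwen3 model card as I recall it and should be treated as assumptions if your version differs.

```python
# Sketch of switching Qwen3 between deep-thinking and fast-response modes.
# Uses a small sibling checkpoint (Qwen/Qwen3-8B) for illustration; the
# enable_thinking switch follows the Qwen3 model card and is an assumption here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 30?"}]

# enable_thinking=True lets the model emit a <think>...</think> reasoning block
# before its final answer; set it to False for a faster, direct response.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:],
                       skip_special_tokens=True))
```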