
Hunyuan-A13B is an open-source large language model developed by Tencent's Hunyuan team, built on a Mixture of Experts (MoE) architecture. The model has 80 billion total parameters, of which 13 billion are active per forward pass, balancing high performance with low computational cost. Hunyuan-A13B supports 256K ultra-long context processing, making it suitable for complex tasks such as long-text analysis, code generation, and intelligent-agent operation. The model provides both fast and slow inference modes, which users can switch between flexibly according to their needs. The Tencent Hunyuan team open-sourced several versions of the model on GitHub and Hugging Face on June 27, 2025, including a pre-trained model, an instruction-tuned model, and optimized quantized models, making it convenient for developers to deploy in different hardware environments. Detailed technical reports and operation manuals are also available to help users get started quickly.

 

Function List

  • Ultra-long context processing: supports context lengths of up to 256K tokens for long documents, complex dialogues, and multi-round reasoning tasks.
  • Bimodal inference: provides fast reasoning and slow reasoning (chain-of-thought, CoT) modes to meet the performance requirements of different scenarios.
  • Efficient MoE architecture: 80 billion total parameters, 13 billion active parameters, reducing compute-resource requirements and suiting lower-end hardware.
  • Multiple quantization options: FP8 and GPTQ-Int4 quantized versions are available to improve inference efficiency and lower the deployment threshold.
  • Multi-domain capability: performs well in math, science, code generation, and intelligent-agent tasks, with strong benchmark scores.
  • Open-source resources: model weights, training code, technical reports, and operating manuals are provided to support developer customization and extension.

 

Using Help

Installation process

To use Hunyuan-A13B, you need a Python 3.10 or later environment; a GPU (e.g., NVIDIA A100) is recommended for best performance. The installation and deployment steps are as follows:

  1. Clone the repository
    Run the following command in a terminal to clone the GitHub repository:

    git clone https://github.com/Tencent-Hunyuan/Hunyuan-A13B.git
    cd Hunyuan-A13B
    
  2. Install dependencies
    Install the necessary Python libraries, making sure your environment supports PyTorch and Hugging Face's transformers library:

    pip install torch==2.5.1 transformers
    pip install -r requirements.txt
    
  3. Download the model
    The Hunyuan-A13B model is available on the Hugging Face platform in several versions, including Hunyuan-A13B-Pretrain, Hunyuan-A13B-Instruct, Hunyuan-A13B-Instruct-FP8, and Hunyuan-A13B-Instruct-GPTQ-Int4. Taking the instruction-tuned model as an example, the download command is as follows:

    huggingface-cli download tencent/Hunyuan-A13B-Instruct
    
  4. Set environment variables
    Configure the model path as an environment variable:

    export MODEL_PATH="tencent/Hunyuan-A13B-Instruct"
    
  5. Run the sample code
    Use the following Python code to load the model and perform inference:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import os
    import re

    # Load the tokenizer and model from the path set in MODEL_PATH
    model_name_or_path = os.environ['MODEL_PATH']
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", trust_remote_code=True)

    # Build a chat prompt; enable_thinking=True turns on slow (chain-of-thought) reasoning
    messages = [{"role": "user", "content": "Write a short summary of the benefits of regular exercise"}]
    tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", enable_thinking=True)

    # Generate; the output wraps the reasoning in <think> tags and the reply in <answer> tags
    outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=4096)
    output_text = tokenizer.decode(outputs[0])

    # Extract the reasoning process and the final answer separately
    think_pattern = r'<think>(.*?)</think>'
    answer_pattern = r'<answer>(.*?)</answer>'
    think_matches = re.findall(think_pattern, output_text, re.DOTALL)
    answer_matches = re.findall(answer_pattern, output_text, re.DOTALL)
    think_content = think_matches[0].strip() if think_matches else ""
    answer_content = answer_matches[0].strip() if answer_matches else ""
    print(f"Reasoning: {think_content}\n\nAnswer: {answer_content}")
    

Functional operation flow

1. Ultra-long context processing

Hunyuan-A13B supports a 256K context length, suitable for processing long documents or multi-round conversations. Users can set max_seq_length=256000 to enable the ultra-long context mode. For example, when analyzing a long technical document, the document's content is fed directly into the model, which processes it in full and generates a summary or answer, as in the sketch below.
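
A minimal sketch of long-document summarization, reusing the model and tokenizer loaded in the installation example; the file name and prompt wording here are illustrative assumptions, not part of the official examples:

# Read a long technical document and feed it to the model in one pass
# (long_report.txt is a hypothetical input file)
with open("long_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

messages = [{"role": "user", "content": f"Summarize the key points of this document:\n\n{document}"}]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# The 256K context window lets the whole document fit without chunking
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=1024)
print(tokenizer.decode(outputs[0][tokenized_chat.shape[-1]:], skip_special_tokens=True))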

2. Bimodal reasoning

The model supports both fast and slow reasoning (chain-of-thought, CoT). Fast reasoning is suitable for real-time dialogue, while slow reasoning suits complex tasks such as mathematical reasoning or code debugging. Users can control the inference mode through parameters:

  • Enable slow reasoning: set enable_thinking=True or prepend /think to the prompt.
  • Disable slow reasoning: set enable_thinking=False or prepend /no_think to the prompt.
    Example:
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", enable_thinking=False)
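
The same toggle can also be expressed inside the prompt itself. A minimal sketch, assuming the model and tokenizer from the installation example are already loaded:

# The /no_think prefix requests fast mode without changing any parameters
messages = [{"role": "user", "content": "/no_think What is the capital of France?"}]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=256)
print(tokenizer.decode(outputs[0][tokenized_chat.shape[-1]:], skip_special_tokens=True))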

3. Deploying quantized models

To reduce hardware requirements, Hunyuan-A13B provides FP8 and GPTQ-Int4 quantized versions. FP8 quantization converts model weights and activations to an 8-bit floating-point format via static calibration, making it suitable for low- to mid-range GPUs. GPTQ-Int4 uses 4-bit integer quantization to further reduce the memory footprint. Users can download a quantized model directly:

huggingface-cli download tencent/Hunyuan-A13B-Instruct-FP8

When deploying, make sure your hardware supports FP8 or INT4 operations; the TensorRT-LLM backend is recommended for faster inference.
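
Loading a quantized checkpoint with transformers follows the same pattern as the full-precision model, since the quantization config ships with the checkpoint. A minimal sketch, assuming a GPU and runtime with FP8 kernel support (this is a plain transformers load, not the TensorRT-LLM deployment path):

from transformers import AutoModelForCausalLM, AutoTokenizer

# The quantized checkpoint loads like the full-precision one;
# only the hardware/kernel requirements differ
model_id = "tencent/Hunyuan-A13B-Instruct-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)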

4. Multi-domain tasks

Hunyuan-A13B excels in math, science, code generation, and intelligent-agent tasks. For example, when handling a math problem, the model automatically decomposes it and reasons step by step:

messages = [{"role": "user", "content": "Solve the equation 2x + 3 = 7"}]

The output will contain the reasoning process (in <think> tags) and the final answer (in <answer> tags), keeping the results clear and easy to follow.
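
Combining this with the tag parsing from the installation example gives a complete round trip; a minimal sketch, with the model and tokenizer assumed already loaded:

import re

messages = [{"role": "user", "content": "Solve the equation 2x + 3 = 7"}]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", enable_thinking=True)
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
output_text = tokenizer.decode(outputs[0])

# Separate the step-by-step reasoning from the final answer
steps = re.findall(r'<think>(.*?)</think>', output_text, re.DOTALL)
answer = re.findall(r'<answer>(.*?)</answer>', output_text, re.DOTALL)
print("Steps:", steps[0].strip() if steps else "")
print("Answer:", answer[0].strip() if answer else "")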

5. Developer customization

Users can fine-tune the model based on the open-source code. An official training manual is provided, explaining data preparation, training parameters, and optimization strategies in detail. Fine-tuning example:

python train.py --model_path tencent/Hunyuan-A13B-Pretrain --data_path custom_dataset

Notes

  • Ensure that GPU memory is sufficient (16 GB or more recommended).
  • Check the model versions on the Hugging Face platform and make sure you download the latest one.
  • Refer to the official technical report for model performance on specific tasks.

 

Application Scenarios

  1. Academic research
    Researchers can use Hunyuan-A13B to process long academic papers, extract key information, or generate reviews. The model's 256K context length enables complete analysis of multi-page documents, making it suitable for literature review and knowledge extraction.
  2. Code development
    Developers can use the model to generate code, debug programs, or optimize algorithms. Hunyuan-A13B excels in code-generation tasks and supports multiple programming languages for rapid prototyping.
  3. Intelligent agents
    The model can serve as the core of an intelligent agent to handle complex tasks such as automated customer service, data analytics, or task scheduling. Its efficient MoE architecture ensures a low resource footprint for real-time applications.
  4. Educational aids
    Students and teachers can use the model to answer math and science questions or generate learning materials. The slow-reasoning mode provides detailed solution steps to aid understanding.

 

QA

  1. What hardware does Hunyuan-A13B fit?
    The model supports a wide range of hardware environments; an NVIDIA A100 or equivalent GPU is recommended. The quantized versions run on lower-end GPUs (e.g., 10 GB VRAM) and are suitable for individual developers.
  2. How do I switch reasoning modes?
    Set enable_thinking=True/False, or prepend /think or /no_think to the prompt, to toggle between slow and fast reasoning modes.
  3. What languages does the model support?
    Hunyuan-A13B is mainly optimized for Chinese and English tasks, but it also performs well on multilingual benchmarks, making it usable in multilingual scenarios.
  4. How do I get technical support?
    Submit issues via GitHub, or contact the official email address hunyuan_opensource@tencent.com for support.