
Jan-nano is a 4-billion-parameter language model based on the Qwen3 architecture, developed by Menlo Research and hosted on the Hugging Face platform. It is designed for efficient text generation, combining a small footprint with long-context processing for local or embedded environments. The model supports tool calls and research tasks, performs well on the SimpleQA benchmark, and suits users who need a lightweight AI solution. Jan-nano is released as open source, with straightforward installation and active community support for developers, researchers, and enterprise users.

Function List

  • Efficient text generation that produces fluent, accurate text.
  • Powerful tool-calling for seamless integration with external tools and APIs.
  • Optimized long-context handling; the Jan-nano-128k version supports a native 128k-token context window.
  • Suited to local deployment, with low VRAM consumption for resource-constrained devices.
  • Compatible with Model Context Protocol (MCP) servers to improve the efficiency of research tasks.
  • Supports multiple quantization formats (e.g. GGUF) for easy deployment across different hardware.
  • Provides a non-thinking chat template to improve the conversation generation experience.

Using Help

Installation process

Jan-nano models can be downloaded and deployed locally through the Hugging Face platform. Below are detailed installation and usage steps for beginners and developers:

  1. Environment preparation
    Ensure that Python 3.8+ and Git are installed on your system; a virtual environment is recommended to avoid dependency conflicts:

    python -m venv jan_env
    source jan_env/bin/activate  # Linux/Mac
    jan_env\Scripts\activate  # Windows
    
  2. Install the necessary tools
    Install the Hugging Face transformers library and vLLM (for efficient inference):

    pip install transformers vllm
    
  3. Download the model
    Use huggingface-cli to download the Jan-nano model:

    huggingface-cli download Menlo/Jan-nano --local-dir ./jan-nano
    

    If you need a quantized version of GGUF, you can download bartowski's quantized model:

    huggingface-cli download bartowski/Menlo_Jan-nano-GGUF --include "Menlo_Jan-nano-Q4_K_M.gguf" --local-dir ./jan-nano-gguf
    
  4. Run the model
    Use vLLM to start the model service; the following command is recommended:

    vllm serve Menlo/Jan-nano --host 0.0.0.0 --port 1234 --enable-auto-tool-choice --tool-call-parser hermes
    

    For the Jan-nano-128k version, additional context parameters are required:

    vllm serve Menlo/Jan-nano-128k --host 0.0.0.0 --port 1234 --enable-auto-tool-choice --tool-call-parser hermes --rope-scaling '{"rope_type":"yarn","factor":3.2,"original_max_position_embeddings":40960}' --max-model-len 131072
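The rope-scaling values above are self-consistent: the original 40960-token window scaled by the YaRN factor 3.2 yields the 131072-token limit passed to `--max-model-len`. A quick sanity check:

```python
# YaRN rope scaling: extended context = original window * scaling factor
original_max_position_embeddings = 40960
factor = 3.2
extended_context = round(original_max_position_embeddings * factor)
print(extended_context)  # 131072, matching --max-model-len
```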
    

    If you run into chat-template problems, you can manually download the non-thinking template:

    wget https://huggingface.co/Menlo/Jan-nano/raw/main/qwen3_nonthinking.jinja
    
  5. Verify the installation
    After starting the service, test the model with cURL or a Python script:

    import requests

    response = requests.post("http://localhost:1234/v1/completions", json={
        "model": "Menlo/Jan-nano",
        "prompt": "Hello, please introduce Jan-nano.",
        "max_tokens": 100,
    })
    print(response.json()["choices"][0]["text"])
    

Main Functions

  • Text Generation
    Jan-nano specializes in generating natural-language text. Users can submit prompts via the API or command line, and the model returns fluent text. For example, the prompt "write an article about AI" yields a clearly structured article. Recommended parameters: temperature=0.7, top_p=0.8, top_k=20.
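The recommended sampling parameters can be passed directly in the completions request. A minimal sketch (the helper name is illustrative; top_k is a vLLM extension to the OpenAI completions schema):

```python
# Sketch: build a /v1/completions payload with the recommended
# sampling parameters (temperature=0.7, top_p=0.8, top_k=20).
def build_completion_request(prompt, max_tokens=200):
    return {
        "model": "Menlo/Jan-nano",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
    }

payload = build_completion_request("write an article about AI")
# Send with: requests.post("http://localhost:1234/v1/completions", json=payload)
```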
  • Tool Call
    Jan-nano supports automatic tool invocation, which is useful for interacting with external APIs or databases. The user specifies the tool format in the request, and the model parses and calls it. For example, a weather-query prompt:

    {
        "prompt": "Query today's weather in Beijing",
        "tools": [{"type": "weather_api", "endpoint": "https://api.weather.com"}]
    }
    

    The model returns a structured response containing the results of the tool call.
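Because the vLLM service above was started with --enable-auto-tool-choice and the hermes tool-call parser, tools can also be declared in the OpenAI function-calling format on the /v1/chat/completions endpoint. A minimal sketch of such a request body (the get_weather tool definition is hypothetical):

```python
# Sketch: an OpenAI-style chat request with one hypothetical tool definition.
def build_tool_call_request(user_message):
    return {
        "model": "Menlo/Jan-nano",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get today's weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

body = build_tool_call_request("Query today's weather in Beijing")
```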

  • Long context processing (Jan-nano-128k)
    Jan-nano-128k supports contexts up to 128k tokens, which suits analyzing long documents or multi-turn conversations. Users can input entire papers or long conversations, and the model maintains contextual consistency. For example, analyzing a 50-page academic paper:

    curl -X POST http://localhost:1234/v1/completions -d '{"model": "Menlo/Jan-nano-128k", "prompt": "<full paper text>", "max_tokens": 500}'
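When submitting a whole paper, it helps to keep the prompt inside the 131072-token window configured with --max-model-len. A rough sketch using a ~4-characters-per-token heuristic (the ratio is an assumption; exact counts depend on the Qwen3 tokenizer):

```python
def truncate_to_token_budget(text, max_tokens=131072, chars_per_token=4):
    """Heuristically truncate text to fit a token budget (not a real tokenizer)."""
    max_chars = max_tokens * chars_per_token
    return text if len(text) <= max_chars else text[:max_chars]

paper_text = "word " * 200_000          # stand-in for a long document
prompt = truncate_to_token_budget(paper_text)
```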
    
  • Local Deployment Optimization
    The model has low VRAM consumption; the Q4_K_M quantized version runs on devices with 8GB of memory. Users can adjust the quantization level (e.g. Q3_K_XL, Q4_K_L) to fit different hardware.

Featured Function Operation

  • MCP Server Integration
    Jan-nano is compatible with Model Context Protocol (MCP) servers for research scenarios. Start the MCP server and point it at the model:

    mcp_server --model Menlo/Jan-nano --port 5678
    

    A research task request is then sent through the MCP client and the model automatically calls the relevant tool to complete the task.

  • SimpleQA Benchmarking
    Jan-nano performs well on the SimpleQA benchmark and suits Q&A tasks. The user enters a question and the model returns a precise answer. Example:

    curl -X POST http://localhost:1234/v1/completions -d '{"prompt": "What is a lambda function in Python?", "max_tokens": 200}'
    

Caveats

  • Ensure that your hardware meets the minimum requirements (8GB video memory recommended).
  • The Jan-nano-128k version is required for long context tasks.
  • Check the Hugging Face community discussions regularly for the latest optimization suggestions.

Application Scenarios

  1. Academic research
    Jan-nano-128k can process long papers or books, extract key information or generate summaries. Researchers can input entire documents, and the model can analyze context and answer complex questions, making it suitable for literature reviews or data analysis.
  2. Local AI Assistant
    In internet-free environments, Jan-nano can be used as a localized AI assistant to answer questions or generate text. Developers can integrate it into offline applications to provide intelligent customer service or writing assistance.
  3. Tool automation
    With tool call functionality, Jan-nano automates tasks such as querying databases, calling APIs or generating reports. Organizations can use it to automate workflows and improve efficiency.
  4. Embedded device deployment
    Due to the small size of the model, Jan-nano is suitable for embedded devices, such as smart homes or robots, providing real-time text generation and interaction.

QA

  1. What is the difference between Jan-nano and Jan-nano-128k?
    Jan-nano is the base version, suitable for short context tasks; Jan-nano-128k supports a native context window of 128k tokens, suitable for long document processing and complex research tasks.
  2. How do I choose the right quantized version?
    Q4_K_M suits devices with 8GB of video memory, balancing performance and resource consumption; Q3_K_XL is lighter and suits low-end devices, with slightly lower accuracy. Choose based on your hardware configuration.
  3. Does the model support Chinese?
    Yes, based on Qwen3 architecture, Jan-nano has good support for Chinese language generation and understanding, which is suitable for Chinese language research and application scenarios.
  4. How to optimize long context performance?
    Use Jan-nano-128k, set the rope-scaling parameter, and ensure your hardware has sufficient memory. Avoid frequent context switching to minimize performance overhead.