
How can I improve the speed of a local large language model on a low-spec computer?

2025-08-23

A three-step optimization plan

On a typical computer with 4-8 GB of RAM, you can significantly improve performance as follows:

  • Model selection: prefer a small model at Q4_K_M quantization (under 1 GB), such as the gemma-3-1b-it recommended in the article; it is about 75% smaller than the original FP16 model while retaining more than 90% of its quality.
  • System optimization:
    • Close other memory-hungry programs (e.g. browsers) and make sure at least 2 GB of memory is free
    • Right-click the program's EXE file → Properties → check "Run as administrator" (not required, but it can raise the process's resource priority)
  • Usage tips:
    • After the first model load, avoid switching models frequently so the model stays in memory
    • Storing model files on a USB 3.0 flash drive cuts loading time by about 10%
    • Split complex tasks into multiple short conversations (keep a single question under 200 words)
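The ~75% size reduction claimed for Q4_K_M follows directly from bits-per-weight arithmetic; a minimal sketch, idealizing Q4 quantization as exactly 4 bits per weight (the real Q4_K_M average is slightly higher because of per-block scale metadata):

```python
# Rough size estimate for a 1B-parameter model at two precisions.
params = 1_000_000_000      # parameter count of a 1B model like gemma-3-1b-it
fp16_bits = 16              # original FP16: 16 bits per weight
q4_bits = 4                 # idealized 4-bit quantization

fp16_gb = params * fp16_bits / 8 / 1e9   # -> 2.0 GB
q4_gb = params * q4_bits / 8 / 1e9       # -> 0.5 GB
reduction = 1 - q4_gb / fp16_gb          # -> 0.75, i.e. 75% smaller
print(f"FP16: {fp16_gb:.2f} GB, Q4: {q4_gb:.2f} GB, saved: {reduction:.0%}")
```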

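The 2 GB free-memory check from the system-optimization tips can be scripted; a minimal sketch for Linux using only the standard library (the `SC_AVPHYS_PAGES` sysconf name is POSIX/Linux and is not available on Windows):

```python
import os

def free_memory_gb() -> float:
    """Return currently available physical memory in GB (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")       # bytes per memory page
    avail_pages = os.sysconf("SC_AVPHYS_PAGES")  # pages currently available
    return page_size * avail_pages / 1e9

free = free_memory_gb()
print(f"Free memory: {free:.1f} GB")
if free < 2.0:
    print("Less than 2 GB free: close browsers and other heavy programs first.")
```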
In testing on an entry-level i5-8250U/8GB laptop, these optimizations raised generation speed from 8 tokens/sec to 18-22 tokens/sec, a usable level. If that still falls short, try the more extreme Q2_K quantization, which halves the size again at the cost of some accuracy.
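If the model is run through llama.cpp directly, several of these tips map to command-line flags. A hypothetical invocation (the model filename is an assumption, and flag names can vary between builds, so check your binary's `--help`):

```shell
# Hypothetical llama.cpp run tuned for a low-spec laptop:
#   -t 4      use 4 CPU threads (roughly the physical core count of an i5-8250U)
#   -c 512    small context window to keep memory use down
#   --mlock   lock model weights in RAM so they are not swapped out
./llama-cli -m gemma-3-1b-it-Q4_K_M.gguf -t 4 -c 512 --mlock \
    -p "Summarize the following text in 100 words:"
```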
