Tifa-Deepsex-14b-CoT gets around device limitations with a set of quantization options:
- 4-bit grouped quantization: quantized with GPTQ (via the AutoGPTQ toolkit), the full 128k-context model can be loaded on an RTX 3060 (12 GB VRAM).
- CPU optimization: a GGUF build tuned for the llama.cpp framework lets an M2 MacBook reach inference speeds of 7 tokens per second (see the sketch after this list).
- Mobile adaptation: the official APK client delivers real-time role-play responses on Snapdragon 8 Gen 2 phones through dynamic layer offloading.
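For the llama.cpp path, a minimal sketch using the llama-cpp-python bindings might look like the following. The GGUF file name, context size, and sampling settings here are illustrative assumptions, not values from the article:

```python
# Minimal llama.cpp sketch: load a local Q4 GGUF build and run one completion.
# The model_path is a hypothetical local file name.
from llama_cpp import Llama

llm = Llama(
    model_path="./Tifa-Deepsex-14b-CoT.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,        # context window; the full 128k needs far more memory
    n_threads=8,       # CPU threads; tune to the host (e.g. an M2 MacBook)
    n_gpu_layers=-1,   # offload all layers to Metal/CUDA when available
)

out = llm(
    "You are the innkeeper of a seaside tavern. Greet a weary traveler.",
    max_tokens=256,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```

Leaving n_gpu_layers at -1 lets llama.cpp use Metal acceleration on Apple Silicon while falling back to pure CPU inference elsewhere.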
Measurements show that the Q4 build cuts the VRAM requirement from 28 GB (FP16) to 6 GB while retaining 95% of the original model's output quality, letting creators tap top-tier AI writing capability without specialized hardware.
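As an illustration of fitting that Q4 footprint onto a 12 GB card, a loading sketch with the AutoGPTQ library might look like this; the repository id and generation settings are assumptions for illustration, not confirmed details of the official release:

```python
# Sketch: load a hypothetical 4-bit GPTQ export of the model on one GPU.
# The repo id and generation settings are illustrative assumptions.
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "someuser/Tifa-Deepsex-14b-CoT-GPTQ"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",       # per the article, the Q4 footprint (~6 GB) fits a 12 GB card
    use_safetensors=True,
)

prompt = "Write the opening paragraph of a long fantasy novel."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, temperature=0.8, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```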
This answer comes from the article "Tifa-Deepsex-14b-CoT: a large model specializing in role-playing and ultra-long fiction generation".