The complete technology path for offline mobile deployment
To build a fully offline mobile application, the following technical approach is required:
- Model conversion:
  - Use `transformers.onnx` to export to ONNX format (the `opset_version=13` parameter needs to be added)
  - Further optimize the computational graph via TensorRT or MNN
- Application integration:
  - On Android, TFLite inference is recommended (32-bit weights need to be quantized to 8-bit)
  - On iOS, Core ML deployment is available (note the `--quantize int8` option)
- Performance balancing:
  - Limit the generation length (`max_length=50`) to ensure real-time responses
  - Enable a caching mechanism to store FAQ pairs
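To make the 32-bit-to-8-bit weight quantization step concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, the basic scheme that TFLite-style weight quantization builds on. This is an illustrative toy (plain Python lists, a single scale factor), not the actual TFLite converter logic:

```python
from typing import List, Tuple

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map float32 weights to int8
    values in [-128, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    # Recover approximate float weights; the error is bounded by scale/2.
    return [v * scale for v in q]
```

Usage: `quantize_int8([0.5, -1.27, 0.01, 1.27])` yields the int8 values together with the scale needed to map them back at inference time; the 4x size reduction is where the storage savings on device come from.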
Tests show that the converted Bonsai model occupies only 180MB of storage on an iPhone 12, with a single inference taking under 300ms. For cross-platform development, the React Native framework is recommended.
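The FAQ-pair caching mentioned under performance balancing can be sketched as a small LRU cache keyed on normalized questions, so repeated queries skip on-device inference entirely. The `run_model` callable below is a hypothetical stand-in for the actual on-device inference call:

```python
from collections import OrderedDict
from typing import Callable, Optional

class FAQCache:
    """Small LRU cache of question -> answer pairs."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._store: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def _normalize(question: str) -> str:
        # Cheap normalization so trivial variants hit the same entry.
        return " ".join(question.lower().split())

    def get(self, question: str) -> Optional[str]:
        key = self._normalize(question)
        if key in self._store:
            self._store.move_to_end(key)  # mark as recently used
            return self._store[key]
        return None

    def put(self, question: str, answer: str) -> None:
        key = self._normalize(question)
        self._store[key] = answer
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

def answer(question: str, cache: FAQCache,
           run_model: Callable[[str], str]) -> str:
    cached = cache.get(question)
    if cached is not None:
        return cached  # fast path: no inference needed
    result = run_model(question)  # slow path: on-device inference
    cache.put(question, result)
    return result
```

Since a single inference costs up to 300ms, even a small hit rate on common FAQ questions noticeably improves perceived latency.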
This answer is based on the article "Bonsai: A ternary-weight language model suitable for running on edge devices".