Methods for Improving MNN Inference Performance on Mobile
To improve the inference performance of MNN on mobile devices, the following aspects are worth addressing:
- Model quantization: convert the model to FP16 or Int8 format, reducing the model size by 50%-70% while significantly cutting memory footprint and computation
- Enable GPU acceleration: select the appropriate backend (Metal/OpenCL/Vulkan) based on the graphics APIs the device supports
- Optimize compilation options: use the MNN_BUILD_MINI option to reduce the framework size by about 25%
- Set the batch size appropriately: balance memory footprint against parallel-computing gains (see the sketch in step 3 below)
Practical approach:
1. Model conversion command with FP16 quantization:
./MNNConvert -f TF --modelFile model.pb --MNNModel quant_model.mnn --fp16
2. Enabling GPU acceleration through the C++ API:
MNN::ScheduleConfig config;
config.type = MNN_FORWARD_OPENCL; // choose Metal/OpenCL/Vulkan based on the device
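3. Putting the pieces together: below is a hedged end-to-end sketch of creating a session on a GPU backend and adjusting the input batch size at runtime using MNN's public session API. The model path quant_model.mnn and the 1x3x224x224 input shape are illustrative assumptions, not values from the original answer.

#include <memory>
#include <MNN/Interpreter.hpp>
#include <MNN/MNNForwardType.h>

int main() {
    // Load the converted model (path is an assumed example).
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("quant_model.mnn"));

    // Schedule the session on the GPU, falling back to the CPU if the
    // chosen backend is unavailable on this device.
    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_OPENCL;  // or Metal/Vulkan depending on the device
    config.backupType = MNN_FORWARD_CPU;
    config.numThread  = 4;                   // CPU threads for the fallback path

    auto session = net->createSession(config);

    // Optionally adjust the input shape (including batch size) to balance
    // memory footprint against parallelism; shape values are assumptions.
    auto input = net->getSessionInput(session, nullptr);
    net->resizeTensor(input, {1, 3, 224, 224});
    net->resizeSession(session);

    // ... copy input data into `input` here ...

    net->runSession(session);
    auto output = net->getSessionOutput(session, nullptr);
    // ... read results from `output` ...
    return 0;
}

Note that the CPU fallback (backupType) matters in practice: not every device exposes OpenCL, and letting MNN fall back avoids a hard failure when the GPU backend cannot be created.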
This answer is based on the article "MNN: A Lightweight and Efficient Deep Learning Inference Framework".