Optimized solution for high-resolution image processing
The following methods can be used when dealing with 4K images with insufficient video memory:
- Hardware level:Prioritize the use of 24GB video cards such as RTX 3090/4090, or distribute the load through NVIDIA's multi-card parallelism technology
- Parameter optimization:
1. Adjust the hd_num parameter (default 18), the smaller the value, the less video memory is occupied
2. Add flash-attention2 installation: pip install flash-attn -no-build-isolation - Pretreatment Program:
1. Chunking of images using OpenCV
2. Downsampling of 4K images to 2K (2048 x 1080) resolution
3. Enable -low-vram mode operation - Quantitative Programs:Load the 4-bit quantization model:
from transformers import BitsAndBytesConfig
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModel.from_pretrained(..., quantization_config=quant_config)
Experimental data shows that with 4-bit quantization enabled, the 7B model requires only 6GB of video memory to process 4K images
This answer comes from the articleInternLM-XComposer: a multimodal macromodel for outputting very long text and image-video comprehensionThe































