How do I resolve insufficient GPU memory when InternLM-XComposer processes 4K images?

2025-09-05

Optimizations for high-resolution image processing

When GPU memory runs out while processing 4K images, the following methods can help:

Hardware: Prefer a GPU with 24 GB of memory, such as the RTX 3090 or 4090, or spread the load across several NVIDIA GPUs with multi-GPU parallelism
Parameter optimization:
1. Lower the hd_num parameter (default 18); the smaller the value, the less GPU memory is used
2. Install FlashAttention-2: pip install flash-attn --no-build-isolation
Preprocessing:
1. Split the image into tiles with OpenCV
2. Downsample 4K images to 2K (2048 x 1080) resolution
3. Run in --low-vram mode
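
The tiling and downsampling steps above can be sketched as follows. This is a minimal illustration, not InternLM-XComposer's own pipeline: the helper names (fit_within, tile_boxes, downsample, split_into_tiles) and the 1024-pixel tile size are assumptions chosen for the example. Note that a 3840 x 2160 frame scaled to fit inside 2048 x 1080 with its aspect ratio preserved comes out at 1920 x 1080.

```python
from typing import Iterator, Tuple


def fit_within(width: int, height: int,
               max_width: int = 2048, max_height: int = 1080) -> Tuple[int, int]:
    """Return (w, h) scaled down to fit the box, keeping the aspect ratio.

    Images that already fit are returned unchanged (never upscaled).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def tile_boxes(width: int, height: int,
               tile: int = 1024) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x0, y0, x1, y1) boxes covering the image row by row.

    Tiles on the right and bottom edges may be smaller than `tile`.
    """
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def downsample(image, max_width: int = 2048, max_height: int = 1080):
    """Shrink an OpenCV image (NumPy array) to fit within the given box."""
    import cv2  # imported here so the pure-Python helpers work without OpenCV

    h, w = image.shape[:2]
    new_w, new_h = fit_within(w, h, max_width, max_height)
    # INTER_AREA is the interpolation OpenCV recommends for shrinking images.
    return cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)


def split_into_tiles(image, tile: int = 1024):
    """Return a list of tile views cut from an OpenCV image (NumPy array)."""
    h, w = image.shape[:2]
    return [image[y0:y1, x0:x1] for x0, y0, x1, y1 in tile_boxes(w, h, tile)]
```

With an image loaded through cv2.imread, downsample(image) returns the reduced frame, and split_into_tiles(image) returns NumPy views into the original array, so tiling itself does not copy pixel data.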
Quantization: Load the model with 4-bit quantization:
from transformers import AutoModel, BitsAndBytesConfig
# Store the weights in 4-bit precision (requires the bitsandbytes package)
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModel.from_pretrained(..., quantization_config=quant_config)

Experimental data shows that with 4-bit quantization enabled, the 7B model needs only 6 GB of GPU memory to process 4K images.
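
As a rough sanity check on that figure, the memory taken by the weights alone can be estimated from the parameter count and the bit width. This is back-of-the-envelope arithmetic, not a measurement: weight_memory_gb is a hypothetical helper, it assumes all 7 billion parameters are stored at the given width, and it ignores activations, the KV cache, and quantization metadata, which account for the rest of the footprint.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a 7B model with every parameter stored at the stated width.
fp16_gb = weight_memory_gb(7e9, 16)
int4_gb = weight_memory_gb(7e9, 4)
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
# prints "fp16: 14.0 GB, 4-bit: 3.5 GB"
```

Under these assumptions the 4-bit weights take roughly a quarter of the fp16 footprint, which leaves headroom within a 6 GB budget for the image tokens and intermediate activations.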