For hybrid vision-language models, LMCache introduces two designs:
- Cross-modal hash mechanism: unique hashes (mm_hashes) are generated for image tokens, establishing a mapping to the key-value cache of the corresponding text tokens so that visual features can be reused precisely. For example, in an image captioning task, the visual features of the same image need to be computed only once.
- Hybrid storage strategy: the storage medium is chosen automatically based on the size of the image features. Small, frequently accessed features stay in GPU memory, while large, infrequently accessed features are moved to CPU memory or disk. In typical scenarios this reduces GPU memory usage by about 40%.
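The cross-modal hash idea above can be sketched as a content-addressed cache: hashing the raw image bytes yields a stable key, so a second request with the same image hits the cached features instead of recomputing them. This is a minimal illustration, not LMCache's actual API; the class and method names (`MultimodalKVCache`, `mm_hash`) are assumptions for this sketch.

```python
import hashlib
from typing import Optional


class MultimodalKVCache:
    """Minimal sketch of a content-addressed cache for visual features.

    All names here are illustrative; LMCache's real implementation
    keys KV-cache blocks internally.
    """

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    @staticmethod
    def mm_hash(image_bytes: bytes) -> str:
        # Hash raw image content so identical images map to the same entry.
        return hashlib.sha256(image_bytes).hexdigest()

    def get(self, image_bytes: bytes) -> Optional[bytes]:
        # Returns cached features on a hit, None on a miss.
        return self._store.get(self.mm_hash(image_bytes))

    def put(self, image_bytes: bytes, features: bytes) -> None:
        self._store[self.mm_hash(image_bytes)] = features
```

With this scheme, a repeated image in a captioning workload costs one encoder pass plus cheap hash lookups on every later request.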
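The tier-selection policy in the second bullet can be sketched as a simple rule over feature size and access frequency. The thresholds below (1 MiB "small" cutoff, 3 hits to count as hot, 64 MiB CPU limit) are illustrative assumptions, not values from LMCache.

```python
def choose_tier(size_bytes: int, hits: int,
                small_limit: int = 1 << 20,   # 1 MiB; illustrative threshold
                hot_hits: int = 3) -> str:
    """Pick a storage tier for a cached visual feature.

    Hot, small features stay on the GPU; colder or larger ones spill
    to CPU memory, and very large ones go to disk. All thresholds are
    assumptions for this sketch.
    """
    if size_bytes <= small_limit and hits >= hot_hits:
        return "gpu"
    if size_bytes <= 64 * small_limit:  # up to 64 MiB fits in CPU memory
        return "cpu"
    return "disk"
```

A real eviction policy would also track recency and GPU memory pressure, but the size-and-frequency split is the core of the strategy described above.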
This feature must be used together with the multimodal version of vLLM; see the official configuration guide, and the Visual Question Answering (VQA) example in the LMCache-Examples repository, for details.
This answer is based on the article "LMCache: A Key-Value Cache Optimization Tool for Accelerating Inference on Large Language Models".