Key Performance Indicators
Key performance indicators for FlashMLA include:
- Memory bandwidth: up to 3000 GB/s on H800 GPUs (memory-intensive configuration)
- Compute throughput: up to 580 TFLOPS (compute-intensive configuration)
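To compare a measured run against these peak figures, the elapsed time has to be converted into achieved bandwidth and throughput. The following is a minimal sketch of that arithmetic; the helper names `achieved_gb_per_s` and `achieved_tflops` and the byte and FLOP counts are hypothetical illustrations, not part of FlashMLA, and you would need to count bytes moved and operations for your own configuration.

```python
def achieved_gb_per_s(bytes_moved: int, seconds: float) -> float:
    """Achieved memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def achieved_tflops(flops: int, seconds: float) -> float:
    """Achieved compute throughput in TFLOPS (1 TFLOP = 1e12 FLOPs)."""
    return flops / seconds / 1e12


# Illustrative numbers only (not measurements): 6 GB of memory traffic
# and 1.16e12 floating-point operations completed in 2 ms.
elapsed = 2e-3
bandwidth = achieved_gb_per_s(6_000_000_000, elapsed)
throughput = achieved_tflops(1_160_000_000_000, elapsed)
print(f"{bandwidth:.0f} GB/s, {throughput:.0f} TFLOPS")
```

Dividing the achieved value by the quoted peak gives a utilization ratio, which is usually easier to compare across runs than raw timings.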
Performance Test Methods
To test the performance of FlashMLA, you can follow the steps below:
- Edit the example script (e.g. example.py) to increase the input data size
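- Time the call. CUDA kernels launch asynchronously, so synchronize before reading the clock:
import time
import torch
torch.cuda.synchronize()  # finish pending GPU work before starting the timer
start = time.perf_counter()
o_i, lse_i = flash_mla_with_kvcache(...)
torch.cuda.synchronize()  # wait for the kernel to finish before stopping the timer
print(f"Elapsed: {time.perf_counter() - start:.6f} s")
- Gradually increase the size of the data and observe how performance changes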
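A single timed call is noisy, so the steps above can be wrapped in a small harness that does warm-up runs, repeats the measurement, and reports the median for each input size. This is a sketch under stated assumptions: `benchmark` and `stand_in_workload` are hypothetical names, and the CPU stand-in only exists so the example runs without a GPU. For a real run you would pass a function that calls the kernel and supply `torch.cuda.synchronize` as `sync`.

```python
import statistics
import time


def benchmark(fn, *, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    sync is an optional callable that blocks until device work is done,
    e.g. torch.cuda.synchronize when fn launches CUDA kernels.
    """
    for _ in range(warmup):  # warm-up runs are executed but not timed
        fn()
    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain pending work before starting the timer
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for completion before stopping the timer
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def stand_in_workload(n):
    """Hypothetical CPU stand-in for the kernel call; replace with the real call."""
    return sum(i * i for i in range(n))


results = {}
for size in (10_000, 20_000, 40_000):  # gradually increase the input size
    results[size] = benchmark(lambda: stand_in_workload(size))
    print(f"size={size:>6}  median={results[size] * 1e3:.3f} ms")
```

The median is used rather than the mean because it is less sensitive to an occasional slow run caused by other activity on the machine.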
Test Notes
- Ensure that the test environment is stable and free of other heavy load tasks
- It is recommended to use a professional GPU monitoring tool to view actual bandwidth utilization
- Different configurations (sequence length, chunk size, etc.) may affect the final performance
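One way to check whether the test environment is stable enough is to look at the run-to-run variation of repeated timings. The sketch below computes the coefficient of variation and compares it with a threshold; the function names, the 5% threshold, and the sample timings are all assumptions chosen for illustration, not values from FlashMLA.

```python
import statistics


def relative_spread(samples):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def is_stable(samples, threshold=0.05):
    """True when run-to-run variation is within the (assumed) threshold."""
    return relative_spread(samples) <= threshold


quiet = [1.00, 1.01, 0.99, 1.00, 1.02]  # hypothetical timings in ms, idle machine
noisy = [1.00, 1.60, 0.95, 2.10, 1.05]  # hypothetical timings in ms, busy machine
print(is_stable(quiet), is_stable(noisy))
```

If the check fails, it is usually worth stopping other workloads and repeating the measurement before comparing configurations.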
This answer comes from the article "FlashMLA: Optimizing the MLA Decoding Kernel for Hopper GPUs (DeepSeek Open Source Week Day 1)".































