The steps to debug LMCache performance issues are as follows:
- Checking log files: monitor prefiller.log, decoder.log, and proxy.log, analyzing key metrics such as cache hit rate and storage backend load.
- Running the test tool: generate multi-round Q&A or RAG workloads using the testing tools provided by LMCache, outputting CSV files to quantify latency and throughput.
- Environmental validation: ensure CUDA and Python version compatibility; an isolated Conda environment is recommended.
- Community support: join the Slack channel or attend the bi-weekly community meetings (Tuesdays at 9pm PT) for help.
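As a companion to the log-checking step above, here is a minimal sketch of computing a cache hit rate from log lines. The log-line format used here is a hypothetical illustration; the actual fields emitted by LMCache's prefiller/decoder/proxy logs may differ, so adapt the regular expression to what your logs contain.

```python
import re

# Hypothetical log-line pattern -- real LMCache log fields may differ.
HIT_RE = re.compile(r"cache (hit|miss)")

def hit_rate(lines):
    """Return the fraction of cache lookups in `lines` that were hits."""
    hits = misses = 0
    for line in lines:
        m = HIT_RE.search(line)
        if m is None:
            continue  # line does not mention a cache lookup
        if m.group(1) == "hit":
            hits += 1
        else:
            misses += 1
    total = hits + misses
    return hits / total if total else 0.0

sample = [
    "[prefiller] cache hit for request 1",
    "[prefiller] cache miss for request 2",
    "[prefiller] cache hit for request 3",
]
print(hit_rate(sample))  # 2 hits out of 3 lookups
```

The same function can be pointed at a real file with `hit_rate(open("prefiller.log"))`, since it only iterates over lines.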
For example, after cloning the lmcache-tests repository, execute the following command to test the CPU backend performance:
python3 main.py tests/tests.py -f test_lmcache_local_cpu -o outputs/
The results will be saved as a CSV file for further analysis of optimization points.
This answer comes from the article "LMCache: A Key-Value Cache Optimization Tool for Accelerating Inference on Large Language Models".