To evaluate a locally deployed LLM with OpenBench, follow these steps:
- Deploy the required models (e.g., open-source models such as llama3) locally using Ollama, and confirm that the service starts properly
- Configure Ollama's API endpoint in the OpenBench runtime environment (the default is http://localhost:11434)
- Run the evaluation command, substituting your model name and version (a full end-to-end session is sketched after this list):
  bench eval mmlu --model ollama/<model_name>:<version> --limit 50
- Pass the --temperature parameter to adjust the randomness of the generated results, and use --max-tokens to control the output length
- Once the evaluation is complete, run the bench view command to open the interactive report in a browser
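For concreteness, here is a minimal end-to-end session, assuming a stock Ollama install and the llama3 model mentioned above. The OLLAMA_BASE_URL variable is an assumption for the case where your endpoint differs from the default; check your OpenBench version's documentation for the exact override mechanism.

```bash
# Pull the model and start the Ollama service
# (it listens on http://localhost:11434 by default)
ollama pull llama3
ollama serve &

# Sanity check: list the models the local endpoint is serving
curl http://localhost:11434/api/tags

# Assumption: point OpenBench at a non-default endpoint via an
# environment variable; only needed if you changed the host or port
export OLLAMA_BASE_URL=http://localhost:11434

# Run a 50-sample MMLU evaluation with deterministic, bounded output
bench eval mmlu --model ollama/llama3:latest --limit 50 \
  --temperature 0.0 --max-tokens 1024

# Open the interactive report in the browser
bench view
```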
This method is particularly suitable for offline evaluation or data-sensitive scenarios, and it comprehensively tests the model's core capabilities, such as reasoning and knowledge.
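To cover several of those capabilities in a single offline pass, the same command can be looped over multiple benchmarks. The task names other than mmlu below are illustrative assumptions, so substitute whatever tasks your OpenBench install actually ships.

```bash
# Loop a small smoke-test evaluation over several tasks
# (gpqa_diamond and humaneval are assumed names; verify locally)
for task in mmlu gpqa_diamond humaneval; do
  bench eval "$task" --model ollama/llama3:latest --limit 50
done
```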
This answer is based on the article "OpenBench: an open source benchmarking tool for evaluating language models".