Based on the extensibility of the inspect-ai framework, the steps to add a new benchmark test are:
- Create a new Python module under the `benchmarks/` directory of the project and subclass `BaseBenchmark`
- Implement the `load_dataset()` and `evaluate()` methods to define the evaluation logic (a minimal sketch follows this list)
- Register the test with the `@register_benchmark` decorator and set its metadata (category, difficulty, etc.)
- Create a new `conftest.py` and add the dataset download logic there (HuggingFace permissions need to be handled; see the second sketch below)
- Verify the implementation with `uv run pytest benchmarks/<new_test_name>`
- Confirm via `bench list` that the new test appears in the list of available tests
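For orientation, here is a minimal sketch of such a module. Only `BaseBenchmark`, `load_dataset()`, `evaluate()`, and `@register_benchmark` come from the steps above; the import path, class name, metadata fields, dataset ID, and return types are placeholders and may differ from the actual codebase.

```python
# benchmarks/my_benchmark.py -- illustrative sketch only; import paths and
# signatures are assumptions, not the documented API.
from benchmarks.base import BaseBenchmark, register_benchmark  # assumed location

from datasets import load_dataset as hf_load_dataset


@register_benchmark(category="knowledge", difficulty="medium")  # metadata per step 3
class MyBenchmark(BaseBenchmark):
    """Skeleton benchmark following the steps above."""

    def load_dataset(self):
        # Step 2: load the evaluation data (hypothetical HuggingFace dataset ID).
        return hf_load_dataset("org/my-eval-set", split="test")

    def evaluate(self, model, dataset):
        # Step 2: scoring logic; exact-match accuracy used here as a placeholder.
        correct = 0
        for example in dataset:
            prediction = model.generate(example["question"])
            correct += int(prediction.strip() == example["answer"].strip())
        return {"accuracy": correct / len(dataset)}
```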
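And a sketch of the `conftest.py` step. The fixture name, dataset ID, and the use of an `HF_TOKEN` environment variable are assumptions about how the project handles HuggingFace authentication.

```python
# benchmarks/conftest.py -- illustrative sketch; fixture name and dataset ID are placeholders.
import os

import pytest
from datasets import load_dataset


@pytest.fixture(scope="session")
def eval_dataset():
    # Gated HuggingFace datasets need an access token (assumed to come from HF_TOKEN).
    token = os.environ.get("HF_TOKEN")
    return load_dataset("org/my-eval-set", split="test", token=token)
```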
It is recommended to refer to existing test implementations such as MMLU to keep the code style consistent.
This answer comes from the article "OpenBench: an open-source benchmarking tool for evaluating language models".