Based on the extensibility of the inspect-ai framework, the steps to add a new benchmark test are:
- Create a new Python module under the `benchmarks/` directory of the project and subclass `BaseBenchmark`
- Implement the `load_dataset()` and `evaluate()` methods to define the evaluation logic (a minimal sketch follows this list)
- Register the test with the `@register_benchmark` decorator and set its metadata (category, difficulty, etc.)
- Create a new `conftest.py` and add the dataset download logic there (HuggingFace permissions need to be handled; see the second sketch below)
- Verify the implementation with `uv run pytest benchmarks/<new_test_name>`
- Confirm via `bench list` that the new test appears in the list of available tests
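For orientation, here is a minimal sketch of such a module. Only `BaseBenchmark`, `load_dataset()`, `evaluate()`, and `@register_benchmark` come from the steps above; the import path, class name, metadata fields, dataset ID, and return types are placeholders and may differ from the actual codebase.

```python
# benchmarks/my_benchmark.py -- illustrative sketch only; import paths and
# signatures are assumptions, not the documented API.
from benchmarks.base import BaseBenchmark, register_benchmark  # assumed location

from datasets import load_dataset as hf_load_dataset


@register_benchmark(category="knowledge", difficulty="medium")  # metadata per step 3
class MyBenchmark(BaseBenchmark):
    """Skeleton benchmark following the steps above."""

    def load_dataset(self):
        # Step 2: load the evaluation data (hypothetical HuggingFace dataset ID).
        return hf_load_dataset("org/my-eval-set", split="test")

    def evaluate(self, model, dataset):
        # Step 2: scoring logic; exact-match accuracy used here as a placeholder.
        correct = 0
        for example in dataset:
            prediction = model.generate(example["question"])
            correct += int(prediction.strip() == example["answer"].strip())
        return {"accuracy": correct / len(dataset)}
```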
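And a sketch of the `conftest.py` step. The fixture name, dataset ID, and the use of an `HF_TOKEN` environment variable are assumptions about how the project handles HuggingFace authentication.

```python
# benchmarks/conftest.py -- illustrative sketch; fixture name and dataset ID are placeholders.
import os

import pytest
from datasets import load_dataset


@pytest.fixture(scope="session")
def eval_dataset():
    # Gated HuggingFace datasets need an access token (assumed to come from HF_TOKEN).
    token = os.environ.get("HF_TOKEN")
    return load_dataset("org/my-eval-set", split="test", token=token)
```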
It is recommended to refer to existing test implementations such as MMLU to keep the code style consistent.
This answer comes from the article "OpenBench: an open-source benchmarking tool for evaluating language models".