How do you use MCPMark for model evaluation? What are the specific steps?

2025-08-28


MCPMark Assessment Process Explained

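Model evaluation using MCPMark typically involves four key steps: setting up the environment, configuring API access, running the tests, and aggregating the results.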

Install the tool and configure the environment as described earlier.

Configure API access for the services to be tested (GitHub, Notion, etc.).

Full test run: python -m pipeline --exp-name <experiment-name> --mcp <environment> --tasks all --models <model-name> --k <number-of-attempts>
Group testing: specific task groups, such as online_resume, can be specified.
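
If you launch runs from a script rather than typing the command by hand, the invocation above can be assembled programmatically. The following is only a minimal sketch: the flags are the ones shown in the command above, while the helper names build_pipeline_command and run_pipeline are hypothetical. It also assumes that a task group such as online_resume is passed through --tasks in place of all, which the text above implies but does not state.

```python
import subprocess
import sys


def build_pipeline_command(exp_name, mcp, models, tasks="all", k=1):
    """Return the argument list for one MCPMark pipeline run.

    Only the flags quoted in the text are used; every value is a
    caller-supplied placeholder.
    """
    return [
        sys.executable, "-m", "pipeline",
        "--exp-name", exp_name,
        "--mcp", mcp,
        "--tasks", tasks,    # "all", or (assumed) a task group name
        "--models", models,
        "--k", str(k),       # number of attempts
    ]


def run_pipeline(**kwargs):
    """Run the pipeline and raise CalledProcessError if it exits non-zero.

    Assumes MCPMark is installed and the current working directory is one
    from which `python -m pipeline` resolves.
    """
    return subprocess.run(build_pipeline_command(**kwargs), check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when an experiment name contains spaces. Calling build_pipeline_command on its own lets you inspect the arguments before anything is executed.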

The raw results are saved in the ./results directory.
Use the aggregation command to generate reports: python -m src.aggregators.aggregate_results --exp-name <experiment-name>

Detailed reports in JSON and CSV formats are generated for each experiment, supporting multi-dimensional analysis of multiple metrics.
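
Once the reports exist, they can be loaded for further analysis with the standard library alone. The sketch below makes an assumption the text does not confirm: that an experiment's files sit under ./results/<experiment-name>/. The report file names and column names are not specified here, so the hypothetical helper load_reports simply collects every JSON and CSV file it finds there.

```python
import csv
import json
from pathlib import Path


def load_reports(results_dir, exp_name):
    """Load every JSON and CSV file under results_dir/exp_name.

    Returns a dict mapping each file's relative path to its parsed content:
    the decoded object for JSON, a list of row dicts for CSV.
    The directory layout is an assumption, not documented behaviour.
    """
    exp_dir = Path(results_dir) / exp_name
    reports = {}
    for path in sorted(exp_dir.rglob("*")):
        key = path.relative_to(exp_dir).as_posix()
        if path.suffix == ".json":
            with path.open(encoding="utf-8") as fh:
                reports[key] = json.load(fh)
        elif path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as fh:
                reports[key] = list(csv.DictReader(fh))
    return reports
```

For example, load_reports("./results", "my-experiment") would return the parsed reports keyed by file name. Adjust the path if your installation arranges the output differently.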