WritingBench supports both automated scoring by a large language model and scoring by a dedicated judging model.

2025-08-28

WritingBench provides two evaluation mechanisms to make its results more reliable. The first is automatic scoring by a large language model: users edit the evaluator/llm.py configuration file and point it at their own API endpoint. The second is scoring by a dedicated judging model, which is built on the Qwen-7B model; users must download this model from the HuggingFace platform before they can use it.
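The two tracks differ mainly in what produces the judgment text. The sketch below illustrates that idea only; it is not WritingBench's actual code. The names `Generate`, `build_prompt`, `score_with`, and the two stub backends are hypothetical, and the prompt wording is an assumption.

```python
from typing import Callable

# Hypothetical abstraction: any text-in/text-out function can act as a judge
# backend, whether it wraps a remote API or a locally downloaded model.
Generate = Callable[[str], str]


def build_prompt(response: str, criterion: str) -> str:
    """Build an illustrative scoring prompt (wording is an assumption)."""
    return (
        "Score the response from 0 to 10 on the criterion below.\n"
        f"Criterion: {criterion}\n"
        f"Response: {response}\n"
        "Reply with the integer score only."
    )


def score_with(generate: Generate, response: str, criterion: str) -> int:
    """Ask the backend for a score and validate that it lies in 0-10."""
    reply = generate(build_prompt(response, criterion))
    score = int(reply.strip())
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    return score


# Stub backends standing in for the two tracks; real ones would call an
# API endpoint or run the downloaded judging model.
def fake_api_backend(prompt: str) -> str:
    return "8"


def fake_critic_backend(prompt: str) -> str:
    return " 7\n"
```

Under these assumptions, switching tracks means passing a different backend to `score_with`; the rubric and the score validation stay the same.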

Both methods use a standard rubric of 5 criteria, each scored from 0 to 10. The evaluation script outputs a score and a specific rationale for every criterion, giving detailed feedback such as 'Content completeness: 8/10, covers core elements but some details are lacking'.
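Feedback in this shape is easy to post-process. The following is a minimal sketch that assumes each criterion appears on its own line as `<criterion>: <score>/10, <rationale>`, modeled on the example above; the real output format of the script may differ, and `CriterionScore`, `parse_feedback`, and `mean_score` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed line format: "<criterion>: <score>/10, <rationale>"
LINE_RE = re.compile(
    r"^\s*(?P<name>[^:]+):\s*(?P<score>\d+)\s*/\s*10\s*,?\s*(?P<why>.*)$"
)


@dataclass
class CriterionScore:
    name: str
    score: int
    rationale: str


def parse_feedback(text: str) -> List[CriterionScore]:
    """Extract one record per line that matches the assumed format."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not criterion feedback
        score = int(match.group("score"))
        if not 0 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        results.append(
            CriterionScore(
                name=match.group("name").strip(),
                score=score,
                rationale=match.group("why").strip(),
            )
        )
    return results


def mean_score(scores: List[CriterionScore]) -> float:
    """Average the per-criterion scores into a single number."""
    if not scores:
        raise ValueError("no criterion scores to average")
    return sum(s.score for s in scores) / len(scores)
```

Averaging is only one possible aggregation; the source does not say how WritingBench combines the five criterion scores.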

This two-track design balances evaluation efficiency against scoring quality, so users can choose whichever method best fits their needs.

This answer comes from the article "WritingBench: a benchmarking assessment tool to test the writing skills of large models".