An automated model evaluation workflow can be set up through the following steps:
- Import a dataset containing test questions
- Create a separate response column for each model to be tested, using the same prompt structure
- Add a rubric column with a prompt template such as 'Evaluate {{prompt}} for response 1: {{model1}}, response 2: {{model2}}'
- Use a larger model (e.g., a 70B-class model) as the judge
- The system automatically generates comparison results, including quality scores
- Save the complete test configuration and results with the 'Export to Hub' feature
This setup is especially well suited to R&D teams that need to evaluate newly released models on a regular basis, and can save more than 80% of manual evaluation time.
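For teams that prefer to script the same workflow outside the AI Sheets UI, the sketch below shows roughly how the judge step could look in Python. It is a minimal illustration, not part of AI Sheets itself: the judge model ID, the rubric wording, and the `judge_pair` helper are assumptions chosen for this example, and `huggingface_hub.InferenceClient` is used only as one possible way to call a hosted judge model.

```python
# Minimal sketch of the "judge" step: compare two model responses with a larger
# model acting as the rubric evaluator. Model ID and prompt wording are
# illustrative assumptions, not fixed by AI Sheets.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up HF_TOKEN from the environment if set

JUDGE_MODEL = "meta-llama/Llama-3.3-70B-Instruct"  # assumed 70B-class judge

RUBRIC_TEMPLATE = (
    "Evaluate the prompt: {prompt}\n\n"
    "Response 1: {model1}\n\n"
    "Response 2: {model2}\n\n"
    "Score each response from 1 to 10 for accuracy and clarity, "
    "then state which response is better and why."
)

def judge_pair(prompt: str, response_1: str, response_2: str) -> str:
    """Ask the judge model to compare two responses to the same prompt."""
    message = RUBRIC_TEMPLATE.format(
        prompt=prompt, model1=response_1, model2=response_2
    )
    result = client.chat_completion(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": message}],
        max_tokens=512,
    )
    return result.choices[0].message.content

if __name__ == "__main__":
    # Example row from an imported test-question dataset (made up for illustration).
    verdict = judge_pair(
        prompt="Explain the difference between a list and a tuple in Python.",
        response_1="Lists are mutable sequences; tuples are immutable.",
        response_2="Both are the same, just different names.",
    )
    print(verdict)
```

Looping this over every row of the imported dataset and saving the verdicts alongside the response columns roughly reproduces what the rubric column and the 'Export to Hub' step do in the UI.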
This answer comes from the article "AI Sheets: building and processing datasets using AI models in tables without code".