Core evaluation metrics
- Knowledge hit rate: the proportion of queries for which the model correctly invokes the relevant knowledge-base entry (ideally >85%)
- Rejection accuracy: the ability to correctly refuse questions that fall outside the scope of the knowledge base
- Response accuracy: the reduction in factual error rate relative to the base model (all three are sketched in code below)
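
To make the definitions concrete, here is a minimal sketch of computing the three metrics from labeled evaluation records. The record fields (`in_kb`, `retrieved_correct`, `answered`, `factually_wrong`) are illustrative assumptions, not part of KBLaM's evaluation format.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    in_kb: bool              # question is answerable from the knowledge base
    retrieved_correct: bool  # model invoked the right KB entry
    answered: bool           # model answered instead of refusing
    factually_wrong: bool    # answer contains a factual error

def metrics(records: list[EvalRecord]) -> dict[str, float]:
    in_kb = [r for r in records if r.in_kb]
    out_kb = [r for r in records if not r.in_kb]
    return {
        # fraction of in-scope questions where the right KB entry was used
        "knowledge_hit_rate": sum(r.retrieved_correct for r in in_kb) / max(len(in_kb), 1),
        # fraction of out-of-scope questions the model refused
        "rejection_accuracy": sum(not r.answered for r in out_kb) / max(len(out_kb), 1),
        # overall factual error rate, to compare against the base model's
        "factual_error_rate": sum(r.factually_wrong for r in records) / max(len(records), 1),
    }
```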
Evaluation methodology
- Use the official `evaluate.py` script to test the preset question set
- Construct adversarial questions to test hallucination suppression (see the sketch below)
- Reproduce the paper's experimental results with the comparison scripts under `experiments/`
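
As a hedged illustration of the adversarial test, the sketch below scores rejection accuracy on deliberately out-of-scope questions. `ask_model` and the refusal markers are placeholder assumptions, not part of the official `evaluate.py` interface.

```python
from typing import Callable

# Placeholder refusal phrases; adapt to the model's actual refusal style.
REFUSAL_MARKERS = ("cannot answer", "not in the knowledge base", "i don't know")

def is_refusal(answer: str) -> bool:
    """Crude string match for refusal phrases."""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def rejection_accuracy(questions: list[str], ask_model: Callable[[str], str]) -> float:
    """Fraction of out-of-scope questions the model correctly refuses."""
    return sum(is_refusal(ask_model(q)) for q in questions) / len(questions)

# Example adversarial questions about facts deliberately absent from the KB.
adversarial = [
    "What is the founder's blood type?",
    "Which branch office opened on the Moon in 2030?",
]
```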
Performance optimization recommendations
When the metrics fall short, three levers are available: adjust the knowledge-embedding strength (the alpha parameter), expand the training data (synthetic data generated with Azure OpenAI; see the sketch below), or optimize the knowledge structure (add inter-entity relationship labels). Note that the evaluation should isolate the contribution of the base model's own capabilities.
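
For the data-expansion lever, here is a hedged sketch of generating synthetic Q&A pairs with the Azure OpenAI Python SDK, as suggested above. The deployment name, environment variables, and prompt are placeholders, not values from the KBLaM repository.

```python
import os
from openai import AzureOpenAI

# Endpoint, key, and API version come from your own Azure deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def synthesize_qa(entity: str, fact: str) -> str:
    """Generate one synthetic Q&A pair grounded in a single KB entry."""
    resp = client.chat.completions.create(
        model="my-gpt4o-deployment",  # placeholder Azure deployment name
        messages=[
            {"role": "system",
             "content": "Write one question-answer pair grounded only in the given fact."},
            {"role": "user",
             "content": f"Entity: {entity}\nFact: {fact}"},
        ],
    )
    return resp.choices[0].message.content
```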
This answer is based on the article "KBLaM: An Open Source Enhanced Tool for Embedding External Knowledge in Large Models".