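Key methods for improving the accuracy of literary text analysis:
- Refine the prompt: give explicit instructions such as "Extract characters, emotions, and relationships in order of appearance...", which require extractions to follow the order of appearance strictly.
- Add more examples: include more annotated text snippets in the examples parameter.
- Use a high-quality model: the gemini-2.5-pro model is recommended for literary analysis.
- Post-processing validation: generate an HTML visualization file (visualization.html) and review the results manually.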
A typical implementation:
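# Define the prompt and the annotated examples
import langextract as lx

prompt = "Extract characters, emotions, and relationships in order of appearance."
examples = [lx.data.ExampleData(
    text="ROMEO. But soft! What light...",
    extractions=[lx.data.Extraction(
        extraction_class="character",
        extraction_text="ROMEO",  # exact text from the example, not a paraphrase
        attributes={"emotion": "hopeful"},
    )],
)]
# Run the extraction with two passes; text holds the passage to analyze
result = lx.extract(text_or_documents=text, prompt_description=prompt, examples=examples, model_id="gemini-2.5-pro", extraction_passes=2)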
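The order-of-appearance requirement in the prompt can also be checked automatically before the manual HTML review. The sketch below is a minimal, standard-library-only illustration; `check_order_of_appearance` is a hypothetical helper, not part of LangExtract, and it assumes the extracted spans have already been pulled out of the result as a plain list of strings in the order the model returned them.

```python
def check_order_of_appearance(source_text, extracted_spans):
    """Return a list of problems; an empty list means every span passed.

    Hypothetical helper (not part of LangExtract). Each span must occur
    verbatim in source_text, at or after the position of the previous span.
    """
    problems = []
    cursor = 0  # search resumes from the start of the previous match
    for i, span in enumerate(extracted_spans):
        pos = source_text.find(span, cursor)
        if pos == -1:
            if span in source_text:
                # The span exists, but only before the previous match.
                problems.append(f"#{i} {span!r}: out of order")
            else:
                # The span was paraphrased or altered by the model.
                problems.append(f"#{i} {span!r}: not found verbatim")
            continue
        cursor = pos
    return problems


sample_text = (
    "ROMEO. But soft! What light through yonder window breaks? "
    "It is the east, and Juliet is the sun."
)
in_order = check_order_of_appearance(
    sample_text, ["ROMEO", "But soft!", "Juliet is the sun"]
)
swapped = check_order_of_appearance(sample_text, ["Juliet is the sun", "ROMEO"])
paraphrased = check_order_of_appearance(sample_text, ["Romeo"])
print(in_order, swapped, paraphrased)
```

Because the cursor is set to the start of the previous match rather than its end, overlapping spans are tolerated. Any span the helper flags is a good candidate for closer inspection in the visualization.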
This answer comes from the article "LangExtract: open source tools to extract structured data from text".