Three-tier information filtering defense system
For sensitive scenarios such as contracts or medical records, the following protection measures are recommended:
Protection level | Method | Effect |
---|---|---|
Metadata desensitization | `ingest_file(..., rules=[{"type":"metadata_extraction", "schema":{"patient_id":"redact"}}])` | Automatically replaces fields such as 18-digit ID card numbers with `***` |
Content cleaning | Add the rule `{"type":"natural_language", "prompt":"Delete all phone numbers and email addresses"}` | Recognizes and removes PII using NLP |
Access control | Enable the `--auth-token` parameter in `start_server.py` | Forces API calls to carry a JWT token |
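
As a minimal sketch of how the first two tiers could be assembled before ingestion, the snippet below builds the rules list from the dictionaries shown in the table and prepares a request header for the third tier. The helper names `build_rules` and `auth_headers`, the extra field `id_card_number`, and the `Bearer` header format are assumptions made for illustration; they are not taken from the Morphik documentation, and the final `ingest_file` call is left as a comment because its remaining arguments are not given in the text.

```python
import json

# Rule dictionaries copied from the table; the English prompt is a
# translation of the original example prompt.
METADATA_RULE = {
    "type": "metadata_extraction",
    "schema": {"patient_id": "redact"},
}
CONTENT_RULE = {
    "type": "natural_language",
    "prompt": "Delete all phone numbers and email addresses",
}


def build_rules(extra_redacted_fields=()):
    """Return a fresh rules list, optionally marking more fields as "redact".

    Hypothetical helper: it only assembles plain dictionaries and never
    contacts a server.
    """
    schema = dict(METADATA_RULE["schema"])  # copy so the constant stays intact
    for field in extra_redacted_fields:
        schema[field] = "redact"
    return [
        {"type": "metadata_extraction", "schema": schema},
        dict(CONTENT_RULE),
    ]


def auth_headers(token):
    """Build a conventional Bearer header for a JWT (assumed header format)."""
    if not token:
        raise ValueError("a non-empty token is required")
    return {"Authorization": f"Bearer {token}"}


# "id_card_number" is a made-up field name used only for this example.
rules = build_rules(extra_redacted_fields=["id_card_number"])
print(json.dumps(rules, indent=2))

# Not executed here; the elided arguments are unknown:
# ingest_file(..., rules=rules)
```

Keeping the rules as plain data makes them easy to review and version alongside the rest of the deployment configuration before any document is ingested.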
Note: 1) Video processing additionally requires setting `enable_face_blur=True`.
2) The audit log should be backed up regularly using `export_audit_log()`.
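
The regular backup mentioned in the note could be wrapped in a small helper such as the one below. This is only a sketch: the text does not say what `export_audit_log()` returns, so the helper assumes it returns the log as a string and accepts the export function as a parameter. The name `backup_audit_log`, the file naming scheme, and the stand-in `fake_export` function are all hypothetical.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def backup_audit_log(export_fn: Callable[[], str], backup_dir: str) -> Path:
    """Write the exported log to a UTC-timestamped file and return its path.

    export_fn is assumed to return the audit log as text; in a real setup
    it would be the platform's export_audit_log function.
    """
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = target_dir / f"audit_log_{stamp}.txt"
    target.write_text(export_fn(), encoding="utf-8")
    return target


def fake_export() -> str:
    """Stand-in for export_audit_log(), used only to demonstrate the helper."""
    return "user=alice action=ingest\n"


# Demonstration against a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    saved = backup_audit_log(fake_export, tmp)
    print(saved.name)
```

A scheduler such as cron could call a helper like this at a fixed interval so that backups do not depend on someone remembering to run the export.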
This answer comes from the article "Morphik Core: an open source RAG platform for processing multimodal data".