Three-tier information filtering defense system
For sensitive scenarios such as contracts or medical records, the following protection measures are recommended:
| Protection level | How to apply | Effect |
|---|---|---|
| Metadata desensitization | `ingest_file(..., rules=[{"type":"metadata_extraction", "schema":{"patient_id":"redact"}}])` | Automatically replaces fields such as 18-digit ID card numbers with `***` |
| Content cleaning | Add the rule `{"type":"natural_language", "prompt":"Remove all phone numbers and email addresses"}` | Recognizes and removes PII using NLP |
| Access control | Enable the `--auth-token` parameter in `start_server.py` | Forces API calls to carry a JWT token |
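
As a rough illustration of the first two tiers, the sketch below assembles the two rule dictionaries from the table into a `rules` list and shows what masking an 18-digit ID number could look like. The rule shapes are taken from the table (with the prompt rendered in English). The `mask_id_numbers` helper and its regular expression are hypothetical stand-ins written for this example; they are not Morphik's implementation, which runs server-side.

```python
import json
import re

# Rule dictionaries as they appear in the table; the exact schema that
# Morphik accepts is not verified here.
metadata_rule = {"type": "metadata_extraction", "schema": {"patient_id": "redact"}}
content_rule = {
    "type": "natural_language",
    # English rendering of the prompt shown in the table.
    "prompt": "Remove all phone numbers and email addresses",
}
rules = [metadata_rule, content_rule]

# Hypothetical pattern: 17 digits followed by a digit or X, not embedded
# in a longer run of digits.
ID_PATTERN = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")


def mask_id_numbers(text: str) -> str:
    """Replace anything that looks like an 18-digit ID number with ***."""
    return ID_PATTERN.sub("***", text)


# The list below is what would be passed as rules=... to ingest_file.
print(json.dumps(rules, ensure_ascii=False, indent=2))
print(mask_id_numbers("Patient ID 110105199003071234 admitted"))
```

Keeping the rules as plain dictionaries makes them easy to store in configuration and reuse across ingestion calls.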
Note: 1) Video processing additionally requires enabling `enable_face_blur=True`. 2) Audit logs should be backed up regularly with `export_audit_log()`.
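
For the access-control tier, a client has to attach the token to every API call. The sketch below builds such a request without sending it. The URL, the token value, and the use of the `Authorization: Bearer` header scheme are assumptions for illustration; the source states only that calls must carry a JWT token.

```python
import urllib.request

# Hypothetical values: neither the URL nor the token comes from the source.
API_URL = "http://localhost:8000/documents"
TOKEN = "example.jwt.token"


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the token as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_authorized_request(API_URL, TOKEN)
print(request.get_header("Authorization"))
```

In practice the token would be read from an environment variable or a secrets store rather than written into the source.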
This answer comes from the article "Morphik Core: an open source RAG platform for processing multimodal data".