WeClone provides a complete data preprocessing workflow:
- Data export: First use the PyWxDump tool to decrypt the WeChat database, then export the chat records of specific contacts or group chats in CSV format.
- Data preparation: Place the exported CSV folder (default path wxdump_tmp/export/csv) in the project's ./data/csv directory.
- Format conversion: Run the included csv_to_json.py script to convert the data into the JSON training format.
- Sensitive information filtering: The system automatically filters out mobile phone numbers, ID numbers, and similar data. Users can add custom blocked words in blocked_words.json.
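
The filtering step can be pictured with a minimal sketch. This is not WeClone's actual implementation. The regular expressions (11-digit mainland China mobile numbers and 18-character resident ID numbers), the helper names `load_blocked_words` and `is_sensitive`, and the assumption that blocked_words.json holds a plain JSON list of strings are all illustrative guesses.

```python
import json
import re

# Assumed patterns (not taken from WeClone): an 11-digit mobile number
# and an 18-character resident ID number, neither embedded in a longer digit run.
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
ID_RE = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")


def load_blocked_words(path="blocked_words.json"):
    """Load custom blocked words; assumes the file is a JSON list of strings."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_sensitive(text, blocked_words=()):
    """Return True if text contains a phone number, an ID number, or a blocked word."""
    if PHONE_RE.search(text) or ID_RE.search(text):
        return True
    # Skip empty entries so that a stray "" does not match every message.
    return any(word and word in text for word in blocked_words)
```

A message for which `is_sensitive` returns True would then be dropped before conversion. For example, `is_sensitive("call me at 13812345678")` is True, while an ordinary message with no long digit run passes through.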
Note: 1) Preparing at least 2,000 high-quality conversation records is recommended; 2) group chat data must be manually filtered for valid conversations; 3) for the expected data format, see data/example_chat.csv.
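
To make the conversion step and the 2,000-record recommendation concrete, here is a rough sketch of turning chat rows into training pairs and checking the count. It does not reproduce csv_to_json.py. The column names `is_sender` and `msg`, the output keys `instruction` and `output`, and the helper `rows_to_pairs` are assumptions made for illustration, so check data/example_chat.csv for the real layout.

```python
import csv
import io
import json

RECOMMENDED_MIN = 2000  # minimum suggested in the note above


def rows_to_pairs(rows):
    """Pair each message from the other party with the reply that directly follows it.

    Assumes each row has "is_sender" ("1" for your own messages, "0" otherwise)
    and "msg" (the message text). Both column names are hypothetical.
    """
    pairs = []
    previous = None  # latest unanswered message from the other party
    for row in rows:
        text = row["msg"].strip()
        if not text:
            continue
        if row["is_sender"] == "1" and previous is not None:
            pairs.append({"instruction": previous, "output": text})
            previous = None
        elif row["is_sender"] == "0":
            # Consecutive incoming messages: keep only the most recent one.
            previous = text
    return pairs


# Tiny in-memory sample standing in for an exported CSV file.
sample = io.StringIO(
    "is_sender,msg\n"
    "0,Are you free tonight?\n"
    "1,Yes after 8\n"
)
pairs = rows_to_pairs(csv.DictReader(sample))
print(json.dumps(pairs, ensure_ascii=False, indent=2))
if len(pairs) < RECOMMENDED_MIN:
    print(f"Only {len(pairs)} pairs; at least {RECOMMENDED_MIN} are recommended.")
```

Pairing only adjacent turns is a deliberate simplification. Real chat logs often contain several consecutive messages from one side, which a fuller converter would probably merge rather than discard.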
This answer comes from the article "WeClone: training digital doppelgangers with WeChat chats and voices".





























