ImBD (Imitate Before Detect) is a pioneering machine-generated text detection project, published at AAAI 2025. With the widespread adoption of large language models (LLMs) such as ChatGPT, identifying AI-generated text has become increasingly challenging. ImBD proposes a novel "imitate before detect" approach that improves detection by deeply understanding and imitating the stylistic characteristics of machine-generated text. The method is the first to align with the stylistic preferences of machine text, establishing a comprehensive detection framework that can effectively identify machine-generated text even after human revision. The project is released under the Apache 2.0 open-source license and provides a complete code implementation, pre-trained models, and detailed documentation, making it easy for researchers and developers to build further research and applications on top of it.

Demo address: https://ai-detector.fenz.ai/ai-detector
Function List
- Supports high-precision detection of machine-generated text
- Provides pre-trained models for immediate deployment
- Implements a novel text-style feature alignment algorithm
- Includes detailed experimental datasets and evaluation benchmarks
- Provides complete training and inference code
- Supports fine-tuning on custom training data
- Includes detailed API documentation and usage examples
- Provides command-line tools for quick testing and evaluation
- Supports batch text processing
- Includes visualization tools for displaying detection results
Usage Guide
1. Environment setup
First, set up your Python environment and install the required dependencies:
git clone https://github.com/Jiaqi-Chen-00/ImBD
cd ImBD
pip install -r requirements.txt
2. Data preparation
Before using ImBD, prepare training and test data containing the following two categories:
- Human-written original text
- Machine-generated or machine-revised text
Data format requirements:
- Text files must be UTF-8 encoded
- One sample per line
- It is recommended to split the dataset into training, validation, and test sets in an 8:1:1 ratio
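The format above (UTF-8, one sample per line) and the recommended 8:1:1 split can be sketched in a few lines of Python; the output file names here are illustrative and not part of the ImBD tooling:

```python
import random

def split_dataset(path, seed=42):
    """Read one sample per line (UTF-8) and split 8:1:1 into train/val/test."""
    with open(path, encoding="utf-8") as f:
        samples = [line.rstrip("\n") for line in f if line.strip()]
    random.Random(seed).shuffle(samples)  # deterministic shuffle for reproducibility
    n = len(samples)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    splits = {
        "train.txt": samples[:n_train],
        "val.txt": samples[n_train:n_train + n_val],
        "test.txt": samples[n_train + n_val:],
    }
    for name, lines in splits.items():
        with open(name, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
    return {name: len(lines) for name, lines in splits.items()}
```

The seeded shuffle keeps splits reproducible across runs, which matters when comparing training configurations.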
3. Model training
Run the following command to start training:
python train.py \
--train_data path/to/train.txt \
--val_data path/to/val.txt \
--model_output_dir path/to/save/model \
--batch_size 32 \
--learning_rate 2e-5 \
--num_epochs 5
4. Model evaluation
Evaluate model performance on the test set:
python evaluate.py \
--model_path path/to/saved/model \
--test_data path/to/test.txt \
--output_file evaluation_results.txt
5. Text detection
Detect a single text:
python detect.py \
--model_path path/to/saved/model \
--input_text "Text to be detected" \
--output_format json
Detect a batch of texts:
python batch_detect.py \
--model_path path/to/saved/model \
--input_file input.txt \
--output_file results.json
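The exact schema of results.json depends on the release; assuming each entry carries the input text and a machine-probability score (the field names `text` and `machine_prob` are hypothetical, for illustration only), a simple post-processing pass might look like:

```python
import json

def summarize_results(path, threshold=0.5):
    """Count how many texts score at or above the machine-text threshold.

    Assumes a hypothetical schema: a JSON list of objects with
    "text" and "machine_prob" fields. Check the actual output of
    batch_detect.py before relying on these names.
    """
    with open(path, encoding="utf-8") as f:
        results = json.load(f)
    flagged = [r for r in results if r["machine_prob"] >= threshold]
    return {"total": len(results), "flagged": len(flagged)}
```

Keeping the raw JSON and summarizing it separately makes it easy to re-run the analysis with a different threshold later.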
6. Advanced functions
6.1 Model fine-tuning
If you need to optimize for domain-specific text, you can fine-tune the model using your own dataset:
python finetune.py \
--pretrained_model_path path/to/pretrained/model \
--train_data path/to/domain/data \
--output_dir path/to/finetuned/model
6.2 Visualization analysis
Use the built-in visualization tools to analyze detection results:
python visualize.py \
--results_file path/to/results.json \
--output_dir path/to/visualizations
6.3 API Service Deployment
Deploy the model as a REST API service:
python serve.py \
--model_path path/to/saved/model \
--host 0.0.0.0 \
--port 8000
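Once the service is running, clients can send texts over HTTP. The endpoint path and request schema below are assumptions for illustration (check the serve.py source for the actual API contract); the sketch uses only the Python standard library:

```python
import json
import urllib.request

def detect_text(text, host="127.0.0.1", port=8000, endpoint="/detect"):
    """POST a text sample to the detection service and return the parsed JSON reply.

    The endpoint path and the {"text": ...} payload are hypothetical;
    adjust them to match the deployed service.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:{port}{endpoint}",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Using the standard library keeps the client dependency-free; for production use you would typically add timeouts and retry handling.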
7. Caveats
- Using a GPU for model training is recommended to improve efficiency
- Training data quality has a significant impact on model performance
- Update the model regularly to keep up with new characteristics of AI-generated text
- Pay attention to model versioning when deploying to production environments
- Save detection results for later analysis and model optimization
8. Frequently asked questions
Q: What languages does the model support?
A: English is currently the primary supported language; other languages require training on corresponding datasets.
Q: How can I improve detection accuracy?
A: Accuracy can be improved by adding training data, tuning model parameters, and fine-tuning on domain-specific data.
Q: How can detection speed be optimized?
A: Detection speed can be improved by batch processing, model quantization, and using GPU acceleration.