A typical process for generating synthetic data is as follows (using synthetic customer emails as the example):
- Basic data generation:
  - Enter the prompt "Generate resumes of 5 pharmaceutical company professionals" into the blank form.
  - The AI automatically creates a table with columns for name, position, expertise, etc.
- Extended data columns:
  - Add a new column and enter the prompt "Write professional business emails based on {{person_bio}}", where {{person_bio}} refers to an existing column.
  - Select a creative model (e.g. GPT-OSS) to generate the email content.
- Quality control:
  - Check that the generated emails comply with the expected format.
  - Drag the bottom of the column to generate more samples (up to 1000 rows).
- Batch export:
  - Export the dataset and the config.yml file.
  - Scale to tens of thousands of rows by running the script with HF Jobs.
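The column prompt and the format check in the steps above can be pictured with a small Python sketch. It is an illustration only, not how AI Sheets works internally: the helper names `render_prompt` and `check_email_format` are hypothetical, and the compliance rules (subject line, greeting, sign-off) are assumed examples of what "email formatting compliance" might mean for a given dataset.

```python
import re

# Column prompt copied from the workflow above; {{person_bio}} names a column.
PROMPT_TEMPLATE = "Write professional business emails based on {{person_bio}}"


def render_prompt(template: str, row: dict) -> str:
    """Replace each {{column}} placeholder with that column's value in the row."""
    def substitute(match: re.Match) -> str:
        return str(row[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def check_email_format(text: str) -> list:
    """Return a list of formatting problems; an empty list means the email passed.

    The three rules below are assumptions chosen for illustration.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return ["empty email"]
    problems = []
    if not lines[0].lower().startswith("subject:"):
        problems.append("missing subject line")
    if not any(re.match(r"(dear|hello|hi)\b", l, re.IGNORECASE) for l in lines[1:3]):
        problems.append("missing greeting")
    if not any(re.match(r"(best regards|kind regards|regards|sincerely)", l, re.IGNORECASE)
               for l in lines[-3:]):
        problems.append("missing sign-off")
    return problems


if __name__ == "__main__":
    # Made-up row standing in for one generated resume.
    row = {"person_bio": "a regulatory affairs lead with 10 years of experience"}
    print(render_prompt(PROMPT_TEMPLATE, row))

    # Made-up email standing in for one generated cell.
    sample = "Subject: Trial update\n\nDear Dr. Lee,\n\nSummary attached.\n\nBest regards,\nAna"
    print(check_email_format(sample))
```

A check like this is most useful on the exported dataset, where rows that return a non-empty problem list can be filtered out or regenerated before scaling up.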
This method is particularly suitable for:
- Privacy-sensitive scenarios (avoiding the use of real customer data)
- Expanding model training data
- Business process simulation testing
This answer comes from the article "AI Sheets: building and processing datasets using AI models in tables without code".