The Metabase AI dataset generator is an open-source tool for quickly generating realistic-looking datasets for demos, learning, and data analysis. It uses OpenAI's GPT-4o model to generate the data structure and business rules, then uses Faker to populate the rows. Users can choose the business type, data volume, and schema, preview the data, export it to CSV or SQL files, or explore it directly in Metabase. The interface is built with Next.js and Tailwind CSS, and Metabase is deployed through Docker, so developers, data analysts, and enterprise users can build demo data quickly.
Feature List
- Conversational prompt builder: Users generate customized datasets by selecting the business type, data schema, and number of rows from drop-down menus.
- Real-time data preview: Instantly view generated data samples in the browser.
- Data export: Export datasets as CSV files (a single file, or a ZIP archive for multiple tables) or as SQL INSERT statements.
- One-click Metabase launch: Quickly deploy Metabase via Docker and explore the generated data.
- OpenAI GPT-4o integration: Uses AI to generate detailed data schemas and business rules.
- Multi-language interface: Translations are provided through the Crowdin project.
Usage Guide
Installation
To use the Metabase AI dataset generator, users need to clone the GitHub repository and configure the environment. Below are the detailed steps:
- Clone the repository
Run the following commands in the terminal to clone the project locally:
git clone https://github.com/metabase/dataset-generator.git
cd dataset-generator
- Configure environment variables
Copy the sample environment file:
cp .env.example .env.local
Open the .env.local file and fill in your OpenAI API key, which you can obtain from the OpenAI platform. An example of the file contents:
OPENAI_API_KEY=your-api-key-here
- Install dependencies
Ensure that Node.js and Docker are installed, then run the following command to install the JavaScript dependencies:
npm install
- Start the project
Use the following command to start the development server:
npm run dev
Then visit http://localhost:3000 in your browser to view the application interface.
- Start Metabase (optional)
If you want to explore the data using Metabase, run the following command to start the Docker container:
npm run metabase:start
Once Metabase has started, click the "Open Metabase" button in the interface to access the Metabase dashboard. When you are done, run the following command to stop and clean up the Docker container:
npm run metabase:stop
Main Workflow
1. Creating a dataset
- Open the prompt builder: When the application opens, it displays a conversational prompt builder. Select the business type (e.g., retail, healthcare, or finance), the data schema (e.g., single or multiple tables), and the number of rows (e.g., 100 or 1,000).
- Generate data: Click the "Preview Data" button. The application calls OpenAI GPT-4o to generate the data schema and business rules, then uses Faker to populate the rows. The preview appears in the browser with field names, data types, and sample data.
- Adjust parameters: If the preview is not satisfactory, return to the prompt builder, adjust the parameters, and regenerate.
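The flow above separates two concerns: a model proposes a schema and rules, and a local generator fills in the rows. The following is a minimal Python sketch of the second half only, not the tool's actual code (the tool itself uses Faker in a JavaScript stack). The `SCHEMA` dictionary, its `kind` labels, and the `generate_rows` helper are all hypothetical, and Python's `random` module stands in for Faker.

```python
import random

# Hypothetical schema, shaped like something a model might return.
SCHEMA = {
    "table": "orders",
    "fields": [
        {"name": "order_id", "kind": "sequence"},
        {"name": "status", "kind": "choice", "values": ["pending", "shipped", "returned"]},
        {"name": "quantity", "kind": "int", "min": 1, "max": 5},
        {"name": "unit_price", "kind": "float", "min": 5.0, "max": 200.0},
    ],
}


def generate_rows(schema, row_count, seed=0):
    """Fill row_count rows by applying each field's rule in turn."""
    rng = random.Random(seed)  # seeded so repeated previews are reproducible
    rows = []
    for index in range(1, row_count + 1):
        row = {}
        for field in schema["fields"]:
            kind = field["kind"]
            if kind == "sequence":
                row[field["name"]] = index
            elif kind == "choice":
                row[field["name"]] = rng.choice(field["values"])
            elif kind == "int":
                row[field["name"]] = rng.randint(field["min"], field["max"])
            elif kind == "float":
                row[field["name"]] = round(rng.uniform(field["min"], field["max"]), 2)
            else:
                raise ValueError(f"unknown field kind: {kind}")
        rows.append(row)
    return rows


preview = generate_rows(SCHEMA, row_count=100)
print(preview[0])
```

Keeping the schema as plain data means that regenerating with different parameters only changes the inputs to `generate_rows`, which mirrors the adjust-and-regenerate loop described above.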
2. Data export
- Export CSV: On the preview screen, click the "Export CSV" button. The system generates a single CSV file for a single table or a ZIP file for multiple tables. The file contains the complete dataset and is suitable for importing into other tools.
- Export SQL: Select the "Export SQL" option to generate SQL INSERT statements for direct database import.
- Save the file: The exported file is downloaded automatically, and you can inspect its contents to confirm that the data meets your requirements.
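To make the export shapes above concrete, here is a rough Python sketch of how a single-table CSV, a multi-table ZIP, and SQL INSERT statements could be produced from in-memory rows. It is an illustration under assumptions, not the tool's implementation: the helper names (`table_to_csv`, `export_csv`, `sql_literal`, `export_sql`), the `dataset.zip` filename, and the sample rows are all hypothetical, and each table is assumed to have at least one row.

```python
import csv
import io
import zipfile


def table_to_csv(rows):
    """Serialize a non-empty list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_csv(tables):
    """Return (filename, bytes): a plain CSV for one table, a ZIP for several."""
    if len(tables) == 1:
        name, rows = next(iter(tables.items()))
        return f"{name}.csv", table_to_csv(rows).encode("utf-8")
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for name, rows in tables.items():
            bundle.writestr(f"{name}.csv", table_to_csv(rows))
    return "dataset.zip", archive.getvalue()


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in strings."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def export_sql(table, rows):
    """Build one INSERT statement per row."""
    statements = []
    for row in rows:
        columns = ", ".join(row.keys())
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(statements)


# Hypothetical sample rows for illustration.
customers = [{"id": 1, "name": "Ann O'Neil"}, {"id": 2, "name": "Bo Li"}]
orders = [{"id": 10, "customer_id": 1, "total": 19.99}]

single_name, single_bytes = export_csv({"customers": customers})
multi_name, multi_bytes = export_csv({"customers": customers, "orders": orders})
print(single_name, multi_name)
print(export_sql("customers", customers))
```

Doubling single quotes in `sql_literal` is what keeps a name such as O'Neil from breaking the generated statement when it is imported.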
3. Data exploration
- Start Metabase: Click "Start Metabase" in the application interface, and Docker will automatically deploy the Metabase environment. After startup, click "Open Metabase" to enter the data analysis interface.
- Visualize data: Metabase provides intuitive dashboards for creating charts, filtering data, and building complex queries. No SQL knowledge is required, which makes it suitable for non-technical users.
- Stop Metabase: When the analysis is complete, click "Stop Metabase" to clean up the Docker container and free up system resources.
Key Features
- AI-driven data generation: The tool uses GPT-4o to generate complex data schemas, including field relationships, business rules, and event logic. For example, when generating retail data, the AI automatically defines the relationships between the order, customer, and product tables, so the data is realistic and consistent.
- Real-time preview: Users can view data samples without waiting and quickly verify that the generated results meet expectations.
- Seamless Metabase integration: One-click startup of Metabase lets users analyze data without additional configuration, which is ideal for quick demos or teaching scenarios.
- Flexible export: CSV and SQL formats cover different needs, such as developers populating databases and analysts working in Excel.
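The retail example in the first bullet depends on child rows pointing only at parent rows that exist. One common way to get that property, sketched here in Python under assumptions (the table names, column names, and `build_retail_dataset` helper are hypothetical, and this is not a description of the tool's internals), is to generate parent tables first and draw every foreign key from the keys already generated.

```python
import random


def build_retail_dataset(customer_count, product_count, order_count, seed=0):
    """Generate parent tables first, then orders that reference only existing keys."""
    rng = random.Random(seed)
    customers = [{"customer_id": i, "segment": rng.choice(["consumer", "business"])}
                 for i in range(1, customer_count + 1)]
    products = [{"product_id": i, "price": round(rng.uniform(2.0, 150.0), 2)}
                for i in range(1, product_count + 1)]
    price_by_product = {p["product_id"]: p["price"] for p in products}

    orders = []
    for order_id in range(1, order_count + 1):
        product_id = rng.choice(list(price_by_product))
        quantity = rng.randint(1, 4)
        orders.append({
            "order_id": order_id,
            "customer_id": rng.choice(customers)["customer_id"],
            "product_id": product_id,
            "quantity": quantity,
            # Business rule: the total is derived, never drawn at random.
            "total": round(price_by_product[product_id] * quantity, 2),
        })
    return {"customers": customers, "products": products, "orders": orders}


dataset = build_retail_dataset(customer_count=20, product_count=10, order_count=200)
known_customers = {c["customer_id"] for c in dataset["customers"]}
orphans = [o for o in dataset["orders"] if o["customer_id"] not in known_customers]
print(len(dataset["orders"]), "orders,", len(orphans), "orphaned")
```

Deriving `total` from price and quantity is a small example of a business rule: a join across the tables then yields figures that agree with each other.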
Caveats
- Ensure a stable network connection, which is required for OpenAI API calls and Docker deployments.
- Check that the OpenAI API key is valid, otherwise data generation will fail.
- Docker needs to be pre-installed and configured, otherwise Metabase will not start.
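Two of these checks can be automated before starting the app. The Python sketch below is a hypothetical helper, not part of the project: `preflight` only inspects the current process environment (it does not read .env.local) and only checks that a `docker` executable is on the PATH. It cannot tell whether the API key is valid or whether the Docker daemon is running.

```python
import os
import shutil


def preflight(env=None, find_executable=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your-api-key-here":
        problems.append("OPENAI_API_KEY is missing or still the placeholder")
    if find_executable("docker") is None:
        problems.append("docker was not found on PATH, so Metabase cannot start")
    return problems


for problem in preflight():
    print("warning:", problem)
```

Passing `env` and `find_executable` as parameters keeps the check easy to exercise without changing the real environment.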
Application Scenarios
- Teaching and training
Teachers or trainers can use the dataset generator to create customized datasets that simulate real business scenarios and help students learn data analysis and visualization. For example, generate retail data for SQL instruction.
- Product demos
Developers or organizations can quickly generate realistic-looking datasets for product demos to show the capabilities of data analysis tools without preparing data manually.
- Data analysis prototypes
Data analysts can use the generated dataset to test analytical models early in a project, validate hypotheses, and save the time otherwise spent collecting real data.
- Software development and testing
Developers can populate test databases with generated SQL data to simulate production environments and test application performance and functionality.
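As a rough illustration of the last scenario, the Python sketch below loads INSERT statements into a throwaway in-memory SQLite database. The `EXPORTED_SQL` text and the `customers` table are hypothetical stand-ins for an exported file, and the table is created by hand here as an assumption, since the text above only mentions INSERT statements.

```python
import sqlite3

# Hypothetical contents of an exported SQL file.
EXPORTED_SQL = """
INSERT INTO customers (id, name) VALUES (1, 'Ann O''Neil');
INSERT INTO customers (id, name) VALUES (2, 'Bo Li');
"""

connection = sqlite3.connect(":memory:")  # throwaway database for tests
# Assumed table definition, written by hand for this sketch.
connection.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
connection.executescript(EXPORTED_SQL)  # runs every statement in the script

row_count = connection.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
names = [name for (name,) in connection.execute("SELECT name FROM customers ORDER BY id")]
print(row_count, names)
```

The same pattern applies to other databases through their own client libraries. SQLite is used here only because it ships with Python.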
FAQ
- Do I need to pay to use it?
The tool is open source and free to use. However, it requires an OpenAI API key and may incur API call charges, depending on usage.
- What business types are supported?
Multiple business types are supported, including retail, healthcare, finance, logistics, and more. Users can also customize other scenarios through the prompt builder.
- How do you ensure that the generated data is realistic?
The schema generated by GPT-4o is based on real business rules, and the data populated by Faker follows these rules, so the data is logically consistent and close to reality.
- What should I do if Metabase fails to start?
Check that Docker is installed and running correctly, and that your network connection is working. If the problem persists, check the terminal logs or file an issue in the GitHub repository.
- Can it be used offline?
Data generation calls the OpenAI API and therefore requires an Internet connection. Metabase and the export functions can run locally, but the environment must be configured in advance.