AlignLab is an open-source project from the OpenAlign team that provides a complete set of frameworks and tools for aligning large language models. By "alignment" we mean making a model's behavior and output more consistent with human expectations and values, for example ensuring that it is safe, truthful, unbiased, and harmless. As large models grow more powerful, ensuring they are used responsibly becomes a central challenge, and AlignLab aims to give researchers and developers a standardized, easy-to-use toolkit for this problem. The project integrates multiple mainstream benchmarks and datasets into a unified workflow, letting users run complex safety evaluations and generate detailed analysis reports with simple commands, thereby systematically improving model reliability.
Feature List
- Unified evaluation framework: integrates adapters for several mainstream evaluation tools (e.g., lm-evaluation-harness, OpenAI Evals), so users don't have to switch between tools.
- Extensive benchmark suite: ships with several preset evaluation suites, such as `safety_core_v1`, covering dimensions from safety and toxic content to truthfulness and bias.
- "Registry-first" design: every benchmark is defined by a simple `YAML` file recording its data source, rubric, and version, which keeps evaluations reproducible.
- Multi-language support: built-in loaders for toxicity, truthfulness, and bias datasets in multiple languages make cross-language alignment studies easier.
- "Guard" model integration: provides a unified interface for invoking guard models such as Llama-Guard-3, either as a pre- or post-filter or as a judge of the safety of model outputs.
- Agent evaluation: supports evaluating an agent's capabilities in a secure sandbox environment, for example measuring its attack success rate and its tendency to over-refuse.
- Automated report generation: produces a detailed evaluation report with one click, formatted like an academic paper with charts, confidence intervals, and per-category analysis, and supports export to PDF or HTML.
Usage Guide
AlignLab provides a command-line tool and a set of core Python libraries that let users carry out every stage of model alignment work.
1. Environment setup
The project recommends using `uv` as the package manager, for faster dependency resolution.
Step 1: Install uv
If `uv` is not already installed on your system, you can install it with `pipx` or `pip`.
# Using pipx (recommended)
pipx install uv
# Or using pip
pip install uv
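To confirm that `uv` installed correctly, you can print its version (a standard flag of the tool):
# Verify the installation
uv --version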
Step 2: Create a virtual environment and activate it
In a project directory of your choice, use `uv` to create a new Python virtual environment.
# Create a virtual environment named .venv
uv venv
# Activate the virtual environment (Windows)
.venv\Scripts\activate
# Activate the virtual environment (macOS/Linux)
source .venv/bin/activate
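To double-check that the virtual environment is active, you can ask Python for its prefix; the printed path should end in .venv:
# Confirm the interpreter now comes from the virtual environment
python -c "import sys; print(sys.prefix)"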
Step 3: Clone the AlignLab repository
Clone the project code locally from GitHub.
git clone https://github.com/OpenAlign/AlignLab.git
cd AlignLab
Step 4: Install project dependencies
In the root directory of the repository, use `uv pip install` to install all of AlignLab's modules. The `-e` flag installs them in "editable" mode, so any changes you make to the source code take effect immediately, which is ideal for development and debugging.
uv pip install -e packages/alignlab-core -e packages/alignlab-cli \
-e packages/alignlab-evals -e packages/alignlab-guards \
-e packages/alignlab-agents -e packages/alignlab-dash
Once the installation finishes, the `alignlab` command-line tool is ready to use.
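As a quick sanity check, you can run one of the listing commands described under "Viewing available resources" below; if the packages installed correctly, it prints the registered benchmarks instead of a "command not found" error.
# Confirm the CLI is available
alignlab benchmarks ls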
2. Using the core features
AlignLab's main functionality is invoked through the `alignlab` command-line tool. A few core commands are described below.
A. Running a complete safety evaluation
This is one of the most commonly used features: it runs the full safety core suite against a given model and generates a report.
alignlab eval run --suite alignlab:safety_core_v1 \
--model meta-llama/Llama-3.1-8B-Instruct --provider hf \
--guards llama_guard_3 --max-samples 200 \
--report out/safety_core_v1
- `--suite alignlab:safety_core_v1`: selects the preset evaluation suite named `safety_core_v1`, which contains a series of tests for safety, bias, and truthfulness.
- `--model meta-llama/Llama-3.1-8B-Instruct`: the model to evaluate; the Llama-3.1 8B Instruct model is used as the example here.
- `--provider hf`: the model provider, Hugging Face (`hf`).
- `--guards llama_guard_3`: includes the Llama Guard 3 model as a "guard" in the evaluation pipeline to assess the safety of the model's outputs.
- `--max-samples 200`: uses at most 200 samples per test task, for quick validation.
- `--report out/safety_core_v1`: the path where the evaluation results are saved.
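Before committing to a full run, it can help to do a smaller smoke test first. The sketch below simply reuses the flags shown above with a lower sample cap and a separate output directory (assuming `--guards` is optional, it is omitted here to keep the run light):
# Quick smoke test: same suite and model, far fewer samples per task
alignlab eval run --suite alignlab:safety_core_v1 \
--model meta-llama/Llama-3.1-8B-Instruct --provider hf \
--max-samples 20 --report out/safety_core_v1_smoke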
B. Generating a visual report
After an evaluation run finishes, use the `report build` command to compile the raw result data into a human-readable report.
alignlab report build out/safety_core_v1 --format html,pdf
- `out/safety_core_v1`: points to the directory where the previous command saved the evaluation results.
- `--format html,pdf`: generates reports in both HTML and PDF formats.
C. Viewing available resources
You can check at any time which benchmarks and models are registered in AlignLab.
# List all available benchmarks, filtered by the safety and multilingual tags
alignlab benchmarks ls --filter safety,multilingual
# List all available models
alignlab models ls
D. Running individual benchmarks
In addition to running the full suite, you can also test only against a specific benchmark.
# Run the validation split of the truthfulqa benchmark
# and score it with an LLM judge (llm_rubric)
alignlab eval run truthfulqa --split validation --judge llm_rubric
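Assuming the single-benchmark form accepts the same flags as the suite runner (an assumption, not something documented above), you can also cap the sample count and save the results for a later `report build`:
# Assumed to accept the same flags as the suite runner
alignlab eval run truthfulqa --split validation --judge llm_rubric \
--max-samples 100 --report out/truthfulqa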
Application Scenarios
- AI safety and compliance research
Researchers can use AlignLab to run standardized safety evaluations across different large language models, systematically assessing risks such as harmful content generation, bias, and privacy leakage through its comprehensive benchmark suites. The quantitative reports it generates can be used directly in academic papers and research analysis.
- Pre-deployment validation of enterprise models
Before an organization integrates a large model into a product or service, it needs to ensure the safety and reliability of the model's outputs. AlignLab provides an out-of-the-box evaluation workflow for production environments, helping development teams run rigorous red-team exercises and risk assessments before a model goes live, and confirming that it complies with the company's safety and ethics guidelines.
- Alignment fine-tuning for domain-specific models
When fine-tuning a model for a specific domain (e.g., finance or healthcare), developers need to strengthen its expertise while also keeping its behavior within industry norms. AlignLab helps developers continuously monitor a model's alignment during fine-tuning, for example by checking truthfulness with TruthfulQA or verifying harmlessness with custom benchmarks.
- Fairness and consistency testing of multilingual models
For multilingual models serving a global audience, it is critical that they behave consistently and without bias across cultures and languages. AlignLab's multilingual support helps developers assess a model's fairness and cultural sensitivity across languages, and identify and fix potential bias issues early.
FAQ
- What does "model alignment" mean?
Model alignment is the process of adjusting and optimizing a large language model so that its behavior and output are consistent with human intentions, values, and social norms. It is usually described along three dimensions: helpfulness (understanding and fulfilling instructions), truthfulness (not fabricating facts), and harmlessness (not producing biased, discriminatory, or dangerous content). AlignLab is designed to systematically assess and improve a model's alignment.
- How is AlignLab different from other evaluation tools?
AlignLab's defining features are its comprehensiveness and its framework design. Rather than reinventing evaluation algorithms, it integrates mature, widely recognized evaluation tools from the community (e.g., HarmBench, JailbreakBench) into a unified framework through an "adapter" pattern. Users no longer need to learn several tools: a single set of commands invokes the different evaluation capabilities and produces standardized reports, greatly simplifying alignment work.
- What hardware is required to use AlignLab?
Hardware requirements depend mostly on the size of the model being evaluated. A model like Llama-3.1 8B needs a consumer or professional GPU with at least 24 GB of VRAM; larger models need more compute. AlignLab itself is a Python framework with little overhead of its own; the main cost comes from loading and running the large language models.
- Can I add my own datasets or benchmarks?
Yes. AlignLab is designed to be "registry-first", so adding a new benchmark is simple: create a `YAML` configuration file in the `benchmarks/` directory that defines the dataset source (e.g., the Hugging Face Hub), the task type, the judging method, and the relevant metadata. This design makes the framework very easy to extend.
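For illustration only, a registry entry might look like the sketch below; the field names are assumptions made for this example rather than AlignLab's actual schema, so consult the existing files under `benchmarks/` for the real format.
# Hypothetical registry entry -- field names are illustrative, not the real schema
cat > benchmarks/my_custom_benchmark.yaml <<'EOF'
id: my_custom_benchmark        # benchmark identifier (assumed field)
version: 0.1.0                 # pin a version for reproducibility
source:
  type: hf_dataset             # e.g. load the dataset from the Hugging Face Hub
  path: your-org/your-dataset  # placeholder dataset repository
  split: test
judge: llm_rubric              # reuse the LLM-as-judge method shown above
tags: [safety, custom]
EOF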