
Verifiers is a library of modular components for creating reinforcement learning (RL) environments and training large language model (LLM) agents. The goal of the project is to provide a set of reliable tools that let developers easily build, train, and evaluate LLM agents. Verifiers includes an asynchronous GRPO (Group Relative Policy Optimization) trainer built on the transformers Trainer, and is supported by the prime-rl project for large-scale FSDP (Fully Sharded Data Parallel) training. Beyond reinforcement learning training, Verifiers can also be used directly to build LLM evaluations, create synthetic data pipelines, and implement agent harnesses. The project aims to be a reliable toolkit that minimizes the "forked codebase proliferation" problem common in the reinforcement learning infrastructure ecosystem, and to provide a stable development base for developers.

Feature List

  • Modular environment components: provides a modular set of components for building reinforcement learning environments, making it easier to create and customize environments.
  • Multiple environment type support:
    • SingleTurnEnv: For tasks that require only a single response from the model per prompt.
    • ToolEnv: For building agent loops that use the model's native tool/function-calling capabilities (a sketch follows this list).
    • MultiTurnEnv: Provides an interface for writing custom environment interaction protocols for multi-turn dialogue or interactive tasks.
  • Built-in trainer: Includes a GRPOTrainer that uses vLLM for inference and supports GRPO-style reinforcement learning training via Accelerate/DeepSpeed.
  • Command-line tools: Provides practical command-line tools to streamline the workflow:
    • vf-init: Initialize a new environment module template.
    • vf-install: Install the environment module into the current project.
    • vf-eval: Rapidly assess environments using API models.
  • Integration & Compatibility: Can be easily integrated into any reinforcement learning framework that supports an OpenAI-compatible inference client, and works natively with prime-rl for more efficient, larger-scale training.
  • Flexible rewards: A Rubric class encapsulates one or more reward functions, letting you define complex evaluation criteria for model-generated completions.
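
As a concrete illustration of ToolEnv from the list above, here is a hedged sketch of a tool-calling environment. The constructor arguments (dataset, tools) are assumptions based on the feature description, and the word_count tool and the tiny dataset are purely illustrative.

    import verifiers as vf
    from datasets import Dataset

    def word_count(text: str) -> int:
        """Count the number of words in a piece of text (a toy tool)."""
        return len(text.split())

    # Tiny in-memory dataset with the "prompt" column described in section 4.
    dataset = Dataset.from_list([
        {"prompt": "How many words are in 'the cat sat on the mat'?", "answer": "6"},
    ])

    # Tools are plain Python functions exposed through the model's native
    # tool/function-calling interface (assumed usage).
    vf_env = vf.ToolEnv(dataset=dataset, tools=[word_count])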

Usage Guide

The Verifiers library is recommended for use with the uv package manager in your project.

1. Installation

First, you need to create a new virtual environment and activate it.

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Initialize a new project
uv init
# Activate the virtual environment
source .venv/bin/activate

Next, install Verifiers according to your needs:

  • Local development and evaluation (CPU): If you only use API models for development and evaluation, installing the core library is sufficient.
    # Install the core library
    uv add verifiers
    # If you need Jupyter and test support
    uv add 'verifiers[dev]'
    
  • GPU training: If you plan to use vf.GRPOTrainer to train models on GPUs, install the version with all dependencies and additionally install flash-attn.
    uv add 'verifiers[all]' && uv pip install flash-attn --no-build-isolation
    
  • Use the latest development version: You can also install directly from the main branch.
    uv add 'verifiers @ git+https://github.com/willccbb/verifiers.git'
    
  • Installation from source (core library development): If you need to modify the Verifiers core library, you can install it from source.
    git clone https://github.com/willccbb/verifiers.git
    cd verifiers
    uv sync --all-extras && uv pip install flash-attn --no-build-isolation
    uv run pre-commit install
    

2. Creating and managing environments

Verifiers treats each reinforcement learning environment as an installable Python module.

  • Initialize a new environment: Use the vf-init command to create a new environment template.
    # Create an environment named my-new-env
    vf-init my-new-env
    

    This command generates an environment template in the environments/my-new-env directory, containing a pyproject.toml and the basic structure of the environment module (a sketch of such a module follows this list).

  • Install the environment: Once created, use vf-install to install it into your Python environment so that it can be imported and used.
    # Install the local environment
    vf-install my-new-env
    # You can also install example environments directly from the official verifiers repository
    vf-install vf-math-python --from-repo
    
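The module that vf-init generates ultimately exposes a load_environment entry point, which vf.load_environment resolves (see section 3). The following is a hedged sketch of what such a module might contain; the dataset, the reward function, and the exact constructor arguments (funcs, dataset, rubric) are illustrative assumptions, not the literal generated template.

    # Illustrative sketch of an environment module (not the literal vf-init template)
    import verifiers as vf
    from datasets import Dataset

    def load_environment(**kwargs):
        # Minimal in-memory dataset with the required "prompt" column (see section 4).
        dataset = Dataset.from_list([
            {"prompt": "What is 2 + 2?", "answer": "4"},
        ])

        # Hypothetical reward function: 1.0 if the reference answer appears in the completion.
        def correct_answer(completion, answer, **_):
            text = completion if isinstance(completion, str) else str(completion)
            return 1.0 if answer in text else 0.0

        rubric = vf.Rubric(funcs=[correct_answer])
        return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, **kwargs)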

3. Using an environment

After installing an environment, you can load it with the vf.load_environment function and then evaluate or train on it.

  • Loading an environment:
    import verifiers as vf
    # Load the installed environment, passing in any required arguments
    vf_env = vf.load_environment("my-new-env", **env_args)
    
  • Quick environment evaluation: Use the vf-eval command to quickly test your environment. By default it uses the gpt-4.1-mini model with 3 rollouts for each of 5 prompts (a programmatic sketch follows this list).
    # Evaluate the environment named my-new-env
    vf-eval my-new-env
    
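Because any OpenAI-compatible client can drive an environment (see "Integration & Compatibility" above), evaluation can also be run programmatically rather than through vf-eval. The sketch below assumes an evaluate method with client/model/num_examples arguments; treat the exact signature as an assumption and check the library documentation.

    import verifiers as vf
    from openai import OpenAI

    vf_env = vf.load_environment("my-new-env")

    # Any OpenAI-compatible endpoint can be used; the default OpenAI client is shown here.
    client = OpenAI()

    # Assumed method name and arguments; verify against the library documentation.
    results = vf_env.evaluate(client=client, model="gpt-4.1-mini", num_examples=5)
    print(results)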

4. Core elements of the environment

A Verifiers environment consists of the following main components:

  • Datasets: A Hugging Face dataset that must contain a prompt column as input.
  • Rollout logic: How the model interacts with the environment, for example the env_response and is_completed methods defined in MultiTurnEnv.
  • Evaluation criteria (Rubrics): Encapsulate one or more reward functions that score the model's output (see the sketch after this list).
  • Parsers: Optional component to encapsulate reusable parsing logic.
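
To illustrate the rubric component above, the hedged sketch below combines two reward functions with different weights; the keyword names funcs and weights are assumptions based on the description here, and the brevity_bonus function is purely illustrative.

    import verifiers as vf

    def exact_match(completion, answer, **kwargs):
        # 1.0 if the reference answer appears in the completion, else 0.0.
        text = completion if isinstance(completion, str) else str(completion)
        return 1.0 if answer in text else 0.0

    def brevity_bonus(completion, **kwargs):
        # Small shaping bonus for shorter completions (illustrative only).
        text = completion if isinstance(completion, str) else str(completion)
        return max(0.0, 1.0 - len(text) / 2000)

    # Correctness dominates; brevity contributes a small secondary signal.
    rubric = vf.Rubric(funcs=[exact_match, brevity_bonus], weights=[1.0, 0.2])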

5. Training models

Verifiers offers two main types of training:

  • Using the built-in GRPOTrainer:
    This trainer is suited to efficiently training dense Transformer models on 2-16 GPUs (a rough sketch of such a training script appears after these examples).

    # Step 1: launch the vLLM inference server (shell 0)
    # Assume 7 GPUs are used for data parallelism
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 vf-vllm --model your-model-name \
    --data-parallel-size 7 --enforce-eager --disable-log-requests
    # Step 2: launch the training script (shell 1)
    # Use the remaining GPU for training
    CUDA_VISIBLE_DEVICES=7 accelerate launch --num-processes 1 \
    --config-file configs/zero3.yaml examples/grpo/train_script.py --size 1.7B
    
  • Using prime-rl (recommended):
    prime-rl is an external project that natively supports environments created with Verifiers and offers better performance and scalability through FSDP. It also has a more mature configuration and user experience.

    # Specify the environment in the prime-rl config file
    # orch.toml
    [environment]
    id = "your-env-name"
    # Launch prime-rl training
    uv run rl \
    --trainer @ configs/your_exp/train.toml \
    --orchestrator @ configs/your_exp/orch.toml \
    --inference @ configs/your_exp/infer.toml
    
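For the built-in GRPOTrainer path (step 2 of the first example above), the launched script is examples/grpo/train_script.py; a rough, hedged sketch of what such a script can look like follows. The helper names get_model_and_tokenizer and grpo_defaults are assumptions about the library's conveniences; if they differ in your version, load the model and tokenizer with transformers directly.

    # Hedged sketch of a GRPO training script; examples/grpo/train_script.py is the reference.
    import verifiers as vf

    vf_env = vf.load_environment("my-new-env")

    # Assumed convenience helpers (see the note above).
    model, tokenizer = vf.get_model_and_tokenizer("your-model-name")
    args = vf.grpo_defaults(run_name="my-new-env-grpo")

    trainer = vf.GRPOTrainer(
        model=model,
        processing_class=tokenizer,
        env=vf_env,
        args=args,
    )
    trainer.train()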

Application Scenarios

  1. Training task-specific agents
    Using ToolEnv or MultiTurnEnv, developers can create complex interactive environments and train LLM agents to use external tools (e.g., calculators, search engines) or to accomplish specific tasks (e.g., booking flights, customer support) in multi-turn conversations.
  2. Building automated evaluation pipelines
    SingleTurnEnv can be used to build automated evaluation pipelines. By defining an environment with reference answers and evaluation criteria (a Rubric), you can quantitatively compare the performance of different models, e.g., evaluating the correctness of code generation or the quality of text summaries.
  3. Generating high-quality synthetic data
    Large amounts of model-environment interaction data can be generated through environment rollouts. This data can be saved as Hugging Face datasets and used for subsequent supervised fine-tuning (SFT) or other model training, forming an efficient synthetic data generation pipeline (a small sketch follows this list).
  4. Academic research and algorithm validation
    Verifiers provides a modular, reproducible experimentation platform for reinforcement learning researchers. Researchers can easily implement new interaction protocols, reward functions, or training algorithms and verify their effectiveness in a standardized environment.
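
For scenario 3, rollout results can be filtered and converted into a Hugging Face dataset for SFT. The records below are hypothetical placeholders; in practice they would come from the environment's rollout or evaluation output, whose exact structure may differ.

    from datasets import Dataset

    # Hypothetical rollout records; in practice these come from environment rollouts.
    rollouts = [
        {"prompt": "What is 2 + 2?", "completion": "4", "reward": 1.0},
        {"prompt": "Name a prime number.", "completion": "7", "reward": 1.0},
    ]

    # Keep only high-reward samples for supervised fine-tuning (SFT).
    sft_records = [r for r in rollouts if r["reward"] >= 1.0]

    dataset = Dataset.from_list(sft_records)
    dataset.save_to_disk("sft_dataset")  # or dataset.push_to_hub("your-org/your-dataset")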

FAQ

  1. What is the relationship between the Verifiers library and prime-rl?
    prime-rl is a standalone training framework that natively supports environments created with Verifiers. Verifiers focuses on providing components for building RL environments, while prime-rl focuses on providing a more powerful, better-performing, and more scalable FSDP (Fully Sharded Data Parallel) training solution. For large-scale training, the official recommendation is to use prime-rl.
  2. How do I define a reward function for my environment?
    You define one or more reward functions and wrap them in a vf.Rubric object. Each function receives prompt, completion, and other parameters and returns a floating-point number as the reward value. You can also assign different weights to different reward functions.
  3. Do I need to implement the model's interaction logic myself?
    Not necessarily. For single-turn Q&A and standard tool-calling scenarios, you can use SingleTurnEnv and ToolEnv directly. You only need to subclass MultiTurnEnv and override its is_completed and env_response methods if your application requires a very unusual, non-standard interaction flow (a minimal sketch follows this list).
  4. What should I do if I encounter NCCL-related errors during training?
    According to the official documentation, vLLM may hang during inter-GPU communication when synchronizing weights. You can try setting NCCL_P2P_DISABLE=1 to work around the problem. If it persists, try setting NCCL_CUMEM_ENABLE=1 or open an issue on the project repository.
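
As mentioned in question 3, custom interaction protocols are written by subclassing MultiTurnEnv and overriding is_completed and env_response. The sketch below is a hedged illustration; the method signatures and chat-message format shown are assumptions and may differ from the library's current API.

    import verifiers as vf

    class GuessingGameEnv(vf.MultiTurnEnv):
        """Hypothetical multi-turn environment: the model keeps guessing until it
        produces the target word or the turn budget runs out."""

        def is_completed(self, messages, state, **kwargs):
            # Stop once the target word appears in the latest assistant message.
            last = messages[-1]["content"] if messages else ""
            return state.get("target", "") in last

        def env_response(self, messages, state, **kwargs):
            # Return the environment's reply (as chat messages) plus the updated state.
            reply = [{"role": "user", "content": "Not yet, try another guess."}]
            return reply, state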