
How do I use VLM-R1 for referring expression comprehension tasks?

2025-09-05

Workflow for the referring expression comprehension task

VLM-R1 is particularly well suited to the Referring Expression Comprehension (REC) task, in which the model locates the object that a natural language phrase refers to in an image. The steps below describe how to use it:

Training phase

  1. Download the required datasets: the COCO Train2014 images and the RefCOCO annotation files
  2. Configure training parameters: edit the training script in the src/open-r1-multimodal directory
  3. Start training: use the multi-GPU launch command, for example: torchrun --nproc_per_node=8 ...

Inference phase

  1. Go to the eval directory: cd src/eval
  2. Run the test script: python test_rec_r1.py --model_path ...
  3. Provide input: upload an image and enter a natural language question such as "Where is the blue car in the picture?"

Input/Output Example

  • Input: an image containing multiple objects plus a natural language query (e.g. "find the red cup in the bottom right corner of the picture")
  • Output: the bounding box coordinates of the target object, or a description of its position
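
To make the output side concrete, here is a minimal Python sketch of how a predicted bounding box could be pulled out of a model reply and scored against a ground-truth box with intersection over union (IoU), the measure commonly used for REC, where a prediction is often counted as correct at IoU of 0.5 or above. The reply string, the [x1, y1, x2, y2] layout, and the helper names parse_box and iou are assumptions made for illustration; they are not taken from the VLM-R1 code.

```python
import re

# Matches a bracketed list of four numbers, e.g. "[320, 410, 400, 480]".
_NUM = r"(-?\d+(?:\.\d+)?)"
_BOX_PATTERN = re.compile(
    r"\[\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\]"
)


def parse_box(text):
    """Return the first [x1, y1, x2, y2] box found in text as a tuple of floats, or None."""
    match = _BOX_PATTERN.search(text)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical model reply and ground-truth box, for illustration only.
reply = "The red cup is at [320, 410, 400, 480]."
ground_truth = (310.0, 400.0, 400.0, 480.0)

predicted = parse_box(reply)
if predicted is not None:
    print(predicted, round(iou(predicted, ground_truth), 3))
```

If the model returns a positional description instead of coordinates, parse_box returns None and the reply has to be handled separately.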

Note

To train on custom data, edit the data_config/rec.yaml configuration file and add your own image paths and annotation files.
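
When preparing custom annotation files, the bounding box layout is an easy thing to get wrong. COCO-style annotations store boxes as [x, y, width, height], while many grounding pipelines work with corner coordinates [x1, y1, x2, y2]. The Python sketch below converts between the two. Which layout the files referenced from rec.yaml must use is not stated here, so treat the direction of conversion as an assumption and check the sample annotations in the repository; the function names are illustrative.

```python
def xywh_to_xyxy(box):
    """Convert a COCO-style [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """Convert a corner-format [x1, y1, x2, y2] box to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


# Illustrative box only: top-left corner (10, 20), 30 wide, 40 tall.
coco_box = [10, 20, 30, 40]
corner_box = xywh_to_xyxy(coco_box)
print(corner_box)
```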
