
What is Gaze-LLE and what are its main functions?

2025-09-10


Gaze-LLE is a gaze target estimation tool built on large-scale learned encoders, developed by Fiona Ryan, Ajay Bati, and other researchers. Its core goal is to efficiently predict where a person in an image or video is looking by using a frozen, pretrained visual foundation model (e.g., DINOv2).

Its main functions include:

Gaze target prediction: Accurately predicts where a person is looking, using a pretrained visual encoder
Multi-person prediction: Can predict the gaze targets of multiple people in a single image
Lightweight architecture: Only a lightweight decoder is trained, on top of a frozen pretrained encoder
Multi-model support: Provides pretrained models for different backbone networks (ViT-B/ViT-L) and training datasets

The main advantages of Gaze-LLE over comparable tools are a reduction of one to two orders of magnitude in the number of learnable parameters and no need for additional input modalities (e.g., depth or pose information).
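
The frozen-encoder idea described above can be illustrated with a minimal PyTorch sketch. This is not the Gaze-LLE code or its actual architecture: `TinyEncoder` is a hypothetical stand-in for a pretrained backbone such as DINOv2, `GazeDecoder` is a hypothetical small head, and feeding a per-person head mask to select whose gaze to predict is an assumption of this sketch. It only shows the general pattern: freeze the encoder, train the decoder alone, and reuse one set of image features for several people.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained ViT backbone (e.g. DINOv2)."""

    def __init__(self, dim=64, patch=16):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim * 4, 1), nn.GELU(), nn.Conv2d(dim * 4, dim, 1)
        )

    def forward(self, x):
        f = self.proj(x)          # (B, dim, H/patch, W/patch)
        return f + self.block(f)  # residual feature map


class GazeDecoder(nn.Module):
    """Hypothetical lightweight head mapping features to a gaze heatmap."""

    def __init__(self, dim=64):
        super().__init__()
        # Learned vector added at head locations (an assumption of this sketch).
        self.head_embed = nn.Parameter(torch.randn(1, dim, 1, 1) * 0.02)
        self.to_heatmap = nn.Sequential(
            nn.Conv2d(dim, dim // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim // 2, 1, 1),
        )

    def forward(self, feats, head_mask):
        # head_mask: (N, 1, H, W), 1 where the person's head is, else 0
        feats = feats + head_mask * self.head_embed
        return torch.sigmoid(self.to_heatmap(feats))


torch.manual_seed(0)
encoder, decoder = TinyEncoder(), GazeDecoder()

# Freeze the encoder: no gradients, no weight updates.
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# Only decoder parameters are handed to the optimizer.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)

image = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    feats = encoder(image)  # computed once per image: (1, 64, 14, 14)

# Two people in the same image, each marked by a coarse head mask.
head_masks = torch.zeros(2, 1, 14, 14)
head_masks[0, :, 2:4, 3:5] = 1.0
head_masks[1, :, 5:7, 9:11] = 1.0

# Reuse the same features for both people; one heatmap per person.
heatmaps = decoder(feats.expand(2, -1, -1, -1), head_masks)

# One illustrative training step against random placeholder targets.
target = torch.rand(2, 1, 14, 14)
loss = nn.functional.binary_cross_entropy(heatmaps, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in decoder.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in encoder.parameters())
print(f"trainable decoder params: {trainable}, frozen encoder params: {frozen}")
```

In this toy setup the stand-in encoder is deliberately tiny so the script runs quickly. With a real ViT-B or ViT-L backbone, the frozen part would dominate the total parameter count while the trained part stays small.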

This answer comes from the article "Gaze-LLE: A Target Prediction Tool for Character Gaze in Video".
