
Gaze-LLE is a gaze target prediction tool built on large-scale learned encoders

2025-09-10

Architectural Principles of Gaze-LLE

Gaze-LLE is a computer vision tool developed by a team at Georgia Tech. Its core architecture is built on pre-trained visual foundation models. The tool uses a frozen visual encoder such as DINOv2 as the backbone network, so only a lightweight gaze decoder module needs to be trained. This design reduces the number of trainable parameters by 1-2 orders of magnitude compared with traditional methods, typically from hundreds of millions to millions.
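The effect of freezing the backbone can be illustrated by counting which parameters an optimizer would actually update. The sketch below is plain Python with no deep learning framework. The `Block` class, the component names, and both parameter counts are hypothetical placeholders chosen only to match the orders of magnitude mentioned above; they are not Gaze-LLE's real configuration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Stand-in for a network component (hypothetical, for illustration)."""
    name: str
    n_params: int
    trainable: bool = True


def total_params(blocks):
    """Count every parameter the model holds, frozen or not."""
    return sum(b.n_params for b in blocks)


def trainable_params(blocks):
    """Count only the parameters that would receive gradient updates."""
    return sum(b.n_params for b in blocks if b.trainable)


# Illustrative sizes only: a large frozen encoder and a small decoder.
model = [
    Block("frozen_encoder", n_params=300_000_000, trainable=False),
    Block("gaze_decoder", n_params=3_000_000, trainable=True),
]

print("total:", total_params(model))
print("trainable:", trainable_params(model))
```

Under these assumed sizes, the encoder still has to be loaded for inference, but training touches only the decoder's parameters.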

The core advance shows in two respects. First, it relies entirely on RGB image input, dropping the depth information and human pose data that traditional methods require. Second, it achieves efficient prediction through feature reuse: a single encoding of the image can support gaze analysis for multiple people in the scene. This architecture makes Gaze-LLE significantly better than existing solutions in computational efficiency and ease of deployment.
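The feature reuse idea (encode the image once, then decode per person) can be sketched as follows. This is a toy illustration rather than the project's actual code. `CountingEncoder`, `decode_gaze`, and `predict_all` are hypothetical names, the "features" are simple row means, and identifying each person by a head bounding box is an assumption of this sketch.

```python
class CountingEncoder:
    """Stand-in for a frozen image encoder; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def encode(self, image):
        self.calls += 1
        # Placeholder "features": the mean value of each image row.
        return [sum(row) / len(row) for row in image]


def decode_gaze(features, head_box):
    """Toy decoder combining shared features with one person's head box.

    Returns a placeholder score, not a real gaze heatmap.
    """
    x0, y0, x1, y1 = head_box
    head_area = (x1 - x0) * (y1 - y0)
    return sum(features) * head_area


def predict_all(encoder, image, head_boxes):
    """Run the expensive encoder once, then the cheap decoder per person."""
    features = encoder.encode(image)
    return [decode_gaze(features, box) for box in head_boxes]


# Hypothetical 2x2 "image" and three head boxes in normalized coordinates.
image = [[0.0, 1.0], [1.0, 1.0]]
head_boxes = [
    (0.0, 0.0, 0.5, 0.5),
    (0.5, 0.0, 1.0, 0.5),
    (0.2, 0.2, 0.4, 0.4),
]

encoder = CountingEncoder()
results = predict_all(encoder, image, head_boxes)
print("predictions:", len(results), "encoder calls:", encoder.calls)
```

The point of the design is that the cost of the encoder is paid once per image, so adding more people to the scene adds only decoder work.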
