Three Ways to Improve Cross-Scenario Prediction Accuracy
Gaze-LLE already generalizes well thanks to its pre-training strategy and model selection. If cross-scene accuracy still needs to improve, three options:
- Model selection: prioritize models with the `_inout` suffix (e.g. `gazelle_dinov2_vitb14_inout`), which are jointly trained on GazeFollow + VideoAttentionTarget and cover a wide range of indoor and outdoor scenes
- Transfer learning: unfreeze the last 3 layers of the backbone for fine-tuning, and train for 5-10 epochs on a small sample of data (~200 labeled images) from the new scene
- Post-processing: apply non-maximum suppression (NMS) to the output heatmap and set a threshold to filter out prediction points with confidence < 0.7
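The threshold + NMS step above can be sketched in a few lines of NumPy. This is a minimal sketch, not Gaze-LLE's own post-processing code: the function name, the suppression window size, and the assumption that the model's heatmap arrives as a 2-D array of confidences in [0, 1] are all illustrative.

```python
import numpy as np

def extract_gaze_points(heatmap, conf_thresh=0.7, window=5):
    """Pick local maxima of a gaze heatmap via greedy NMS, keeping only
    peaks whose confidence is at least conf_thresh.

    heatmap: 2-D array of per-pixel gaze confidences in [0, 1].
    Returns a list of (row, col, confidence) tuples, strongest first.
    """
    h, w = heatmap.shape
    half = window // 2
    peaks = []
    # Candidate peaks: all pixels above the confidence threshold,
    # visited in order of decreasing confidence.
    candidates = sorted(np.argwhere(heatmap >= conf_thresh),
                        key=lambda rc: -heatmap[rc[0], rc[1]])
    suppressed = np.zeros_like(heatmap, dtype=bool)
    for r, c in candidates:
        if suppressed[r, c]:
            continue
        peaks.append((int(r), int(c), float(heatmap[r, c])))
        # Suppress the neighbourhood of the accepted peak.
        r0, r1 = max(0, r - half), min(h, r + half + 1)
        c0, c1 = max(0, c - half), min(w, c + half + 1)
        suppressed[r0:r1, c0:c1] = True
    return peaks
```

With a synthetic heatmap containing two well-separated peaks and one sub-threshold point, the function returns just the two strong peaks, strongest first.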
Note: DINOv2's feature extractor already covers rich scene features from its pre-training, so completely re-training it is generally not recommended. If the target scene has unusual lighting conditions (e.g. infrared surveillance), adding histogram equalization to the data preprocessing stage is recommended.
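The histogram-equalization preprocessing mentioned in the note can be sketched as follows. This is a plain NumPy sketch for a single-channel uint8 frame (as an infrared camera might produce); in practice OpenCV's `cv2.equalizeHist` does the same job, and the function name here is illustrative.

```python
import numpy as np

def equalize_histogram(img):
    """Global histogram equalization for a single-channel uint8 image.

    Stretches the intensity distribution so low-contrast frames
    (e.g. infrared footage) use the full 0-255 range before being
    fed to the gaze model.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first non-zero CDF value
    # Map each grey level through the normalized CDF.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

An image whose grey levels are already uniformly distributed passes through unchanged, while a low-contrast image gets stretched to span the full range.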
This answer comes from the article "Gaze-LLE: A Target Prediction Tool for Character Gaze in Video".