HumanOmni's Industry Leadership
Developed by the HumanMLLM team and open-sourced on GitHub, HumanOmni is currently the industry's first multimodal large model built around human video analysis as its core task. The model innovatively integrates 2.4 million human-centric video clips and 14 million instruction entries for pre-training, and uses 50,000 finely annotated video clips for fine-tuning.
Its core values are reflected in three areas:
- Comprehensive analytical dimensions: simultaneous coverage of facial expression, body movement, and interaction-scene recognition
- Dynamic fusion mechanism: the weights of the three analysis branches are adjusted automatically according to the input (see the sketch after this list)
- Open-source availability: full release of the code, pre-trained models, and part of the datasets
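The dynamic fusion point can be made concrete with a small sketch: a gating network maps a conditioning vector (for example, a pooled instruction embedding) to softmax weights over the face, body, and scene branches. This is a hypothetical illustration of the general technique, not HumanOmni's actual code; the class name `BranchFusion` and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BranchFusion(nn.Module):
    """Hypothetical sketch of input-conditioned fusion over three
    feature branches (face, body, interaction scene). Not HumanOmni's
    actual implementation; names and dimensions are assumptions."""

    def __init__(self, feat_dim: int, cond_dim: int):
        super().__init__()
        # Gating network: maps the conditioning vector to 3 branch logits.
        self.gate = nn.Linear(cond_dim, 3)

    def forward(self, face, body, scene, cond):
        # face/body/scene: (batch, feat_dim); cond: (batch, cond_dim)
        weights = torch.softmax(self.gate(cond), dim=-1)    # (batch, 3)
        stacked = torch.stack([face, body, scene], dim=1)   # (batch, 3, feat_dim)
        # Weighted sum over the three branches.
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)

# Usage: fuse 512-d branch features under a 768-d instruction embedding.
fusion = BranchFusion(feat_dim=512, cond_dim=768)
face, body, scene = (torch.randn(2, 512) for _ in range(3))
cond = torch.randn(2, 768)
fused = fusion(face, body, scene, cond)  # (2, 512)
```

Because the weights come from the input rather than being fixed, an instruction about facial emotion can upweight the face branch while an action question shifts weight toward the body branch.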
Compared with traditional unimodal models, HumanOmni achieves a UAR of 74.86% on the DFEW emotion recognition dataset, significantly ahead of GPT-4o's 50.57%. This breakthrough performance confirms its technological edge as the first model in this domain.
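For reference, UAR (unweighted average recall) is the mean of per-class recall, so rare emotion classes count as much as frequent ones on an imbalanced dataset like DFEW. A minimal illustration of the metric (the toy labels below are invented for demonstration):

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """UAR: average the recall of each class with equal weight,
    regardless of how many samples the class has."""
    classes = np.unique(y_true)
    recalls = []
    for c in classes:
        mask = (y_true == c)                      # samples of this class
        recalls.append((y_pred[mask] == c).mean())  # per-class recall
    return float(np.mean(recalls))

# Toy example with 3 emotion classes (0, 1, 2):
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
print(unweighted_average_recall(y_true, y_pred))  # (2/3 + 1 + 1) / 3 ≈ 0.889
```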
This answer is based on the article "HumanOmni: A Multimodal Large Model for Analyzing Human Video Emotions and Actions".