Sentiment Analysis Performance Report
HumanOmni demonstrates industry-leading performance in emotion recognition tasks:
Comparison of core indicators
- DFEW dataset: UAR (unweighted average recall) of 74.86%, significantly better than GPT-4o (50.57%)
- Accuracy: average accuracy of 72.3% across six basic emotion categories
- Processing speed: real-time processing of 1080p video at up to 24 fps (on an A100 GPU)
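For readers unfamiliar with the UAR metric, it is the mean of the per-class recall values, so rare emotion classes count as much as frequent ones. The sketch below illustrates the difference from plain accuracy. The labels are made-up toy data, not DFEW results, and the function names are chosen here for illustration only.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall: every class counts equally, whatever its size."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical, imbalanced toy labels: 8 "happy" clips and 2 "sad" clips.
y_true = ["happy"] * 8 + ["sad"] * 2
y_pred = ["happy"] * 8 + ["happy", "sad"]  # one "sad" clip is misclassified

uar = unweighted_average_recall(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f"UAR = {uar:.2f}, accuracy = {acc:.2f}")
```

In this toy case the single mistake on the small "sad" class pulls UAR well below accuracy, which is why UAR is commonly reported on imbalanced emotion datasets.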
Technical Advantages
The model uses a bimodal analysis mechanism:
- Visual analysis: captures micro-expression changes at 52 facial keypoints
- Speech analysis: analyzes intonation, speaking rate, and pause patterns via Mel spectrograms
- Decision fusion: dynamically weights the two signal types using an attention mechanism
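The dynamic weighting step can be illustrated with a minimal sketch. It assumes each modality has already been reduced to a fixed-length embedding and that a query vector scores each one with scaled dot-product attention. The vectors, the `fuse` function, and the query are hypothetical illustrations, not HumanOmni's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse(visual, audio, query):
    """Weight two modality embeddings by attention scores against a query."""
    scale = math.sqrt(len(query))
    scores = [dot(query, visual) / scale, dot(query, audio) / scale]
    weights = softmax(scores)  # the two weights sum to 1
    fused = [weights[0] * v + weights[1] * a for v, a in zip(visual, audio)]
    return fused, weights


# Hypothetical 4-dimensional embeddings; the query aligns with the visual one.
visual = [1.0, 0.0, 0.0, 0.0]
audio = [0.0, 1.0, 0.0, 0.0]
query = [2.0, 0.0, 0.0, 0.0]

fused, weights = fuse(visual, audio, query)
print("weights (visual, audio):", [round(w, 3) for w in weights])
```

Because the weights are recomputed for every input, a clip with an expressive face but flat speech leans on the visual signal, and the reverse holds when the voice carries more information.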
Test case
In an educational scenario test, the model successfully recognized:
- 91.2% of "confused" expressions (identified from frowning combined with frequent blinking)
- 88.7% of "excited" states (determined from raised vocal pitch and larger body movements)
This performance is attributed to the 14,000 hours of labeled speech data and 800,000 expression-labeled images used to train the model.
This answer comes from the article "HumanOmni: a multimodal large model for analyzing human video emotions and actions".