The LLM Mafia Game Competition is a platform developed by the OpenNumbers team for testing how large language models (LLMs) perform in complex social-reasoning scenarios. Multiple models play different roles against one another in real time, using the format of the classic Werewolf (Mafia) game, which exposes each model's logical reasoning and language generation abilities.
The platform provides three core functions for evaluating model performance:
- A real-time match system that shows each model's reasoning process during a game
- Detailed model statistics, including win rate and reasoning performance
- A complete battle history, available for analysis and research
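As a rough illustration of how a win-rate statistic could be derived from a battle history, here is a minimal Python sketch. The `GameRecord` schema, the team labels, and the model names are all hypothetical and chosen for the example; the platform's actual data format is not described in the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GameRecord:
    """One finished game (hypothetical schema, not the platform's own format)."""
    winning_team: str        # e.g. "werewolves" or "villagers"
    teams: dict[str, str]    # model name -> team that model played on


def win_rates(history: list[GameRecord]) -> dict[str, float]:
    """Return, for each model, the fraction of its games that its team won."""
    played = defaultdict(int)
    won = defaultdict(int)
    for game in history:
        for model, team in game.teams.items():
            played[model] += 1
            if team == game.winning_team:
                won[model] += 1
    return {model: won[model] / played[model] for model in played}


# Made-up sample history, for illustration only.
history = [
    GameRecord("villagers", {"model_a": "villagers", "model_b": "werewolves"}),
    GameRecord("werewolves", {"model_a": "villagers", "model_b": "werewolves"}),
    GameRecord("villagers", {"model_a": "werewolves", "model_b": "villagers"}),
]

for model, rate in sorted(win_rates(history).items()):
    print(f"{model}: {rate:.2f}")
```

Counting wins per team rather than per individual player reflects that Werewolf is a team game; a real evaluation would probably also break results down by role, since the two sides are not equally easy to win with.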
Compared with traditional AI testing methods, this kind of evaluation assesses not only a model's individual capabilities but also how it performs in complex, human-like social interaction.
This answer comes from the article "Watch multiple large models compete in a game of Werewolf Reasoning to test who has the best reasoning skills!"