llm-colosseum
A head-to-head game-agent evaluation harness that benchmarks LLMs by having them fight each other in real-time Street Fighter III.
Connects to: DIAMBRA, OpenAI, Anthropic · Python · MIT · 1,483★
Use it with an AI agent
Loadbay is an MCP server, so an agent can search the catalog and find this harness:
claude mcp add --transport http loadbay https://loadbay.xyz/api/mcp
- Source: https://github.com/OpenGenerativeAI/llm-colosseum
- This harness as JSON: /api/harnesses/llm-colosseum
- Agent setup: /setup.md
- Browse all 370+ harnesses on Loadbay
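
For a script rather than an MCP client, the JSON path listed above can be requested directly. The sketch below is a minimal illustration, not documented Loadbay usage. It assumes the relative path resolves against `https://loadbay.xyz` (the host in the MCP command) and that the endpoint returns a JSON document. The response schema is not described on this page, so the code makes no assumptions about its fields. The function names `harness_url` and `fetch_harness` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the relative paths on this page resolve against the host
# used in the MCP command above.
BASE_URL = "https://loadbay.xyz"


def harness_url(slug: str, base: str = BASE_URL) -> str:
    """Build the URL of the JSON record for one harness."""
    path = "/api/harnesses/" + urllib.parse.quote(slug, safe="")
    return urllib.parse.urljoin(base, path)


def fetch_harness(slug: str, timeout: float = 10.0):
    """Fetch and decode the JSON record (performs a network request).

    The shape of the returned data is not documented here, so the
    decoded value is returned as-is.
    """
    with urllib.request.urlopen(harness_url(slug), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# No network access is needed to see which URL would be requested.
print(harness_url("llm-colosseum"))
```

Calling `fetch_harness("llm-colosseum")` would then return whatever the endpoint serves; inspect the decoded value before relying on any particular key.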