Gaming · harness

llm-colosseum

A head-to-head game-agent evaluation harness that benchmarks LLMs by having them fight each other in real-time Street Fighter III.

Connects to: DIAMBRA, OpenAI, Anthropic · Python · MIT · 1,483★

Use it with an AI agent

Loadbay is an MCP server, so an agent can search the catalog and find this harness:

claude mcp add --transport http loadbay https://loadbay.xyz/api/mcp
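
Once the server is registered, the agent talks to it using MCP, which is built on JSON-RPC 2.0. As a rough illustration only, the sketch below builds the request body a client would send to ask a server which tools it offers (the standard `tools/list` method). The function name `build_tools_list_request` is hypothetical, the tools Loadbay actually exposes are not listed here, and a real client performs an `initialize` handshake before this call. The sketch makes no network request.

```python
import json

# Endpoint copied from the command above; not contacted by this sketch.
MCP_URL = "https://loadbay.xyz/api/mcp"


def build_tools_list_request(request_id: int = 1) -> str:
    """Return a JSON-RPC 2.0 body for MCP's standard `tools/list` method.

    Hypothetical helper for illustration; it only shows the message shape.
    """
    payload = {
        "jsonrpc": "2.0",       # MCP messages are JSON-RPC 2.0
        "id": request_id,       # lets the client match the response to this request
        "method": "tools/list", # asks the server to enumerate its tools
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(f"Would POST to {MCP_URL}:")
    print(build_tools_list_request())
```

In practice the agent's MCP client handles this exchange itself, so the command above is all the setup that is needed.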