Gaming · harness

BALROG

Benchmark for agentic LLM/VLM reasoning across challenging games including NetHack, MiniHack, Crafter, BabyAI, and Baba Is You.

Connects to: NLE, Crafter, Gymnasium · Python · MIT · 255★

Use it with an AI agent

Loadbay is an MCP server, so an agent can search the catalog and find this harness. To register it with Claude Code over HTTP:

claude mcp add --transport http loadbay https://loadbay.xyz/api/mcp
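
For agents that do not use the Claude CLI, the sketch below shows roughly what an MCP client sends to an endpoint like this one. It only builds the JSON-RPC 2.0 request bodies for the standard MCP `initialize` and `tools/list` methods and does not contact the server. The endpoint URL comes from the command above. The protocol version string, the client name `example-client`, and the helper `jsonrpc_request` are assumptions made for illustration. The tools Loadbay actually exposes are not listed here, so the sketch stops at asking the server to enumerate them.

```python
import json

# Endpoint taken from the `claude mcp add` command above.
LOADBAY_MCP_URL = "https://loadbay.xyz/api/mcp"

# Headers commonly sent by MCP clients using the streamable HTTP transport.
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body (hypothetical helper, not a Loadbay API)."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# First message of an MCP session: the client announces itself.
# The protocol version is an assumption; the server may negotiate another one.
initialize = jsonrpc_request(
    "initialize",
    {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
    request_id=1,
)

# Ask the server which tools it offers (for example, a catalog search tool).
list_tools = jsonrpc_request("tools/list", request_id=2)

if __name__ == "__main__":
    # Print the bodies that would be POSTed to LOADBAY_MCP_URL with HEADERS.
    print(json.dumps(initialize, indent=2))
    print(json.dumps(list_tools, indent=2))
```

An MCP client library would normally handle this handshake; the raw messages are shown only to make clear what registering the server gives an agent access to.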