Research & science · harness

BuildArena

A benchmark in which LLM agents design, build, and test rockets, cars, and bridges in a physics simulator, working from text goals.

Connects to: Besiege · Python · Other
94★

Use it with an AI agent

Loadbay is an MCP server, so an agent can search the catalog and find this harness. To register it with Claude Code over HTTP:

claude mcp add --transport http loadbay https://loadbay.xyz/api/mcp
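
For clients other than Claude Code, the sketch below shows roughly what an MCP client sends first when it talks to an HTTP endpoint like this one: a JSON-RPC 2.0 `initialize` message posted to the server URL. The URL comes from the command above; the client name `example-client`, the version string, and the protocol revision `2025-03-26` are assumptions chosen for illustration, and nothing here reflects Loadbay's actual tool names or responses. The request is only built, not sent.

```python
import json
import urllib.request

# Endpoint taken from the `claude mcp add` command above.
MCP_URL = "https://loadbay.xyz/api/mcp"


def build_initialize_payload(client_name: str, client_version: str) -> dict:
    """Build the JSON-RPC 2.0 `initialize` message that opens an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: the server accepts this protocol revision.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in an HTTP POST; the server may answer with JSON or an event stream."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Build the request without sending it.
request = build_request(MCP_URL, build_initialize_payload("example-client", "0.1.0"))
```

Passing `request` to `urllib.request.urlopen` would perform the call. In practice an MCP client library handles this handshake and the follow-up messages, so this is mainly useful for seeing what the registration command sets up.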