Gaming harnesses for AI agents
32 open-source gaming harnesses that an AI agent can use: MCP servers, SDKs, and adapters. Browse them on Loadbay. An agent can search these through Loadbay's MCP server:
claude mcp add --transport http loadbay https://loadbay.xyz/api/mcp
→ Best Gaming harnesses (top picks, ranked)
- generative_agents — Stanford 'Smallville' simulation of interactive LLM-driven agents that remember, reflect, and plan in a sandbox town.
- ml-agents — Unity toolkit that turns games and simulations into environments for training agents via RL and imitation learning.
- Stable-Baselines3 — The standard library of reliable PyTorch RL algorithm implementations (PPO, SAC, DQN) used to train game-playing agents.
- Gymnasium — Standard API and reference environments for single-agent reinforcement learning, the maintained successor to OpenAI Gym.
- unity-mcp — MCP bridge between AI assistants and the Unity Editor for managing assets, controlling scenes, and editing scripts.
- pysc2 — DeepMind's StarCraft II learning environment exposing the game to RL agents through a Python observation and action API.
- DeepMind Lab — Customisable 3D first-person platform for agent research, providing navigation, memory, and puzzle tasks for RL agents.
- Voyager — LLM-powered open-ended embodied agent that autonomously explores, learns skills, and plays Minecraft via the Mineflayer bot API.
- open_spiel — Collection of environments and algorithms for reinforcement learning and search in over 70 board, card, and strategy games.
- Google Research Football — Physics-based 3D football environment for training RL agents in single- and multi-agent modes with a scenario academy.
- PettingZoo — Standard API and environment suite for multi-agent reinforcement learning across classic, Atari, and board-game settings.
- Minigrid — Lightweight, configurable gridworld environments (including BabyAI) for benchmarking exploration and instruction-following agents.
- Arcade-Learning-Environment — The ALE platform that lets agents play hundreds of Atari 2600 games for reinforcement-learning research.
- ViZDoom — Reinforcement-learning environment built on the 1993 game Doom for training agents from raw visual input.
- Godot RL Agents — Bridge that turns Godot Engine games into RL environments for training NPC and character behaviors; the Godot analog to ML-Agents.
- llm-colosseum — Head-to-head game-agent evaluation harness that benchmarks LLMs by having them fight each other in real-time Street Fighter III matches.
- diplomacy_cicero — Cicero, the agent that plays Diplomacy at a human level by combining strategic planning with open-domain dialogue.
- TextWorld — Microsoft sandbox for training and evaluating agents on text-based games, generating procedural interactive fiction worlds.
- procgen — Suite of procedurally generated game-like Gym environments for benchmarking generalization in RL agents.
- Factorio Learning Environment — Open-ended environment for evaluating LLM agents in Factorio, testing long-horizon planning, program synthesis, and optimization.
- nle — The NetHack Learning Environment, a fast procedurally generated roguelike sandbox for reinforcement-learning agents.
- minerl — Minecraft-based reinforcement-learning environment and dataset for sample-efficient agent research.
- GamingAgent (lmgame-Bench) — Framework of LLM/VLM gaming agents, plus lmgame-Bench, a benchmark that evaluates models by having them play games such as Sokoban and Mario.
- Melting Pot — DeepMind suite of multi-agent RL scenarios for evaluating cooperation, competition, and social behavior across games.
- crafter — Open-world survival-game benchmark that evaluates a broad spectrum of agent capabilities in a single procedurally generated environment.
- PokerRL — Framework for multi-agent deep reinforcement learning research on no-limit and limit Texas Hold'em poker.
- Craftax — JAX benchmark that reimplements Crafter and extends it with NetHack-inspired mechanics, built as a fast open-ended testbed for reinforcement-learning agents.
- Odyssey — Gives Minecraft agents a library of open-world skills, so an LLM agent can explore, gather, and build across the game.
- Stable-Retro — Maintained Farama fork of Gym Retro turning classic console games (Genesis, SNES, NES) into Gym environments for RL agents.
- BALROG — Benchmark for agentic LLM/VLM reasoning across challenging games including NetHack, MiniHack, Crafter, BabyAI, and Baba Is You.
- PokéLLMon — An LLM agent that reaches human-level play in Pokémon battles.
- pokemon-agent — An agent that plays Pokémon with headless emulation, a REST API, and a live dashboard.
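
Several of the environments listed above (Gymnasium and the libraries that build on it, such as Stable-Retro and Minigrid) expose a game through the same two-call contract: `reset()` returns an observation and an info dict, and `step(action)` returns an observation, a reward, `terminated` and `truncated` flags, and an info dict. The sketch below illustrates that loop with a hypothetical, dependency-free toy environment. `EchoEnv` and `run_episode` are names made up for this example and are not part of any listed project; a real harness would supply its own environment class.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple


class EchoEnv:
    """Hypothetical toy environment mirroring the Gymnasium-style reset/step contract.

    The observation is a bit (0 or 1); the agent earns a reward of 1.0
    whenever its action equals the bit it was just shown.
    """

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self._rng = random.Random()
        self._steps = 0
        self._bit = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        # Re-seed only when a seed is given, then start a fresh episode.
        if seed is not None:
            self._rng.seed(seed)
        self._steps = 0
        self._bit = self._rng.randint(0, 1)
        return self._bit, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        reward = 1.0 if action == self._bit else 0.0
        self._steps += 1
        self._bit = self._rng.randint(0, 1)
        terminated = False  # this toy task has no natural end state
        truncated = self._steps >= self.max_steps  # time limit reached
        return self._bit, reward, terminated, truncated, {}


def run_episode(env: EchoEnv, policy: Callable[[int], int],
                seed: Optional[int] = None) -> float:
    """Run one episode and return the total reward collected."""
    obs, _info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


if __name__ == "__main__":
    # A policy that repeats the observation is rewarded on every step.
    print(run_episode(EchoEnv(max_steps=5), lambda obs: obs, seed=0))
```

Keeping `terminated` and `truncated` separate lets a training loop tell a genuine end state apart from a time limit, which is why the loop checks both before stopping.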