
Gaming harnesses for AI agents

32 open-source gaming harnesses an AI agent can use: MCP servers, SDKs, and adapters. Browse them on Loadbay. An agent can also search them through Loadbay's MCP server, which can be added to Claude Code with:

claude mcp add --transport http loadbay https://loadbay.xyz/api/mcp
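
For readers curious what travels over that HTTP transport, here is a minimal sketch of the first two JSON-RPC 2.0 messages an MCP client typically sends: an initialize handshake, then a tools/list request to discover what the server exposes. It only builds and prints the payloads and makes no network call. The client name, version string, and protocol version are illustrative assumptions, and the tools Loadbay actually returns are not documented here, so no response is shown.

```python
import json
from itertools import count

# Endpoint copied from the command above.
ENDPOINT = "https://loadbay.xyz/api/mcp"

_ids = count(1)  # JSON-RPC requests carry a unique id per request


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Handshake: the client announces itself before asking for anything.
# clientInfo values and protocolVersion are placeholders (assumptions).
initialize = jsonrpc_request(
    "initialize",
    {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
)

# Discovery: ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

for message in (initialize, list_tools):
    print(f"POST {ENDPOINT}")
    print(json.dumps(message, indent=2))
```

In practice an MCP client such as Claude Code handles this exchange itself; the sketch is only meant to show the shape of the requests.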

→ Best Gaming harnesses (top picks, ranked)

generative_agents — Stanford 'Smallville' simulation of interactive LLM-driven agents that remember, reflect, and plan in a sandbox town.
ml-agents — Unity toolkit that turns games and simulations into environments for training agents via RL and imitation learning.
Stable-Baselines3 — The standard library of reliable PyTorch RL algorithm implementations (PPO, SAC, DQN) used to train game-playing agents.
Gymnasium — Standard API and reference environments for single-agent reinforcement learning, the maintained successor to OpenAI Gym.
unity-mcp — MCP bridge between AI assistants and the Unity Editor for managing assets, controlling scenes, and editing scripts.
pysc2 — DeepMind's StarCraft II learning environment exposing the game to RL agents through a Python observation and action API.
DeepMind Lab — Customisable 3D first-person platform for agent research, providing navigation, memory, and puzzle tasks for RL agents.
Voyager — LLM-powered open-ended embodied agent that autonomously explores, learns skills, and plays Minecraft via the Mineflayer bot API.
open_spiel — Collection of environments and algorithms for reinforcement learning and search in over 70 board, card, and strategy games.
Google Research Football — Physics-based 3D football environment for training RL agents in single- and multi-agent modes with a scenario academy.
PettingZoo — Standard API and environment suite for multi-agent reinforcement learning across classic, Atari, and board-game settings.
Minigrid — Lightweight, configurable gridworld environments (including BabyAI) for benchmarking exploration and instruction-following agents.
Arcade-Learning-Environment — The ALE platform that lets agents play hundreds of Atari 2600 games for reinforcement-learning research.
ViZDoom — Reinforcement-learning environment built on the 1993 game Doom for training agents from raw visual input.
Godot RL Agents — Bridge that turns Godot Engine games into RL environments for training NPC and character behaviors; the Godot analog to ML-Agents.
llm-colosseum — Benchmarks LLMs by having them fight each other in real-time Street Fighter III, a head-to-head game-agent evaluation harness.
diplomacy_cicero — Cicero, the agent that plays Diplomacy at a human level by combining strategic planning with open-domain dialogue.
TextWorld — Microsoft sandbox for training and evaluating agents on text-based games, generating procedural interactive fiction worlds.
procgen — Suite of procedurally generated game-like Gym environments for benchmarking generalization in RL agents.
Factorio Learning Environment — Open-ended environment for evaluating LLM agents in Factorio, testing long-horizon planning, program synthesis, and optimization.
nle — The NetHack Learning Environment, a fast procedurally generated roguelike sandbox for reinforcement-learning agents.
minerl — Minecraft-based reinforcement-learning environment and dataset for sample-efficient agent research.
GamingAgent (lmgame-Bench) — Framework of LLM/VLM gaming agents plus lmgame-Bench that evaluates models by having them actually play games like Sokoban and Mario.
Melting Pot — DeepMind suite of multi-agent RL scenarios for evaluating cooperation, competition, and social behavior across games.
crafter — Open-world survival-game benchmark that evaluates a broad spectrum of agent capabilities in a single procedurally generated environment.
PokerRL — Framework for multi-agent deep reinforcement learning research on no-limit and limit Texas Hold'em poker.
Craftax — JAX benchmark that reimplements Crafter and extends it with NetHack-inspired mechanics, built for fast open-ended reinforcement-learning research.
Odyssey — Gives Minecraft agents a library of open-world skills, so an LLM agent can explore, gather, and build across the game.
Stable-Retro — Maintained Farama fork of Gym Retro turning classic console games (Genesis, SNES, NES) into Gym environments for RL agents.
BALROG — Benchmark for agentic LLM/VLM reasoning across challenging games including NetHack, MiniHack, Crafter, BabyAI, and Baba Is You.
PokéLLMon — An LLM agent that reaches human-level play in Pokémon battles.
pokemon-agent — An agent that plays Pokémon with headless emulation, a REST API, and a live dashboard.
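
Several of the entries above, Gymnasium in particular, describe a standard environment API. As a rough illustration of what that means, the sketch below runs the reset/step loop that Gymnasium-style environments share, where reset returns (observation, info) and step returns (observation, reward, terminated, truncated, info). CoinFlipEnv and run_episode are hypothetical names invented for this example; the code uses only the Python standard library and does not import any of the listed projects.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the Gymnasium call shape.

    The observation is a coin face (0 or 1). The agent earns 1.0 whenever
    its action matches the face it was just shown.
    """

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return (observation, info).
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin, {}

    def step(self, action):
        # Score the action against the current coin, then flip a new one.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        terminated = False                        # no natural end state here
        truncated = self.steps >= self.max_steps  # episode hits a step limit
        return self.coin, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and return the summed reward."""
    obs, info = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


# A policy that echoes the observation is always right in this toy game.
total = run_episode(CoinFlipEnv(), policy=lambda obs: obs)
print(total)
```

Because the loop depends only on reset and step, the same run_episode function would work with any environment that follows this interface, which is the point of a shared API.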

Browse all 370+ harnesses on Loadbay · this domain as JSON