HUD Documentation: Evaluations and RL Environments

HUD gives you three things: access to every model through one API, a way to turn your code into agent-callable tools, and infrastructure to run evaluations and training at scale.

Install

# Install CLI
uv tool install hud-python --python 3.12

# Set your API key
hud set HUD_API_KEY=your-key-here

Get your API key at hud.ai/project/api-keys.

1. Models: Any Model, One API

Stop juggling API keys. Point any OpenAI-compatible client at inference.hud.ai and use Claude, GPT, Gemini, or Grok. Browse all available models at hud.ai/models.

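import asyncio
import os

from openai import AsyncOpenAI

# Point the standard OpenAI client at HUD's inference gateway (expects HUD_API_KEY in the environment)
client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"],
)

async def main() -> None:
    response = await client.chat.completions.create(
        model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro, grok-4-1-fast...
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())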
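Because every model sits behind the same endpoint, comparing models only means changing the `model` string. A minimal sketch, assuming nothing beyond the OpenAI-style `chat.completions.create` interface used above; `ask_all` is a hypothetical helper, not part of the HUD SDK.

```python
import asyncio
from typing import Any


async def ask_all(client: Any, models: list[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to several models concurrently and collect the replies.

    `client` is any object exposing an OpenAI-style `chat.completions.create`
    coroutine, such as the AsyncOpenAI client configured above.
    """

    async def ask(model: str) -> str:
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        # OpenAI-style responses carry the text in choices[0].message.content
        return response.choices[0].message.content

    # gather preserves input order, so replies line up with `models`
    replies = await asyncio.gather(*(ask(m) for m in models))
    return dict(zip(models, replies))
```

Usage, with the client from above: `asyncio.run(ask_all(client, ["claude-sonnet-4-5", "gpt-4o"], "Hello!"))` returns a dict mapping each model string to its reply.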

Every call is traced; view the traces at hud.ai/home. See Models for more.

2. Environments: Your Code, Agent-Ready

A production API is a single live instance with shared state, so you can't run 1,000 parallel tests against it without the runs stepping on each other. Environments instead spin up fresh for every evaluation: isolated, deterministic, and reproducible, and every run produces data you can train on. You turn your code into tools that agents can call, then define scenarios that evaluate what the agents do:

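from hud import Environment

env = Environment("my-env")

# Placeholder knowledge base for illustration; swap in your real data source
KNOWLEDGE_BASE = {"capital of france": "Paris"}

@env.tool()
def search(query: str) -> str:
    """Search the knowledge base."""
    return KNOWLEDGE_BASE.get(query.lower(), "No results found.")

@env.scenario("find-answer")
async def find_answer(question: str):
    # First yield: the prompt for the agent; the agent's final answer is sent back in
    answer = yield f"Find the answer to: {question}"
    # Second yield: the reward (1.0 = success, 0.0 = failure); replace with real grading logic
    yield 1.0 if "correct" in answer.lower() else 0.0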
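The scenario yields twice: first the prompt, then the reward, with the agent's answer sent in between. The sketch below drives that protocol by hand with a plain Python async generator so the control flow is visible. It is not HUD's runner: `run_scenario` and `fake_agent` are hypothetical names, and the only assumption is that the scenario body behaves like an ordinary async generator once the decorator is removed.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable, Union


async def find_answer(question: str) -> AsyncGenerator[Union[str, float], str]:
    # Same body as the scenario above, minus the @env.scenario decorator
    answer = yield f"Find the answer to: {question}"
    yield 1.0 if "correct" in answer.lower() else 0.0


async def run_scenario(
    scenario: Callable[..., AsyncGenerator[Union[str, float], str]],
    agent: Callable[[str], Awaitable[str]],
    **kwargs: object,
) -> float:
    """Drive a two-yield scenario by hand: prompt out, answer in, reward out."""
    gen = scenario(**kwargs)
    prompt = await gen.__anext__()    # first yield: the task prompt
    answer = await agent(prompt)      # the agent works on the prompt
    reward = await gen.asend(answer)  # resume with the answer; the second yield is the reward
    await gen.aclose()                # nothing left to do; close the generator
    return float(reward)


async def fake_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real model call
    return "The correct answer is 42."
```

`asyncio.run(run_scenario(find_answer, fake_agent, question="What is 6 * 7?"))` returns `1.0`, because the stub answer contains "correct". Keeping the prompt and the grading in one generator means the grader can reuse anything the scenario set up before the first yield.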

Iterate locally with hud dev, then deploy to the platform:

hud init          # Scaffold environment
hud dev env:env   # Run as MCP server (Cursor/Claude Code can connect)
hud deploy        # Deploy to platform → run evals at scale

Once deployed, your environment is live: agents can run against it in parallel, with every run traced and generating training data. See Environments and Hosted Running for more.
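Why each parallel run needs its own fresh environment can be shown without HUD at all. In this hypothetical sketch (`CartEnv`, `episode`, `run_shared`, and `run_isolated` are illustrative names, not HUD APIs), each episode's task is "add one item to an empty cart". Against one shared instance, concurrent episodes see each other's items and fail grading; with a fresh instance per episode, every run passes.

```python
import asyncio


class CartEnv:
    """Hypothetical environment: a shopping cart that starts empty."""

    def __init__(self) -> None:
        self.items: list[str] = []

    async def add_item(self, item: str) -> None:
        await asyncio.sleep(0)  # yield to the event loop, as a real tool call would
        self.items.append(item)


async def episode(env: CartEnv, item: str) -> float:
    # Task: "add one item to an empty cart"; reward 1.0 only if the cart holds exactly that item
    await env.add_item(item)
    await asyncio.sleep(0)  # other episodes may act before grading happens
    return 1.0 if env.items == [item] else 0.0


async def run_shared(n: int) -> list[float]:
    shared = CartEnv()  # one live instance with shared state, like a production API
    return list(await asyncio.gather(*(episode(shared, f"item-{i}") for i in range(n))))


async def run_isolated(n: int) -> list[float]:
    # A fresh instance per episode: no run can see another run's state
    return list(await asyncio.gather(*(episode(CartEnv(), f"item-{i}") for i in range(n))))
```

`asyncio.run(run_isolated(1000))` scores every episode 1.0, while `asyncio.run(run_shared(1000))` can score at most one episode as a success, since the shared cart only grows.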

3. Tasks & Training: Test and Train

Create tasks from your scenarios on hud.ai, run evaluations across models, and train on the successful completions. The same model string works before and after training; the trained model is simply better at your tasks. See Tasks & Training for more.
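The task workflow (a scenario plus fixed arguments, run against several models, keeping the successes) can be sketched in plain Python. Everything here is hypothetical: `Task`, `Rollout`, `evaluate`, `successful`, and `mean_reward_by_model` are illustrative names rather than HUD SDK calls, the agent is a stub you would replace with a real model call, and the training step itself happens on the platform and is not shown.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, List, Union

# Hypothetical agent signature: (model, prompt) -> final answer
Agent = Callable[[str, str], Awaitable[str]]


async def find_answer(question: str) -> AsyncGenerator[Union[str, float], str]:
    # Same two-yield shape as the scenario above: prompt first, then reward
    answer = yield f"Find the answer to: {question}"
    yield 1.0 if "correct" in answer.lower() else 0.0


@dataclass
class Task:
    """A scenario bound to fixed arguments, e.g. one question for find-answer."""
    scenario: Callable[..., AsyncGenerator[Union[str, float], str]]
    args: Dict[str, Any]


@dataclass
class Rollout:
    model: str
    prompt: str
    answer: str
    reward: float


async def evaluate(task: Task, model: str, agent: Agent) -> Rollout:
    gen = task.scenario(**task.args)
    prompt = await gen.__anext__()    # first yield: the prompt
    answer = await agent(model, prompt)
    reward = await gen.asend(answer)  # second yield: the reward
    await gen.aclose()
    return Rollout(model, prompt, answer, float(reward))


async def evaluate_across_models(tasks: List[Task], models: List[str], agent: Agent) -> List[Rollout]:
    # One rollout per (task, model) pair, run concurrently; gather keeps this order
    jobs = [evaluate(t, m, agent) for t in tasks for m in models]
    return list(await asyncio.gather(*jobs))


def successful(rollouts: List[Rollout], threshold: float = 1.0) -> List[Rollout]:
    # Keep only completions that met the bar: the candidates for training data
    return [r for r in rollouts if r.reward >= threshold]


def mean_reward_by_model(rollouts: List[Rollout]) -> Dict[str, float]:
    rewards: Dict[str, List[float]] = {}
    for r in rollouts:
        rewards.setdefault(r.model, []).append(r.reward)
    return {m: sum(v) / len(v) for m, v in rewards.items()}
```

Filtering by reward before training is what "train on successful completions" amounts to in this sketch; the threshold is a knob you would tune for graded (non-binary) rewards.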

Next Steps

Models

One endpoint for every model. Native tools.

Environments

Tools, scenarios, and iteration.

Hosted Running

Push to platform. Run at scale.

Tasks & Training

Evaluate and train models.

Community

GitHub

Star the repo and contribute

Discord

Join the community

Enterprise

Building agents at scale? We work with teams on custom environments, benchmarks, and training pipelines. Book a call or email founders@hud.ai.
