Agents

The agent system provides a layered hierarchy for building LLM-powered agents, from general-purpose conversational agents to specialized evaluation judges. All agents share a common streaming interface and integrate with guardrails, MLflow, and the latent CLI.

from latent.agents import (
    LiteLLMAgent, Judge, Classifier, RAGAgent,
    tool, Message,
    ScoredModel, OrdinalScore, BinaryScore, ContinuousScore,
)

BaseAgent

The abstract base class. All agents implement a single method:

async def stream(
    self,
    messages: list[Message],
    *,
    config: dict[str, Any] | None = None,
) -> AsyncIterator[AgentEvent]:
    ...

stream() yields a sequence of typed events (TextDelta, ToolCall, ToolResult, Usage, StepBoundary, Error, etc.) that represent the agent's reasoning and output. Guardrail middleware is auto-discovered from @guardrail-decorated methods and injected at construction time.
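The typed events are plain data objects consumed with `async for`. A minimal sketch of the pattern, using simplified stand-in classes rather than the real event definitions:

```python
import asyncio
from dataclasses import dataclass

# Simplified stand-ins for the real TextDelta / Usage event classes.
@dataclass
class TextDelta:
    text: str

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

async def fake_stream():
    # A real agent would yield events as the LLM responds.
    for chunk in ["Hel", "lo"]:
        yield TextDelta(text=chunk)
    yield Usage(input_tokens=5, output_tokens=2)

async def main():
    parts = []
    async for event in fake_stream():
        if isinstance(event, TextDelta):
            parts.append(event.text)
    return "".join(parts)

print(asyncio.run(main()))  # -> Hello
```

Dispatching on event type with `isinstance` is the same pattern used with real agents later in this page.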


LiteLLMAgent

Concrete conversational agent powered by LiteLLM. Supports any model provider (OpenAI, Anthropic, Bedrock, Azure, etc.) and implements the ReAct pattern: the LLM can call tools, observe results, and iterate.

Construction

from latent.agents import LiteLLMAgent

agent = LiteLLMAgent(
    name="assistant",
    model="gpt-4o",
    system_prompt="You are a helpful assistant.",
    temperature=0.0,
    max_tokens=4096,
    max_iterations=10,       # ReAct loop cap
    response_format=None,    # Pydantic model for structured output
    tools=None,              # list of @tool-decorated functions
    optimized_prompt=None,   # path or OptimizedPrompt object
)

Parameters

Name Type Default Description
name str required Human-readable name for logging and MLflow
model str required LiteLLM model identifier (e.g. gpt-4o, anthropic/claude-sonnet-4-20250514)
system_prompt str | None None System message prepended to every call
tools list[Callable] | None None Functions decorated with @tool
temperature float 0.0 Sampling temperature
max_tokens int 4096 Max output tokens per LLM call
max_iterations int 10 Maximum ReAct loop iterations
response_format type[BaseModel] | dict | None None Structured output schema (strict mode enforced)
optimized_prompt str | Path | None None Load an optimized prompt from file

Usage patterns

Single-turn with run():

from latent.agents import LiteLLMAgent, Message

agent = LiteLLMAgent(name="qa", model="gpt-4o")
response = agent.run([Message(role="user", content="What is 2+2?")])
print(response)  # "4"

Multi-turn chat with ask(), which retains history until reset():

agent = LiteLLMAgent(
    name="tutor",
    model="gpt-4o",
    system_prompt="You are a math tutor.",
)

print(agent.ask("What is a derivative?"))
print(agent.ask("Can you give me an example?"))  # retains history

agent.reset()  # clear chat history
Streaming with stream():

from latent.agents import LiteLLMAgent, Message, TextDelta

agent = LiteLLMAgent(name="streamer", model="gpt-4o")

async for event in agent.stream([Message(role="user", content="Tell me a joke")]):
    if isinstance(event, TextDelta):
        print(event.text, end="", flush=True)
Structured output with response_format:

from pydantic import BaseModel
from latent.agents import LiteLLMAgent, Message

class City(BaseModel):
    name: str
    country: str
    population: int

agent = LiteLLMAgent(
    name="extractor",
    model="gpt-4o",
    response_format=City,
)

text = agent.run([Message(role="user", content="Tell me about Tokyo")])
city = City.model_validate_json(text)

Callable interface

agent(question) resets context and returns a response string. Designed for evaluation frameworks where each call should be stateless:

agent = LiteLLMAgent(name="eval_target", model="gpt-4o")
response = agent("What is the capital of France?")  # resets, asks, returns text

The @tool decorator

Marks a function as a tool for LiteLLM function calling. The function must have a docstring (used as the tool description) and type-annotated parameters (used to generate the JSON Schema).

from latent.agents import tool

@tool
def search_web(query: str) -> str:
    """Search the web for the given query and return results."""
    return do_search(query)

@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get current weather for a city."""
    return fetch_weather(city, units)

Tools can be passed to LiteLLMAgent at construction:

agent = LiteLLMAgent(
    name="researcher",
    model="gpt-4o",
    tools=[search_web, get_weather],
)

@tool methods on agent subclasses

@tool-decorated methods on a LiteLLMAgent subclass are auto-discovered at construction -- no need to pass them via tools=:

from latent.agents import LiteLLMAgent, tool

class ResearchAgent(LiteLLMAgent):
    def __init__(self):
        super().__init__(name="researcher", model="gpt-4o")

    @tool
    def search_papers(self, query: str) -> str:
        """Search academic papers for the given query."""
        return self._do_search(query)

    @tool
    def summarize(self, text: str) -> str:
        """Summarize the given text."""
        return self._summarize(text)

Docstrings are required

A ValueError is raised if a @tool-decorated function has no docstring. The docstring becomes the tool description that the LLM sees.
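The check itself is straightforward; here is a hypothetical sketch of how a @tool-style decorator might enforce it (not the library's actual implementation):

```python
import inspect

def require_docstring(fn):
    # Hypothetical sketch: reject undocumented tool functions at decoration time.
    doc = inspect.getdoc(fn)
    if not doc:
        raise ValueError(f"@tool function {fn.__name__!r} must have a docstring")
    fn._description = doc  # the description the LLM would see
    return fn

@require_docstring
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny in {city}"

print(get_weather._description)  # -> Get current weather for a city.
```

Failing at construction time, rather than when the LLM first calls the tool, surfaces the mistake immediately.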

Async tools

Both sync and async functions work as tools. Async tools are awaited automatically during the ReAct loop.
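The sync/async dispatch can be sketched in a few lines (assumed behavior, not the library's exact code):

```python
import asyncio
import inspect

async def call_tool(fn, **arguments):
    # Await coroutine functions; call plain functions directly.
    if inspect.iscoroutinefunction(fn):
        return await fn(**arguments)
    return fn(**arguments)

def double(x: int) -> int:
    return 2 * x

async def triple(x: int) -> int:
    return 3 * x

async def main():
    return await call_tool(double, x=2), await call_tool(triple, x=2)

print(asyncio.run(main()))  # -> (4, 6)
```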


Judge and Classifier

Specialized agents for LLM-as-a-judge evaluation. Judge[T] scores a data row against a Pydantic output schema. Classifier[T] is a semantic alias with identical behavior, used when the output model has a prediction field.

Construction

from latent.agents import Judge, Classifier

judge = Judge(
    name="qa_judge",
    model="gpt-4o",
    output_type=MyScoreModel,       # Pydantic model (required)
    prompt_template=None,            # auto-generated from score annotations if omitted
    system_prompt=None,              # defaults to "You are an expert evaluator..."
    temperature=0.0,
    max_tokens=4096,
)

classifier = Classifier(
    name="intent_classifier",
    model="gpt-4o",
    output_type=IntentPrediction,
    prompt_template="Classify this message: {text}\nLabels: {labels}",
)

evaluate(row)

Formats the prompt_template with the row dict, calls the LLM with structured output, and parses the response:

score = judge.evaluate({"conversation": "...", "question": "..."})
# score is an instance of output_type (e.g. MyScoreModel)

Judge.__call__(row) delegates to evaluate(), making judges compatible with @task.map().
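The formatting step is plain str.format substitution over the row's keys. An illustrative example (the template and row here are made up, not built-ins):

```python
# The prompt_template placeholders are filled from the row dict by name.
template = "Classify this message: {text}\nLabels: {labels}"
row = {"text": "Where is my order?", "labels": "billing, shipping, other"}

prompt = template.format(**row)
print(prompt)
# Classify this message: Where is my order?
# Labels: billing, shipping, other
```

This means every placeholder in the template must have a matching key in the row, or formatting fails with a KeyError.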


ScoredModel

A BaseModel subclass that auto-injects rationale fields. For every field annotated with BinaryScore, OrdinalScore, or ContinuousScore, a corresponding {field}_rationale: str field is created so the LLM can explain its reasoning.

from typing import Annotated
from latent.agents import ScoredModel, OrdinalScore, BinaryScore, ContinuousScore

class QAScores(ScoredModel):
    faithfulness: Annotated[int, OrdinalScore(
        scale=(1, 2, 3, 4, 5),
        pass_threshold=3,
        labels={1: "Hallucinated", 3: "Partial", 5: "Faithful"},
    )]
    relevance: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]
    factual: Annotated[int, BinaryScore(description="All claims are factually correct")]
    similarity: Annotated[float, ContinuousScore(min_value=0.0, max_value=1.0)]

This generates four additional fields automatically: faithfulness_rationale, relevance_rationale, factual_rationale, similarity_rationale -- all str with default "".
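The injection relies on inspecting Annotated metadata. A standalone sketch of the idea, using a stand-in annotation class and a plain class rather than the real ScoredModel machinery:

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints

@dataclass
class OrdinalScore:  # stand-in for the real annotation class
    scale: tuple

class ExampleScores:  # plain class standing in for a ScoredModel subclass
    faithfulness: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]
    relevance: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]

def rationale_fields(cls) -> list[str]:
    # One "{field}_rationale" per field carrying a score annotation.
    hints = get_type_hints(cls, include_extras=True)
    return [
        f"{name}_rationale"
        for name, hint in hints.items()
        if any(isinstance(meta, OrdinalScore) for meta in get_args(hint)[1:])
    ]

print(rationale_fields(ExampleScores))  # -> ['faithfulness_rationale', 'relevance_rationale']
```

include_extras=True is what preserves the Annotated metadata; without it, get_type_hints strips the annotations down to plain int.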

Score annotations

Annotation Type hint Parameters Use case
BinaryScore int description Pass/fail (0 or 1)
OrdinalScore int scale, labels, pass_threshold, description Likert scales (e.g. 1--5)
ContinuousScore float min_value, max_value, description Float ranges (e.g. 0.0--1.0)

Prompt generation

When prompt_template is omitted, Judge auto-generates a scoring rubric from the annotations using build_scoring_prompt():

from latent.agents import build_scoring_prompt

prompt = build_scoring_prompt(QAScores)
print(prompt)

Output:

Score on the following dimensions:

- faithfulness (1-5): faithfulness
  [1=Hallucinated, 3=Partial, 5=Faithful]
- relevance (1-5): relevance
- factual (0 or 1): All claims are factually correct
- similarity (0.0-1.0): similarity

For each score, provide a brief rationale explaining your reasoning:
- faithfulness_rationale: explain your faithfulness score
- relevance_rationale: explain your relevance score
- factual_rationale: explain your factual score
- similarity_rationale: explain your similarity score

Full example

from typing import Annotated
from latent.agents import Judge, ScoredModel, OrdinalScore

class ResponseQuality(ScoredModel):
    quality: Annotated[int, OrdinalScore(
        scale=(1, 2, 3, 4, 5),
        pass_threshold=3,
        description="Overall response quality",
    )]

judge = Judge("quality_judge", model="gpt-4o", output_type=ResponseQuality)
score = judge.evaluate({"conversation": "User: Hi\nAssistant: Hello!"})

print(score.quality)              # 4
print(score.quality_rationale)    # "The response is polite and appropriate..."

Pre-built judges

The latent.agents.judges module provides ready-to-use judges for common evaluation patterns. Each is a factory function that returns a configured Judge instance.

Judge Input columns Score type What it measures
FaithfulnessJudge context, response ContinuousScore(0-1) Proportion of claims supported by context
HallucinationJudge context, response ContinuousScore(0-1) Hallucination rate
RelevanceJudge query, response OrdinalScore(1-5) Response relevance to query
CompletenessJudge query, response OrdinalScore(1-5) Completeness of the answer
ConcisenessJudge response OrdinalScore(1-5) Brevity without losing information
ConsistencyJudge response_a, response_b ContinuousScore(0-1) Consistency between two responses
StyleJudge response OrdinalScore(1-5) Writing style and tone
InstructionFollowingJudge instruction, response OrdinalScore(1-5) How well instructions were followed
RecoveryJudge context, response OrdinalScore(1-5) Error recovery quality
GuardrailsJudge response BinaryScore Safety/policy compliance

from latent.agents.judges import FaithfulnessJudge, RelevanceJudge

faith_judge = FaithfulnessJudge(model="gpt-4o")
score = faith_judge.evaluate({
    "context": "The Eiffel Tower is 330 meters tall.",
    "response": "The Eiffel Tower is approximately 330 meters tall.",
})
print(f"Faithfulness: {score.faithfulness:.2f}")

Agent registry

The @agent decorator registers a BaseAgent subclass for CLI discovery (latent agents, latent chat):

from latent.agents import agent, LiteLLMAgent

@agent("support_bot")
class SupportBot(LiteLLMAgent):
    """Customer support agent with knowledge base access."""

    def __init__(self):
        super().__init__(
            name="support_bot",
            model="gpt-4o",
            system_prompt="You are a customer support agent.",
        )

Registered agents are then discoverable from the CLI:

latent agents            # lists registered agents
latent chat support_bot  # interactive REPL

RAGAgent

RAGAgent extends LiteLLMAgent with a config-driven RAG pipeline. It auto-registers a search_knowledge_base tool so the LLM can retrieve context during conversation.

from latent.agents import RAGAgent, Message

agent = RAGAgent(
    name="support",
    model="gpt-4o",
    rag_config={
        "embeddings": {"provider": "voyage", "model": "voyage-3"},
        "reranker": {"type": "cross_encoder"},
        "post_retrieval": {"type": "reorder"},
    },
)

agent.index_documents(
    texts=["Reset your password at settings > security...", ...],
    sources=["help/passwords.md", ...],
)

response = agent.run([Message(role="user", content="How do I reset my password?")])

Full documentation

See RAG for the complete RAG pipeline configuration reference, including chunking strategies, embedding providers, backend options, reranking, and caching.


Events

All agents emit a stream of typed events. Use these for logging, UI rendering, or metrics collection.

Event Fields When emitted
TextDelta text Each chunk of generated text
ToolCall tool_call_id, tool_name, arguments Before tool execution
ToolResult tool_call_id, output, error After tool execution
StepBoundary step Start of each ReAct iteration
LLMCallStart run_id, model_id, input_preview Before each LLM API call
LLMCallEnd run_id, model_id, input_tokens, output_tokens After each LLM API call
Usage input_tokens, output_tokens, model_id Token usage summary
Error message On failure (e.g. max iterations exceeded)
Message role, content Chat messages (used as input)

from latent.agents import LiteLLMAgent, Message, ToolCall, ToolResult, Usage

agent = LiteLLMAgent(name="demo", model="gpt-4o", tools=[my_tool])
response = agent.run([Message(role="user", content="Use the tool")])

for event in agent.events:
    if isinstance(event, ToolCall):
        print(f"Called {event.tool_name} with {event.arguments}")
    elif isinstance(event, Usage):
        print(f"Tokens: {event.input_tokens} in, {event.output_tokens} out")