Agents

The agent system provides a layered hierarchy for building LLM-powered agents, from general-purpose conversational agents to specialized evaluation judges. All agents share a common streaming interface and integrate with guardrails, MLflow, and the latent CLI.

from latent.agents import (
    LiteLLMAgent, Judge, Classifier, RAGAgent,
    tool, Message,
    ScoredModel, OrdinalScore, BinaryScore, ContinuousScore,
)

BaseAgent

The abstract base class. All agents implement a single method:

async def stream(
    self,
    messages: list[Message],
    *,
    config: dict[str, Any] | None = None,
) -> AsyncIterator[AgentEvent]:
    ...

stream() yields a sequence of typed events (TextDelta, ToolCall, ToolResult, Usage, StepBoundary, Error, etc.) that represent the agent's reasoning and output. Guardrail middleware is auto-discovered from @guardrail-decorated methods and injected at construction time.
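The typed events are plain data objects consumed with `async for`. A minimal sketch of the pattern, using simplified stand-in classes rather than the real event definitions:

```python
import asyncio
from dataclasses import dataclass

# Simplified stand-ins for the real TextDelta / Usage event classes.
@dataclass
class TextDelta:
    text: str

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

async def fake_stream():
    # A real agent would yield events as the LLM responds.
    for chunk in ["Hel", "lo"]:
        yield TextDelta(text=chunk)
    yield Usage(input_tokens=5, output_tokens=2)

async def main():
    parts = []
    async for event in fake_stream():
        if isinstance(event, TextDelta):
            parts.append(event.text)
    return "".join(parts)

print(asyncio.run(main()))  # -> Hello
```

Dispatching on event type with `isinstance` is the same pattern used with real agents later in this page.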


LiteLLMAgent

Concrete conversational agent powered by LiteLLM. Supports any model provider (OpenAI, Anthropic, Bedrock, Azure, etc.) and implements the ReAct pattern: the LLM can call tools, observe results, and iterate.

Construction

from latent.agents import LiteLLMAgent

agent = LiteLLMAgent(
    name="assistant",
    model="gpt-4o",
    system_prompt="You are a helpful assistant.",
    temperature=0.0,
    max_tokens=4096,
    max_iterations=10,       # ReAct loop cap
    response_format=None,    # Pydantic model for structured output
    tools=None,              # list of @tool-decorated functions
    optimized_prompt=None,   # path or OptimizedPrompt object
)

Parameters

Name Type Default Description
name str required Human-readable name for logging and MLflow
model str required LiteLLM model identifier (e.g. gpt-4o, anthropic/claude-sonnet-4-20250514)
system_prompt str | None None System message prepended to every call
tools list[Callable] | None None Functions decorated with @tool
temperature float 0.0 Sampling temperature
max_tokens int 4096 Max output tokens per LLM call
max_iterations int 10 Maximum ReAct loop iterations
response_format type[BaseModel] | dict | None None Structured output schema (strict mode enforced)
optimized_prompt str | Path | None None Load an optimized prompt from file

Usage patterns

Single-turn with run():

from latent.agents import LiteLLMAgent, Message

agent = LiteLLMAgent(name="qa", model="gpt-4o")
response = agent.run([Message(role="user", content="What is 2+2?")])
print(response)  # "4"

Multi-turn chat with ask(), which retains history until reset():

agent = LiteLLMAgent(
    name="tutor",
    model="gpt-4o",
    system_prompt="You are a math tutor.",
)

print(agent.ask("What is a derivative?"))
print(agent.ask("Can you give me an example?"))  # retains history

agent.reset()  # clear chat history
Streaming with stream():

from latent.agents import LiteLLMAgent, Message, TextDelta

agent = LiteLLMAgent(name="streamer", model="gpt-4o")

async for event in agent.stream([Message(role="user", content="Tell me a joke")]):
    if isinstance(event, TextDelta):
        print(event.text, end="", flush=True)
Structured output with response_format:

from pydantic import BaseModel
from latent.agents import LiteLLMAgent, Message

class City(BaseModel):
    name: str
    country: str
    population: int

agent = LiteLLMAgent(
    name="extractor",
    model="gpt-4o",
    response_format=City,
)

text = agent.run([Message(role="user", content="Tell me about Tokyo")])
city = City.model_validate_json(text)

Callable interface

agent(question) resets context and returns a response string. Designed for evaluation frameworks where each call should be stateless:

agent = LiteLLMAgent(name="eval_target", model="gpt-4o")
response = agent("What is the capital of France?")  # resets, asks, returns text

The @tool decorator

Marks a function as a tool for LiteLLM function calling. The function must have a docstring (used as the tool description) and type-annotated parameters (used to generate the JSON Schema).

from latent.agents import tool

@tool
def search_web(query: str) -> str:
    """Search the web for the given query and return results."""
    return do_search(query)

@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get current weather for a city."""
    return fetch_weather(city, units)

Tools can be passed to LiteLLMAgent at construction:

agent = LiteLLMAgent(
    name="researcher",
    model="gpt-4o",
    tools=[search_web, get_weather],
)

@tool methods on agent subclasses

@tool-decorated methods on a LiteLLMAgent subclass are auto-discovered at construction -- no need to pass them via tools=:

from latent.agents import LiteLLMAgent, tool

class ResearchAgent(LiteLLMAgent):
    def __init__(self):
        super().__init__(name="researcher", model="gpt-4o")

    @tool
    def search_papers(self, query: str) -> str:
        """Search academic papers for the given query."""
        return self._do_search(query)

    @tool
    def summarize(self, text: str) -> str:
        """Summarize the given text."""
        return self._summarize(text)

Docstrings are required

A ValueError is raised if a @tool-decorated function has no docstring. The docstring becomes the tool description that the LLM sees.
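The check itself is straightforward; here is a hypothetical sketch of how a @tool-style decorator might enforce it (not the library's actual implementation):

```python
import inspect

def require_docstring(fn):
    # Hypothetical sketch: reject undocumented tool functions at decoration time.
    doc = inspect.getdoc(fn)
    if not doc:
        raise ValueError(f"@tool function {fn.__name__!r} must have a docstring")
    fn._description = doc  # the description the LLM would see
    return fn

@require_docstring
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny in {city}"

print(get_weather._description)  # -> Get current weather for a city.
```

Failing at construction time, rather than when the LLM first calls the tool, surfaces the mistake immediately.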

Async tools

Both sync and async functions work as tools. Async tools are awaited automatically during the ReAct loop.
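The sync/async dispatch can be sketched in a few lines (assumed behavior, not the library's exact code):

```python
import asyncio
import inspect

async def call_tool(fn, **arguments):
    # Await coroutine functions; call plain functions directly.
    if inspect.iscoroutinefunction(fn):
        return await fn(**arguments)
    return fn(**arguments)

def double(x: int) -> int:
    return 2 * x

async def triple(x: int) -> int:
    return 3 * x

async def main():
    return await call_tool(double, x=2), await call_tool(triple, x=2)

print(asyncio.run(main()))  # -> (4, 6)
```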


Judge and Classifier

Specialized agents for LLM-as-a-judge evaluation. Judge[T] scores a data row against a Pydantic output schema. Classifier[T] is a semantic alias with identical behavior, used when the output model has a prediction field.

Construction

from latent.agents import Judge, Classifier

judge = Judge(
    name="qa_judge",
    model="gpt-4o",
    output_type=MyScoreModel,       # Pydantic model (required)
    prompt_template=None,            # auto-generated from score annotations if omitted
    system_prompt=None,              # defaults to "You are an expert evaluator..."
    temperature=0.0,
    max_tokens=4096,
)

classifier = Classifier(
    name="intent_classifier",
    model="gpt-4o",
    output_type=IntentPrediction,
    prompt_template="Classify this message: {text}\nLabels: {labels}",
)

evaluate(row)

Formats the prompt_template with the row dict, calls the LLM with structured output, and parses the response:

score = judge.evaluate({"conversation": "...", "question": "..."})
# score is an instance of output_type (e.g. MyScoreModel)

Judge.__call__(row) delegates to evaluate(), making judges compatible with @task.map().
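The formatting step is plain str.format substitution over the row's keys. An illustrative example (the template and row here are made up, not built-ins):

```python
# The prompt_template placeholders are filled from the row dict by name.
template = "Classify this message: {text}\nLabels: {labels}"
row = {"text": "Where is my order?", "labels": "billing, shipping, other"}

prompt = template.format(**row)
print(prompt)
# Classify this message: Where is my order?
# Labels: billing, shipping, other
```

This means every placeholder in the template must have a matching key in the row, or formatting fails with a KeyError.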


ScoredModel

A BaseModel subclass that auto-injects rationale fields. For every field annotated with BinaryScore, OrdinalScore, or ContinuousScore, a corresponding {field}_rationale: str field is created so the LLM can explain its reasoning.

from typing import Annotated
from latent.agents import ScoredModel, OrdinalScore, BinaryScore, ContinuousScore

class QAScores(ScoredModel):
    faithfulness: Annotated[int, OrdinalScore(
        scale=(1, 2, 3, 4, 5),
        pass_threshold=3,
        labels={1: "Hallucinated", 3: "Partial", 5: "Faithful"},
    )]
    relevance: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]
    factual: Annotated[int, BinaryScore(description="All claims are factually correct")]
    similarity: Annotated[float, ContinuousScore(min_value=0.0, max_value=1.0)]

This generates four additional fields automatically: faithfulness_rationale, relevance_rationale, factual_rationale, similarity_rationale -- all str with default "".
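The injection relies on inspecting Annotated metadata. A standalone sketch of the idea, using a stand-in annotation class and a plain class rather than the real ScoredModel machinery:

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints

@dataclass
class OrdinalScore:  # stand-in for the real annotation class
    scale: tuple

class ExampleScores:  # plain class standing in for a ScoredModel subclass
    faithfulness: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]
    relevance: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]

def rationale_fields(cls) -> list[str]:
    # One "{field}_rationale" per field carrying a score annotation.
    hints = get_type_hints(cls, include_extras=True)
    return [
        f"{name}_rationale"
        for name, hint in hints.items()
        if any(isinstance(meta, OrdinalScore) for meta in get_args(hint)[1:])
    ]

print(rationale_fields(ExampleScores))  # -> ['faithfulness_rationale', 'relevance_rationale']
```

include_extras=True is what preserves the Annotated metadata; without it, get_type_hints strips the annotations down to plain int.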

Score annotations

Annotation Type hint Parameters Use case
BinaryScore int description Pass/fail (0 or 1)
OrdinalScore int scale, labels, pass_threshold, description Likert scales (e.g. 1--5)
ContinuousScore float min_value, max_value, description Float ranges (e.g. 0.0--1.0)

Prompt generation

When prompt_template is omitted, Judge auto-generates a scoring rubric from the annotations using build_scoring_prompt():

from latent.agents import build_scoring_prompt

prompt = build_scoring_prompt(QAScores)
print(prompt)

Output:

Score on the following dimensions:

- faithfulness (1-5): faithfulness
  [1=Hallucinated, 3=Partial, 5=Faithful]
- relevance (1-5): relevance
- factual (0 or 1): All claims are factually correct
- similarity (0.0-1.0): similarity

For each score, provide a brief rationale explaining your reasoning:
- faithfulness_rationale: explain your faithfulness score
- relevance_rationale: explain your relevance score
- factual_rationale: explain your factual score
- similarity_rationale: explain your similarity score

Full example

from typing import Annotated
from latent.agents import Judge, ScoredModel, OrdinalScore

class ResponseQuality(ScoredModel):
    quality: Annotated[int, OrdinalScore(
        scale=(1, 2, 3, 4, 5),
        pass_threshold=3,
        description="Overall response quality",
    )]

judge = Judge("quality_judge", model="gpt-4o", output_type=ResponseQuality)
score = judge.evaluate({"conversation": "User: Hi\nAssistant: Hello!"})

print(score.quality)              # 4
print(score.quality_rationale)    # "The response is polite and appropriate..."

Pre-built judges

The latent.agents.judges module provides ready-to-use judges for common evaluation patterns. Each is a factory function that returns a configured Judge instance.

Judge Input columns Score type What it measures
FaithfulnessJudge context, response ContinuousScore(0-1) Proportion of claims supported by context
HallucinationJudge context, response ContinuousScore(0-1) Hallucination rate
RelevanceJudge query, response OrdinalScore(1-5) Response relevance to query
CompletenessJudge query, response OrdinalScore(1-5) Completeness of the answer
ConcisenessJudge response OrdinalScore(1-5) Brevity without losing information
ConsistencyJudge response_a, response_b ContinuousScore(0-1) Consistency between two responses
StyleJudge response OrdinalScore(1-5) Writing style and tone
InstructionFollowingJudge instruction, response OrdinalScore(1-5) How well instructions were followed
RecoveryJudge context, response OrdinalScore(1-5) Error recovery quality
GuardrailsJudge response BinaryScore Safety/policy compliance

from latent.agents.judges import FaithfulnessJudge, RelevanceJudge

faith_judge = FaithfulnessJudge(model="gpt-4o")
score = faith_judge.evaluate({
    "context": "The Eiffel Tower is 330 meters tall.",
    "response": "The Eiffel Tower is approximately 330 meters tall.",
})
print(f"Faithfulness: {score.faithfulness:.2f}")

Agent registry

The @agent decorator registers a BaseAgent subclass for CLI discovery (latent agents, latent chat):

from latent.agents import agent, LiteLLMAgent

@agent("support_bot")
class SupportBot(LiteLLMAgent):
    """Customer support agent with knowledge base access."""

    def __init__(self):
        super().__init__(
            name="support_bot",
            model="gpt-4o",
            system_prompt="You are a customer support agent.",
        )

Registered agents are then discoverable from the CLI:

latent agents            # lists registered agents
latent chat support_bot  # interactive REPL

RAGAgent

RAGAgent extends LiteLLMAgent with a config-driven RAG pipeline. It auto-registers a search_knowledge_base tool so the LLM can retrieve context during conversation.

from latent.agents import RAGAgent, Message

agent = RAGAgent(
    name="support",
    model="gpt-4o",
    rag_config={
        "embeddings": {"provider": "voyage", "model": "voyage-3"},
        "reranker": {"type": "cross_encoder"},
        "post_retrieval": {"type": "reorder"},
    },
)

agent.index_documents(
    texts=["Reset your password at settings > security...", ...],
    sources=["help/passwords.md", ...],
)

response = agent.run([Message(role="user", content="How do I reset my password?")])

Full documentation

See RAG for the complete RAG pipeline configuration reference, including chunking strategies, embedding providers, backend options, reranking, and caching.


Events

All agents emit a stream of typed events. Use these for logging, UI rendering, or metrics collection.

Event Fields When emitted
TextDelta text Each chunk of generated text
ToolCall tool_call_id, tool_name, arguments Before tool execution
ToolResult tool_call_id, output, error After tool execution
StepBoundary step Start of each ReAct iteration
LLMCallStart run_id, model_id, input_preview Before each LLM API call
LLMCallEnd run_id, model_id, input_tokens, output_tokens After each LLM API call
Usage input_tokens, output_tokens, model_id Token usage summary
Error message On failure (e.g. max iterations exceeded)
Message role, content Chat messages (used as input)

from latent.agents import LiteLLMAgent, Message, ToolCall, ToolResult, Usage

agent = LiteLLMAgent(name="demo", model="gpt-4o", tools=[my_tool])
response = agent.run([Message(role="user", content="Use the tool")])

for event in agent.events:
    if isinstance(event, ToolCall):
        print(f"Called {event.tool_name} with {event.arguments}")
    elif isinstance(event, Usage):
        print(f"Tokens: {event.input_tokens} in, {event.output_tokens} out")