Agents¶
The agent system provides a layered hierarchy for building LLM-powered agents, from general-purpose conversational agents to specialized evaluation judges. All agents share a common streaming interface and integrate with guardrails, MLflow, and the latent CLI.
```python
from latent.agents import (
    LiteLLMAgent, Judge, Classifier, RAGAgent,
    tool, Message,
    ScoredModel, OrdinalScore, BinaryScore, ContinuousScore,
)
```
BaseAgent¶
The abstract base class. All agents implement a single method:
```python
async def stream(
    self,
    messages: list[Message],
    *,
    config: dict[str, Any] | None = None,
) -> AsyncIterator[AgentEvent]:
    ...
```
stream() yields a sequence of typed events (TextDelta, ToolCall, ToolResult, Usage, StepBoundary, Error, etc.) that represent the agent's reasoning and output. Guardrail middleware is auto-discovered from @guardrail-decorated methods and injected at construction time.
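Consuming the stream is a matter of iterating and dispatching on event type. A minimal sketch, assuming `TextDelta` is importable from `latent.agents` alongside the other event types:

```python
import asyncio

from latent.agents import LiteLLMAgent, Message, TextDelta

async def main() -> None:
    agent = LiteLLMAgent(name="demo", model="gpt-4o")
    # Print text chunks as they arrive; other event types are ignored here.
    async for event in agent.stream([Message(role="user", content="Hello")]):
        if isinstance(event, TextDelta):
            print(event.text, end="", flush=True)

asyncio.run(main())
```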
LiteLLMAgent¶
Concrete conversational agent powered by LiteLLM. Supports any model provider (OpenAI, Anthropic, Bedrock, Azure, etc.) and implements the ReAct pattern: the LLM can call tools, observe results, and iterate.
Construction¶
```python
from latent.agents import LiteLLMAgent

agent = LiteLLMAgent(
    name="assistant",
    model="gpt-4o",
    system_prompt="You are a helpful assistant.",
    temperature=0.0,
    max_tokens=4096,
    max_iterations=10,      # ReAct loop cap
    response_format=None,   # Pydantic model for structured output
    tools=None,             # list of @tool-decorated functions
    optimized_prompt=None,  # path or OptimizedPrompt object
)
```
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | required | Human-readable name for logging and MLflow |
| `model` | `str` | required | LiteLLM model identifier (e.g. `gpt-4o`, `anthropic/claude-sonnet-4-20250514`) |
| `system_prompt` | `str \| None` | `None` | System message prepended to every call |
| `tools` | `list[Callable] \| None` | `None` | Functions decorated with `@tool` |
| `temperature` | `float` | `0.0` | Sampling temperature |
| `max_tokens` | `int` | `4096` | Max output tokens per LLM call |
| `max_iterations` | `int` | `10` | Maximum ReAct loop iterations |
| `response_format` | `type[BaseModel] \| dict \| None` | `None` | Structured output schema (strict mode enforced) |
| `optimized_prompt` | `str \| Path \| None` | `None` | Load an optimized prompt from file |
Usage patterns¶
```python
from pydantic import BaseModel

from latent.agents import LiteLLMAgent, Message

class City(BaseModel):
    name: str
    country: str
    population: int

agent = LiteLLMAgent(
    name="extractor",
    model="gpt-4o",
    response_format=City,
)

text = agent.run([Message(role="user", content="Tell me about Tokyo")])
city = City.model_validate_json(text)
```
Callable interface¶
agent(question) resets context and returns a response string. Designed for evaluation frameworks where each call should be stateless:
```python
agent = LiteLLMAgent(name="eval_target", model="gpt-4o")
response = agent("What is the capital of France?")  # resets, asks, returns text
```
The @tool decorator¶
Marks a function as a tool for LiteLLM function calling. The function must have a docstring (used as the tool description) and type-annotated parameters (used to generate the JSON Schema).
```python
from latent.agents import tool

@tool
def search_web(query: str) -> str:
    """Search the web for the given query and return results."""
    return do_search(query)

@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get current weather for a city."""
    return fetch_weather(city, units)
```
Tools can be passed to LiteLLMAgent at construction:
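For example, passing the two tools defined above via the `tools=` parameter (construction only; no call is made here):

```python
from latent.agents import LiteLLMAgent

# search_web and get_weather are the @tool-decorated functions defined above
agent = LiteLLMAgent(
    name="assistant",
    model="gpt-4o",
    tools=[search_web, get_weather],
)
```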
@tool methods on agent subclasses¶
@tool-decorated methods on a LiteLLMAgent subclass are auto-discovered at construction -- no need to pass them via tools=:
```python
from latent.agents import LiteLLMAgent, tool

class ResearchAgent(LiteLLMAgent):
    def __init__(self):
        super().__init__(name="researcher", model="gpt-4o")

    @tool
    def search_papers(self, query: str) -> str:
        """Search academic papers for the given query."""
        return self._do_search(query)

    @tool
    def summarize(self, text: str) -> str:
        """Summarize the given text."""
        return self._summarize(text)
```
**Docstrings are required.** A `ValueError` is raised if a `@tool`-decorated function has no docstring. The docstring becomes the tool description that the LLM sees.
**Async tools.** Both sync and async functions work as tools. Async tools are awaited automatically during the ReAct loop.
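An async tool is declared the same way as a sync one. A sketch with a hypothetical `fetch_status` tool and `status_client` helper, shown only to illustrate the shape:

```python
from latent.agents import tool

@tool
async def fetch_status(service: str) -> str:
    """Return the current status of the named service."""
    # status_client is a hypothetical async client; the ReAct loop
    # awaits this coroutine automatically when the LLM calls the tool.
    return await status_client.get(service)
```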
Judge and Classifier¶
Specialized agents for LLM-as-a-judge evaluation. Judge[T] scores a data row against a Pydantic output schema. Classifier[T] is a semantic alias with identical behavior, used when the output model has a prediction field.
Construction¶
```python
from latent.agents import Judge, Classifier

judge = Judge(
    name="qa_judge",
    model="gpt-4o",
    output_type=MyScoreModel,  # Pydantic model (required)
    prompt_template=None,      # auto-generated from score annotations if omitted
    system_prompt=None,        # defaults to "You are an expert evaluator..."
    temperature=0.0,
    max_tokens=4096,
)

classifier = Classifier(
    name="intent_classifier",
    model="gpt-4o",
    output_type=IntentPrediction,
    prompt_template="Classify this message: {text}\nLabels: {labels}",
)
```
evaluate(row)¶
Formats the prompt_template with the row dict, calls the LLM with structured output, and parses the response:
```python
score = judge.evaluate({"conversation": "...", "question": "..."})
# score is an instance of output_type (e.g. MyScoreModel)
```
Judge.__call__(row) delegates to evaluate(), making judges compatible with @task.map().
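The callable form also makes batch scoring with plain Python straightforward. A sketch, reusing the `judge` constructed above with made-up row contents:

```python
rows = [
    {"conversation": "User: Hi\nAssistant: Hello!", "question": "Was the reply polite?"},
    {"conversation": "User: Help\nAssistant: No.", "question": "Was the reply helpful?"},
]
# judge(row) delegates to judge.evaluate(row), so either spelling works
scores = [judge(row) for row in rows]
```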
ScoredModel¶
A BaseModel subclass that auto-injects rationale fields. For every field annotated with BinaryScore, OrdinalScore, or ContinuousScore, a corresponding {field}_rationale: str field is created so the LLM can explain its reasoning.
```python
from typing import Annotated

from latent.agents import ScoredModel, OrdinalScore, BinaryScore, ContinuousScore

class QAScores(ScoredModel):
    faithfulness: Annotated[int, OrdinalScore(
        scale=(1, 2, 3, 4, 5),
        pass_threshold=3,
        labels={1: "Hallucinated", 3: "Partial", 5: "Faithful"},
    )]
    relevance: Annotated[int, OrdinalScore(scale=(1, 2, 3, 4, 5))]
    factual: Annotated[int, BinaryScore(description="All claims are factually correct")]
    similarity: Annotated[float, ContinuousScore(min_value=0.0, max_value=1.0)]
```
This generates four additional fields automatically: faithfulness_rationale, relevance_rationale, factual_rationale, similarity_rationale -- all str with default "".
Score annotations¶
| Annotation | Type hint | Parameters | Use case |
|---|---|---|---|
| `BinaryScore` | `int` | `description` | Pass/fail (0 or 1) |
| `OrdinalScore` | `int` | `scale`, `labels`, `pass_threshold`, `description` | Likert scales (e.g. 1-5) |
| `ContinuousScore` | `float` | `min_value`, `max_value`, `description` | Float ranges (e.g. 0.0-1.0) |
Prompt generation¶
When prompt_template is omitted, Judge auto-generates a scoring rubric from the annotations using build_scoring_prompt():
```python
from latent.agents import build_scoring_prompt

prompt = build_scoring_prompt(QAScores)
print(prompt)
```

```text
Score on the following dimensions:
- faithfulness (1-5): faithfulness
  [1=Hallucinated, 3=Partial, 5=Faithful]
- relevance (1-5): relevance
- factual (0 or 1): All claims are factually correct
- similarity (0.0-1.0): similarity

For each score, provide a brief rationale explaining your reasoning:
- faithfulness_rationale: explain your faithfulness score
- relevance_rationale: explain your relevance score
- factual_rationale: explain your factual score
- similarity_rationale: explain your similarity score
```
Full example¶
```python
from typing import Annotated

from latent.agents import Judge, ScoredModel, OrdinalScore

class ResponseQuality(ScoredModel):
    quality: Annotated[int, OrdinalScore(
        scale=(1, 2, 3, 4, 5),
        pass_threshold=3,
        description="Overall response quality",
    )]

judge = Judge("quality_judge", model="gpt-4o", output_type=ResponseQuality)

score = judge.evaluate({"conversation": "User: Hi\nAssistant: Hello!"})
print(score.quality)            # 4
print(score.quality_rationale)  # "The response is polite and appropriate..."
```
Pre-built judges¶
The latent.agents.judges module provides ready-to-use judges for common evaluation patterns. Each is a factory function that returns a configured Judge instance.
| Judge | Input columns | Score type | What it measures |
|---|---|---|---|
| `FaithfulnessJudge` | `context`, `response` | `ContinuousScore` (0-1) | Proportion of claims supported by context |
| `HallucinationJudge` | `context`, `response` | `ContinuousScore` (0-1) | Hallucination rate |
| `RelevanceJudge` | `query`, `response` | `OrdinalScore` (1-5) | Response relevance to query |
| `CompletenessJudge` | `query`, `response` | `OrdinalScore` (1-5) | Completeness of the answer |
| `ConcisenessJudge` | `response` | `OrdinalScore` (1-5) | Brevity without losing information |
| `ConsistencyJudge` | `response_a`, `response_b` | `ContinuousScore` (0-1) | Consistency between two responses |
| `StyleJudge` | `response` | `OrdinalScore` (1-5) | Writing style and tone |
| `InstructionFollowingJudge` | `instruction`, `response` | `OrdinalScore` (1-5) | How well instructions were followed |
| `RecoveryJudge` | `context`, `response` | `OrdinalScore` (1-5) | Error recovery quality |
| `GuardrailsJudge` | `response` | `BinaryScore` | Safety/policy compliance |
```python
from latent.agents.judges import FaithfulnessJudge, RelevanceJudge

faith_judge = FaithfulnessJudge(model="gpt-4o")
score = faith_judge.evaluate({
    "context": "The Eiffel Tower is 330 meters tall.",
    "response": "The Eiffel Tower is approximately 330 meters tall.",
})
print(f"Faithfulness: {score.faithfulness:.2f}")
```
Agent registry¶
The @agent decorator registers a BaseAgent subclass for CLI discovery (latent agents, latent chat):
```python
from latent.agents import agent, LiteLLMAgent

@agent("support_bot")
class SupportBot(LiteLLMAgent):
    """Customer support agent with knowledge base access."""

    def __init__(self):
        super().__init__(
            name="support_bot",
            model="gpt-4o",
            system_prompt="You are a customer support agent.",
        )
```
RAGAgent¶
RAGAgent extends LiteLLMAgent with a config-driven RAG pipeline. It auto-registers a search_knowledge_base tool so the LLM can retrieve context during conversation.
```python
from latent.agents import RAGAgent, Message

agent = RAGAgent(
    name="support",
    model="gpt-4o",
    rag_config={
        "embeddings": {"provider": "voyage", "model": "voyage-3"},
        "reranker": {"type": "cross_encoder"},
        "post_retrieval": {"type": "reorder"},
    },
)

agent.index_documents(
    texts=["Reset your password at settings > security...", ...],
    sources=["help/passwords.md", ...],
)

response = agent.run([Message(role="user", content="How do I reset my password?")])
```
**Full documentation.** See RAG for the complete RAG pipeline configuration reference, including chunking strategies, embedding providers, backend options, reranking, and caching.
Events¶
All agents emit a stream of typed events. Use these for logging, UI rendering, or metrics collection.
| Event | Fields | When emitted |
|---|---|---|
| `TextDelta` | `text` | Each chunk of generated text |
| `ToolCall` | `tool_call_id`, `tool_name`, `arguments` | Before tool execution |
| `ToolResult` | `tool_call_id`, `output`, `error` | After tool execution |
| `StepBoundary` | `step` | Start of each ReAct iteration |
| `LLMCallStart` | `run_id`, `model_id`, `input_preview` | Before each LLM API call |
| `LLMCallEnd` | `run_id`, `model_id`, `input_tokens`, `output_tokens` | After each LLM API call |
| `Usage` | `input_tokens`, `output_tokens`, `model_id` | Token usage summary |
| `Error` | `message` | On failure (e.g. max iterations exceeded) |
| `Message` | `role`, `content` | Chat messages (used as input) |
```python
from latent.agents import LiteLLMAgent, Message, ToolCall, ToolResult, Usage

agent = LiteLLMAgent(name="demo", model="gpt-4o", tools=[my_tool])
response = agent.run([Message(role="user", content="Use the tool")])

for event in agent.events:
    if isinstance(event, ToolCall):
        print(f"Called {event.tool_name} with {event.arguments}")
    elif isinstance(event, Usage):
        print(f"Tokens: {event.input_tokens} in, {event.output_tokens} out")
```