# RAPTOR Integration
RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a retrieval method that builds a hierarchical tree of document summaries. Instead of flat chunk-based retrieval, RAPTOR clusters text chunks, summarizes each cluster, and repeats the process across multiple layers. This produces a tree where higher layers contain increasingly abstract summaries, enabling retrieval at different levels of granularity.
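The cluster-then-summarize recursion can be sketched as follows. This is an illustration of the loop, not the library's implementation: `cluster` and `summarize` are placeholders for RAPTOR's actual clustering and LLM summarization steps.

```python
def build_layers(chunks, cluster, summarize, num_layers=5):
    """Illustrative RAPTOR-style build loop: each pass clusters the
    previous layer's nodes and replaces each cluster with a summary."""
    layers = [list(chunks)]  # layer 0: the original text chunks (leaves)
    for _ in range(num_layers - 1):
        current = layers[-1]
        if len(current) <= 1:  # nothing left to merge
            break
        groups = cluster(current)  # group similar nodes together
        layers.append([summarize(g) for g in groups])  # one abstract node per cluster
    return layers  # higher layers hold increasingly abstract summaries
```

With a toy pairwise `cluster` and a concatenating `summarize`, four leaf chunks collapse into two mid-level summaries and one root summary, mirroring the tree shape described above.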
The `latent.raptor` module provides `RaptorAdapter`, a thin wrapper around the `raptor-rag` library that integrates with latent's config system, MLflow tracking, and LiteLLM model routing.
## Installation
RAPTOR support is an optional dependency:
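The original install command was lost here; the extras name below is an assumption, so verify it against the project's packaging config before copying:

```shell
# Assumed extras name -- check the project's pyproject.toml
pip install "latent[raptor]"
```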
## Quick Start
```python
from latent.raptor import RaptorAdapter

adapter = RaptorAdapter(
    embedding_model="text-embedding-3-small",
    summarization_model="gpt-4o-mini",
    qa_model="gpt-4o-mini",
)

# Build a tree from documents
tree = adapter.build_tree("Your document text here...")

# Retrieve relevant context
context = adapter.retrieve("What is the main topic?")

# Get a generated answer
answer = adapter.answer("What is the main topic?")
```
## Configuration
All `RaptorAdapter` parameters are flat scalars, designed to map directly from `parameters.yaml`:
```yaml
# flows/my_flow/parameters.yaml
raptor:
  embedding_model: text-embedding-3-small
  summarization_model: gpt-4o-mini
  qa_model: gpt-4o-mini
  tb_max_tokens: 100
  tb_num_layers: 5
  tb_threshold: 0.5
  tb_summarization_length: 100
  tr_top_k: 5
  tr_selection_mode: top_k
  collapse_tree: true
```
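Because the block is flat, the parsed YAML is a plain dict of scalars that can be splatted straight into the constructor. A minimal sketch (the dict literal below stands in for the parsed `raptor:` block):

```python
# The flat `raptor:` block parses to a plain dict of scalars...
raptor_params = {
    "embedding_model": "text-embedding-3-small",
    "summarization_model": "gpt-4o-mini",
    "qa_model": "gpt-4o-mini",
    "tb_max_tokens": 100,
    "tr_top_k": 5,
    "collapse_tree": True,
}

# ...so it maps one-to-one onto keyword arguments:
# adapter = RaptorAdapter(**raptor_params)

# No nesting means no special config handling is needed
assert all(not isinstance(v, (dict, list)) for v in raptor_params.values())
```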
## Parameter Reference
| Parameter | Default | Description |
|---|---|---|
| `embedding_model` | `text-embedding-ada-002` | LiteLLM model name for embeddings |
| `summarization_model` | `gpt-4o-mini` | LiteLLM model name for tree summarization |
| `qa_model` | `gpt-4o-mini` | LiteLLM model name for question answering |
| `tb_max_tokens` | `100` | Max tokens per tree builder chunk |
| `tb_num_layers` | `5` | Number of tree layers to build |
| `tb_threshold` | `0.5` | Similarity threshold for tree building |
| `tb_summarization_length` | `100` | Max tokens for node summaries |
| `tr_threshold` | `0.5` | Similarity threshold for retrieval |
| `tr_top_k` | `5` | Number of top results to retrieve |
| `tr_selection_mode` | `top_k` | Retrieval selection mode (`top_k` or `threshold`) |
| `tr_num_layers` | `None` | Number of layers to traverse during retrieval |
| `tr_start_layer` | `None` | Starting layer for retrieval |
| `collapse_tree` | `True` | Use collapsed tree retrieval (vs. layer-by-layer) |
## Usage with Latent Flows
Use `RaptorAdapter.from_params()` to create an adapter from the flow's parameter context:
```python
from pathlib import Path

from latent.prefect import flow, task, params
from latent.raptor import RaptorAdapter

TREE_PATH = Path("data/raptor_qa/raptor_tree.pkl")


@task("build_index")
def build_index(documents: str) -> None:
    adapter = RaptorAdapter.from_params(params.raptor)
    adapter.build_tree(documents)
    adapter.save_tree(TREE_PATH)


@task("query_index")
def query_index(question: str) -> str:
    adapter = RaptorAdapter.from_params(params.raptor)
    adapter.load_tree(TREE_PATH)
    return adapter.answer(question)


@flow("raptor_qa")
def raptor_qa():
    build_index("Your corpus text...")
    answer = query_index("What is the key finding?")
    return answer
```
## MLflow Tracking
`build_tree()` automatically logs metrics to MLflow when a run is active:
- `raptor_num_nodes`: total nodes in the tree
- `raptor_num_layers`: number of tree layers
- `raptor_num_leaf_nodes`: number of leaf (original chunk) nodes
- `raptor_build_duration_s`: time to build the tree in seconds
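As a rough illustration of what those numbers describe (this is not the adapter's code), given a tree represented as one list of node ids per layer:

```python
def tree_metrics(layers, build_duration_s):
    """Illustrative only: layers[0] holds the leaf (original chunk)
    nodes; higher indices hold summary layers."""
    return {
        "raptor_num_nodes": sum(len(layer) for layer in layers),
        "raptor_num_layers": len(layers),
        "raptor_num_leaf_nodes": len(layers[0]),
        "raptor_build_duration_s": build_duration_s,
    }
```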
## Saving and Loading Trees
Trees can be persisted and reloaded to avoid rebuilding. Both `str` and `Path` are accepted:
```python
from pathlib import Path

from latent.raptor import RaptorAdapter

adapter = RaptorAdapter(embedding_model="text-embedding-3-small")
documents = "Your document text here..."

# Build and save
tree_path = Path("data/my_flow/raptor_tree.pkl")
adapter.build_tree(documents)
adapter.save_tree(tree_path)

# Load in a later run
adapter.load_tree(tree_path)
context = adapter.retrieve("query")
```
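The `.pkl` extension suggests pickle-based persistence; the adapter's internal format is an assumption here, but the round trip it performs looks roughly like this sketch:

```python
import pickle
from pathlib import Path


def save_obj(obj, path):
    path = Path(path)  # str or Path both work, as with the adapter
    path.parent.mkdir(parents=True, exist_ok=True)  # create data dirs as needed
    with path.open("wb") as f:
        pickle.dump(obj, f)


def load_obj(path):
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

As with any pickle file, only load trees from sources you trust, since unpickling can execute arbitrary code.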