RAPTOR Integration

RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a retrieval method that builds a hierarchical tree of document summaries. Instead of flat chunk-based retrieval, RAPTOR clusters text chunks, summarizes each cluster, and repeats the process across multiple layers. This produces a tree where higher layers contain increasingly abstract summaries, enabling retrieval at different levels of granularity.
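The cluster-summarize loop can be sketched in a few lines. This is an illustrative toy, not the raptor-rag implementation: `pair_up` stands in for embedding-based clustering and `" ".join` stands in for LLM summarization.

```python
def build_layers(chunks, cluster, summarize, num_layers):
    """Return a list of layers; layer 0 is the original chunks.

    Each pass clusters the current layer and replaces every cluster
    with a single, more abstract summary node.
    """
    layers = [chunks]
    for _ in range(num_layers):
        current = layers[-1]
        if len(current) <= 1:
            break  # a single root node: nothing left to cluster
        groups = cluster(current)
        layers.append([summarize(group) for group in groups])
    return layers


def pair_up(nodes):
    """Toy 'clustering': group neighboring nodes in pairs."""
    return [nodes[i:i + 2] for i in range(0, len(nodes), 2)]


# Four leaf chunks collapse into two summaries, then one root.
layers = build_layers(["a", "b", "c", "d"], pair_up, " ".join, num_layers=5)
```

A real build stops early, as here, once a layer collapses to a single node, even if fewer layers than requested were produced.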

The latent.raptor module provides RaptorAdapter, a thin wrapper around the raptor-rag library that integrates with latent's config system, MLflow tracking, and LiteLLM model routing.

Installation

RAPTOR support is an optional dependency:

uv add "latent[raptor]"

Quick Start

from latent.raptor import RaptorAdapter

adapter = RaptorAdapter(
    embedding_model="text-embedding-3-small",
    summarization_model="gpt-4o-mini",
    qa_model="gpt-4o-mini",
)

# Build a tree from documents
tree = adapter.build_tree("Your document text here...")

# Retrieve relevant context
context = adapter.retrieve("What is the main topic?")

# Get a generated answer
answer = adapter.answer("What is the main topic?")

Configuration

All RaptorAdapter parameters are flat scalars, designed to map directly from parameters.yaml:

# flows/my_flow/parameters.yaml
raptor:
  embedding_model: text-embedding-3-small
  summarization_model: gpt-4o-mini
  qa_model: gpt-4o-mini
  tb_max_tokens: 100
  tb_num_layers: 5
  tb_threshold: 0.5
  tb_summarization_length: 100
  tr_top_k: 5
  tr_selection_mode: top_k
  collapse_tree: true
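Because every value is a flat scalar, the raptor block unpacks directly into constructor keyword arguments. The dict below shows the mapping the config system would produce from the YAML above (the real call would be `RaptorAdapter(**raptor_params)`):

```python
# The `raptor:` block from parameters.yaml, as a plain dict.
raptor_params = {
    "embedding_model": "text-embedding-3-small",
    "summarization_model": "gpt-4o-mini",
    "qa_model": "gpt-4o-mini",
    "tb_max_tokens": 100,
    "tb_num_layers": 5,
    "tb_threshold": 0.5,
    "tb_summarization_length": 100,
    "tr_top_k": 5,
    "tr_selection_mode": "top_k",
    "collapse_tree": True,
}

# No nested structures: every value is a str, int, float, or bool,
# so the mapping unpacks cleanly as keyword arguments.
assert all(isinstance(v, (str, int, float, bool)) for v in raptor_params.values())
```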

Parameter Reference

Parameter                 Default                  Description
embedding_model           text-embedding-ada-002   LiteLLM model name for embeddings
summarization_model       gpt-4o-mini              LiteLLM model name for tree summarization
qa_model                  gpt-4o-mini              LiteLLM model name for question answering
tb_max_tokens             100                      Max tokens per tree builder chunk
tb_num_layers             5                        Number of tree layers to build
tb_threshold              0.5                      Similarity threshold for tree building
tb_summarization_length   100                      Max tokens for node summaries
tr_threshold              0.5                      Similarity threshold for retrieval
tr_top_k                  5                        Number of top results to retrieve
tr_selection_mode         top_k                    Retrieval selection mode (top_k or threshold)
tr_num_layers             None                     Number of layers to traverse during retrieval
tr_start_layer            None                     Starting layer for retrieval
collapse_tree             True                     Use collapsed tree retrieval (vs layer-by-layer)
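For example, switching retrieval from a fixed top_k to similarity-threshold selection only requires changing the two tr_ parameters (the 0.6 value here is illustrative):

```yaml
raptor:
  embedding_model: text-embedding-3-small
  tr_selection_mode: threshold   # select by similarity instead of count
  tr_threshold: 0.6              # keep nodes scoring at least 0.6
```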

Usage with Latent Flows

Use RaptorAdapter.from_params() to create an adapter from the flow's parameter context:

from pathlib import Path

from latent.prefect import flow, task, params
from latent.raptor import RaptorAdapter

TREE_PATH = Path("data/raptor_qa/raptor_tree.pkl")


@task("build_index")
def build_index(documents: str) -> None:
    adapter = RaptorAdapter.from_params(params.raptor)
    adapter.build_tree(documents)
    adapter.save_tree(TREE_PATH)


@task("query_index")
def query_index(question: str) -> str:
    adapter = RaptorAdapter.from_params(params.raptor)
    adapter.load_tree(TREE_PATH)
    return adapter.answer(question)


@flow("raptor_qa")
def raptor_qa():
    build_index("Your corpus text...")
    answer = query_index("What is the key finding?")
    return answer

MLflow Tracking

build_tree() automatically logs metrics to MLflow when a run is active:

  • raptor_num_nodes -- total nodes in the tree
  • raptor_num_layers -- number of tree layers
  • raptor_num_leaf_nodes -- number of leaf (original chunk) nodes
  • raptor_build_duration_s -- time to build the tree in seconds
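The first three metrics follow directly from the layer structure of the built tree (the duration is wall-clock build time). A minimal sketch, assuming a mapping from layer index to node list that mirrors the tree's layer-by-layer layout:

```python
def tree_metrics(layer_to_nodes):
    """Derive the logged counts from a layer -> nodes mapping.

    Layer 0 holds the original leaf chunks; higher layers hold
    cluster summaries. (Sketch: the real values come from the
    tree object returned by build_tree().)
    """
    return {
        "raptor_num_nodes": sum(len(nodes) for nodes in layer_to_nodes.values()),
        "raptor_num_layers": len(layer_to_nodes),
        "raptor_num_leaf_nodes": len(layer_to_nodes.get(0, [])),
    }
```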

Saving and Loading Trees

Trees can be persisted and reloaded to avoid rebuilding. Both str and Path are accepted:

from pathlib import Path

adapter = RaptorAdapter(embedding_model="text-embedding-3-small")

# Build and save
tree_path = Path("data/my_flow/raptor_tree.pkl")
adapter.build_tree(documents)
adapter.save_tree(tree_path)

# Load in a later run
adapter.load_tree(tree_path)
context = adapter.retrieve("query")
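A common pattern is to build only when no saved tree exists. A small helper sketch, assuming nothing beyond the build_tree/save_tree/load_tree methods shown above:

```python
from pathlib import Path


def build_or_load(adapter, tree_path, documents):
    """Reuse a saved tree when present; otherwise build and persist it.

    `adapter` is any object exposing the build_tree/save_tree/load_tree
    API of RaptorAdapter.
    """
    tree_path = Path(tree_path)
    if tree_path.exists():
        adapter.load_tree(tree_path)
    else:
        tree_path.parent.mkdir(parents=True, exist_ok=True)
        adapter.build_tree(documents)
        adapter.save_tree(tree_path)
    return adapter
```

The first run pays the build cost; every later run with the same path loads the pickled tree instead.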