RAPTOR Integration

RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a retrieval method that builds a hierarchical tree of document summaries. Instead of flat chunk-based retrieval, RAPTOR clusters text chunks, summarizes each cluster, and repeats the process across multiple layers. This produces a tree where higher layers contain increasingly abstract summaries, enabling retrieval at different levels of granularity.
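The cluster-summarize loop can be sketched in a few lines. This is an illustrative toy, not the raptor-rag implementation: `pair_up` stands in for embedding-based clustering and `" ".join` stands in for LLM summarization.

```python
def build_layers(chunks, cluster, summarize, num_layers):
    """Return a list of layers; layer 0 is the original chunks.

    Each pass clusters the current layer and replaces every cluster
    with a single, more abstract summary node.
    """
    layers = [chunks]
    for _ in range(num_layers):
        current = layers[-1]
        if len(current) <= 1:
            break  # a single root node: nothing left to cluster
        groups = cluster(current)
        layers.append([summarize(group) for group in groups])
    return layers


def pair_up(nodes):
    """Toy 'clustering': group neighboring nodes in pairs."""
    return [nodes[i:i + 2] for i in range(0, len(nodes), 2)]


# Four leaf chunks collapse into two summaries, then one root.
layers = build_layers(["a", "b", "c", "d"], pair_up, " ".join, num_layers=5)
```

A real build stops early, as here, once a layer collapses to a single node, even if fewer layers than requested were produced.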

The latent.raptor module provides RaptorAdapter, a thin wrapper around the raptor-rag library that integrates with latent's config system, MLflow tracking, and LiteLLM model routing.

Installation

RAPTOR support is an optional dependency:

uv add "latent[raptor]"

Quick Start

from latent.raptor import RaptorAdapter

adapter = RaptorAdapter(
    embedding_model="text-embedding-3-small",
    summarization_model="gpt-4o-mini",
    qa_model="gpt-4o-mini",
)

# Build a tree from documents
tree = adapter.build_tree("Your document text here...")

# Retrieve relevant context
context = adapter.retrieve("What is the main topic?")

# Get a generated answer
answer = adapter.answer("What is the main topic?")

Configuration

All RaptorAdapter parameters are flat scalars, designed to map directly from parameters.yaml:

# flows/my_flow/parameters.yaml
raptor:
  embedding_model: text-embedding-3-small
  summarization_model: gpt-4o-mini
  qa_model: gpt-4o-mini
  tb_max_tokens: 100
  tb_num_layers: 5
  tb_threshold: 0.5
  tb_summarization_length: 100
  tr_top_k: 5
  tr_selection_mode: top_k
  collapse_tree: true
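Because every value is a flat scalar, the raptor block unpacks directly into constructor keyword arguments. The dict below shows the mapping the config system would produce from the YAML above (the real call would be `RaptorAdapter(**raptor_params)`):

```python
# The `raptor:` block from parameters.yaml, as a plain dict.
raptor_params = {
    "embedding_model": "text-embedding-3-small",
    "summarization_model": "gpt-4o-mini",
    "qa_model": "gpt-4o-mini",
    "tb_max_tokens": 100,
    "tb_num_layers": 5,
    "tb_threshold": 0.5,
    "tb_summarization_length": 100,
    "tr_top_k": 5,
    "tr_selection_mode": "top_k",
    "collapse_tree": True,
}

# No nested structures: every value is a str, int, float, or bool,
# so the mapping unpacks cleanly as keyword arguments.
assert all(isinstance(v, (str, int, float, bool)) for v in raptor_params.values())
```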

Parameter Reference

Parameter                 Default                  Description
embedding_model           text-embedding-ada-002   LiteLLM model name for embeddings
summarization_model       gpt-4o-mini              LiteLLM model name for tree summarization
qa_model                  gpt-4o-mini              LiteLLM model name for question answering
tb_max_tokens             100                      Max tokens per tree builder chunk
tb_num_layers             5                        Number of tree layers to build
tb_threshold              0.5                      Similarity threshold for tree building
tb_summarization_length   100                      Max tokens for node summaries
tr_threshold              0.5                      Similarity threshold for retrieval
tr_top_k                  5                        Number of top results to retrieve
tr_selection_mode         top_k                    Retrieval selection mode (top_k or threshold)
tr_num_layers             None                     Number of layers to traverse during retrieval
tr_start_layer            None                     Starting layer for retrieval
collapse_tree             True                     Use collapsed tree retrieval (vs layer-by-layer)
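For example, switching retrieval from a fixed top_k to similarity-threshold selection only requires changing the two tr_ parameters (the 0.6 value here is illustrative):

```yaml
raptor:
  embedding_model: text-embedding-3-small
  tr_selection_mode: threshold   # select by similarity instead of count
  tr_threshold: 0.6              # keep nodes scoring at least 0.6
```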

Usage with Latent Flows

Use RaptorAdapter.from_params() to create an adapter from the flow's parameter context:

from pathlib import Path

from latent.prefect import flow, task, params
from latent.raptor import RaptorAdapter

TREE_PATH = Path("data/raptor_qa/raptor_tree.pkl")


@task("build_index")
def build_index(documents: str) -> None:
    adapter = RaptorAdapter.from_params(params.raptor)
    adapter.build_tree(documents)
    adapter.save_tree(TREE_PATH)


@task("query_index")
def query_index(question: str) -> str:
    adapter = RaptorAdapter.from_params(params.raptor)
    adapter.load_tree(TREE_PATH)
    return adapter.answer(question)


@flow("raptor_qa")
def raptor_qa():
    build_index("Your corpus text...")
    answer = query_index("What is the key finding?")
    return answer

MLflow Tracking

build_tree() automatically logs metrics to MLflow when a run is active:

  • raptor_num_nodes -- total nodes in the tree
  • raptor_num_layers -- number of tree layers
  • raptor_num_leaf_nodes -- number of leaf (original chunk) nodes
  • raptor_build_duration_s -- time to build the tree in seconds
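The first three metrics follow directly from the layer structure of the built tree (the duration is wall-clock build time). A minimal sketch, assuming a mapping from layer index to node list that mirrors the tree's layer-by-layer layout:

```python
def tree_metrics(layer_to_nodes):
    """Derive the logged counts from a layer -> nodes mapping.

    Layer 0 holds the original leaf chunks; higher layers hold
    cluster summaries. (Sketch: the real values come from the
    tree object returned by build_tree().)
    """
    return {
        "raptor_num_nodes": sum(len(nodes) for nodes in layer_to_nodes.values()),
        "raptor_num_layers": len(layer_to_nodes),
        "raptor_num_leaf_nodes": len(layer_to_nodes.get(0, [])),
    }
```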

Saving and Loading Trees

Trees can be persisted and reloaded to avoid rebuilding. Both str and Path are accepted:

from pathlib import Path

adapter = RaptorAdapter(embedding_model="text-embedding-3-small")

# Build and save
tree_path = Path("data/my_flow/raptor_tree.pkl")
adapter.build_tree(documents)
adapter.save_tree(tree_path)

# Load in a later run
adapter.load_tree(tree_path)
context = adapter.retrieve("query")
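A common pattern is to build only when no saved tree exists. A small helper sketch, assuming nothing beyond the build_tree/save_tree/load_tree methods shown above:

```python
from pathlib import Path


def build_or_load(adapter, tree_path, documents):
    """Reuse a saved tree when present; otherwise build and persist it.

    `adapter` is any object exposing the build_tree/save_tree/load_tree
    API of RaptorAdapter.
    """
    tree_path = Path(tree_path)
    if tree_path.exists():
        adapter.load_tree(tree_path)
    else:
        tree_path.parent.mkdir(parents=True, exist_ok=True)
        adapter.build_tree(documents)
        adapter.save_tree(tree_path)
    return adapter
```

The first run pays the build cost; every later run with the same path loads the pickled tree instead.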