
Configuration

PipelineOptions controls every aspect of the compilation pipeline. All fields have sensible defaults and can be overridden using the builder pattern.

PipelineOptions

```rust
let options = PipelineOptions::default()
    .with_mode(SourceFormat::Pdf)
    .with_generate_ids(true)
    .with_summary_strategy(SummaryStrategy::full())
    .with_thinning(ThinningConfig::enabled(300))
    .with_optimization(OptimizationConfig::new())
    .with_split(SplitConfig::with_max_tokens(2000))
    .with_generate_description(true)
    .with_checkpoint_dir("./checkpoints");
```
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | `SourceFormat` | Auto | Document format |
| `generate_ids` | `bool` | true | Assign unique IDs to tree nodes |
| `summary_strategy` | `SummaryStrategy` | Full | How to generate LLM summaries |
| `thinning` | `ThinningConfig` | disabled | Merge small nodes into their parents |
| `optimization` | `OptimizationConfig` | enabled | Final tree optimization |
| `split` | `SplitConfig` | enabled (4000 tokens) | Split oversized leaf nodes |
| `generate_description` | `bool` | true | Generate a document-level description |
| `concurrency` | `ConcurrencyConfig` | from LLM config | Maximum concurrent LLM requests |
| `reasoning_index` | `ReasoningIndexConfig` | default | Symbol table configuration |
| `existing_tree` | `Option<DocumentTree>` | None | Previous tree for incremental updates |
| `processing_version` | `u32` | 1 | Algorithm version (changing it forces reprocessing) |
| `checkpoint_dir` | `Option<PathBuf>` | None | Directory for pipeline checkpoints |
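The chained configuration above relies on each with_* method taking the options by value and handing them back. The following is a minimal sketch of that builder pattern using a hypothetical MiniOptions type with three fields whose defaults mirror the table; it is an illustration of the pattern, not the real PipelineOptions definition.

```rust
use std::path::PathBuf;

/// Hypothetical, stripped-down stand-in for PipelineOptions (not the real type).
#[derive(Debug, Clone)]
pub struct MiniOptions {
    pub generate_ids: bool,
    pub max_tokens_per_node: usize,
    pub checkpoint_dir: Option<PathBuf>,
}

impl Default for MiniOptions {
    // Defaults mirror the table: IDs on, 4000-token split threshold, no checkpoints.
    fn default() -> Self {
        Self {
            generate_ids: true,
            max_tokens_per_node: 4000,
            checkpoint_dir: None,
        }
    }
}

impl MiniOptions {
    // Each with_* method consumes `self` and returns it, so calls can be chained.
    pub fn with_generate_ids(mut self, on: bool) -> Self {
        self.generate_ids = on;
        self
    }

    pub fn with_max_tokens(mut self, n: usize) -> Self {
        self.max_tokens_per_node = n;
        self
    }

    // Accepting `impl Into<PathBuf>` lets callers pass &str, String, or PathBuf.
    pub fn with_checkpoint_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.checkpoint_dir = Some(dir.into());
        self
    }
}
```

Fields that are never overridden keep their Default values, which is why a partial chain like `MiniOptions::default().with_max_tokens(2000)` still yields a complete configuration.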

SourceFormat

```rust
pub enum SourceFormat {
    Auto,     // Detect from file extension
    Markdown, // Force Markdown parsing
    Pdf,      // Force PDF parsing
}
```

When set to Auto, the engine detects the format from the file extension before calling the compiler, so the compiler itself always receives a concrete format.
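The Auto resolution step can be sketched as a small function that maps a requested format plus a file path to a concrete format. This is a hypothetical helper with a local copy of the enum: the accepted extensions ("md", "markdown", "pdf") and the choice to return None for unknown extensions are assumptions, not documented engine behavior.

```rust
use std::path::Path;

/// Local mirror of the documented enum, for a self-contained example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceFormat {
    Auto,
    Markdown,
    Pdf,
}

/// Hypothetical: resolve `Auto` using the file extension (case-insensitive).
/// Concrete formats are passed through unchanged, regardless of the extension.
pub fn resolve_format(requested: SourceFormat, path: &Path) -> Option<SourceFormat> {
    match requested {
        SourceFormat::Auto => {
            // No extension, or a non-UTF-8 one, means detection fails.
            let ext = path.extension()?.to_str()?.to_ascii_lowercase();
            match ext.as_str() {
                "md" | "markdown" => Some(SourceFormat::Markdown),
                "pdf" => Some(SourceFormat::Pdf),
                _ => None,
            }
        }
        concrete => Some(concrete),
    }
}
```

Either way the caller ends up with Markdown or Pdf (or an error it must handle), which matches the rule that the compiler never sees Auto.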

SummaryStrategy

Controls how the EnhancePass generates LLM summaries:

None

Skip summary generation entirely. Nodes retain their raw content only.

```rust
SummaryStrategy::none()
```

Full (default)

Generate summaries for every node in the tree.

```rust
SummaryStrategy::full()

// With custom config:
SummaryStrategy::full_with_config(SummaryStrategyConfig {
    max_tokens: 200,
    shortcut_threshold: 50,
    ..Default::default()
})
```
  • Non-leaf nodes: structured output (OVERVIEW, QUESTIONS, TAGS)
  • Leaf nodes: concise content summaries
  • Nodes with fewer than shortcut_threshold tokens reuse their original content instead of calling the LLM, which saves LLM cost
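The per-node decision described in these bullets can be modeled as a small function. This is a hypothetical sketch: the SummaryAction enum and full_action function are illustrative names, and checking the shortcut threshold before the leaf/branch distinction is an assumption about ordering.

```rust
/// What a Full-strategy pass might do for one node (hypothetical model).
#[derive(Debug, PartialEq, Eq)]
pub enum SummaryAction {
    /// Small node: reuse its original content, no LLM call.
    UseOriginal,
    /// Leaf node: request a concise content summary.
    LeafSummary,
    /// Non-leaf node: request structured OVERVIEW / QUESTIONS / TAGS output.
    StructuredSummary,
}

pub fn full_action(token_count: usize, is_leaf: bool, shortcut_threshold: usize) -> SummaryAction {
    if token_count < shortcut_threshold {
        SummaryAction::UseOriginal
    } else if is_leaf {
        SummaryAction::LeafSummary
    } else {
        SummaryAction::StructuredSummary
    }
}

/// Count how many LLM calls a tree would need; nodes are (token_count, is_leaf).
pub fn llm_calls(nodes: &[(usize, bool)], shortcut_threshold: usize) -> usize {
    nodes
        .iter()
        .filter(|&&(tokens, leaf)| full_action(tokens, leaf, shortcut_threshold) != SummaryAction::UseOriginal)
        .count()
}
```

Raising shortcut_threshold moves more small nodes into the UseOriginal bucket, trading summary coverage for fewer LLM calls.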

Selective

Generate summaries only for qualifying nodes.

```rust
SummaryStrategy::selective(500, true) // min 500 tokens, branch nodes only
```

Parameters:

  • min_tokens: Only generate summaries for nodes with at least this many tokens
  • branch_only: If true, skip leaf nodes entirely
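The two parameters combine into a simple eligibility test. The sketch below is a hypothetical model of that test (the qualifies and count_summaries names are illustrative, not library functions), applying branch_only first and then the inclusive min_tokens bound.

```rust
/// Hypothetical predicate mirroring SummaryStrategy::selective(min_tokens, branch_only).
pub fn qualifies(token_count: usize, is_leaf: bool, min_tokens: usize, branch_only: bool) -> bool {
    if branch_only && is_leaf {
        return false; // leaf nodes are skipped entirely
    }
    token_count >= min_tokens // "at least this many tokens"
}

/// Count nodes that would receive a summary; nodes are (token_count, is_leaf).
pub fn count_summaries(nodes: &[(usize, bool)], min_tokens: usize, branch_only: bool) -> usize {
    nodes
        .iter()
        .filter(|&&(tokens, leaf)| qualifies(tokens, leaf, min_tokens, branch_only))
        .count()
}
```

For nodes `[(800, branch), (300, branch), (900, leaf), (50, leaf)]`, selective(500, true) summarizes only the 800-token branch, while selective(500, false) also picks up the 900-token leaf.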

Lazy

Generate summaries on demand at query time instead of during compilation.

```rust
SummaryStrategy::lazy(true) // persist generated summaries
```

Generated summaries are cached in a SummaryCache and, when the flag passed to lazy() is true, persisted. This is useful when many documents are compiled but only a fraction of them will be queried.
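The generate-on-first-access behavior can be sketched with a HashMap keyed by node ID. This is a hypothetical stand-in, not the library's SummaryCache API: the LazySummaries type, the u64 node IDs, and the closure standing in for the LLM call are all assumptions, and persistence is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical lazy summary cache; `summarize` stands in for the LLM call.
pub struct LazySummaries<F: Fn(&str) -> String> {
    cache: HashMap<u64, String>,
    summarize: F,
    /// Number of times the (stand-in) LLM was actually invoked.
    pub llm_calls: usize,
}

impl<F: Fn(&str) -> String> LazySummaries<F> {
    pub fn new(summarize: F) -> Self {
        Self { cache: HashMap::new(), summarize, llm_calls: 0 }
    }

    /// Return the summary for `node_id`, generating it only on first access.
    pub fn get(&mut self, node_id: u64, content: &str) -> &str {
        if !self.cache.contains_key(&node_id) {
            let summary = (self.summarize)(content);
            self.llm_calls += 1;
            self.cache.insert(node_id, summary);
        }
        &self.cache[&node_id]
    }
}
```

Nodes that are never queried never cost an LLM call, and repeated queries for the same node hit the cache.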

ThinningConfig

Controls how small nodes are merged into their parents during the BuildPass:

```rust
// Disabled (default)
let thinning = ThinningConfig::disabled();

// Enabled with a 500-token threshold
let thinning = ThinningConfig::enabled(500)
    .with_merge_content(true);
```
| Field | Default | Description |
|---|---|---|
| `enabled` | false | Whether thinning is active |
| `threshold` | 500 | Nodes below this token count are candidates for merging |
| `merge_content` | true | Whether to merge child content into the parent |

Thinning reduces tree depth by absorbing small sections (e.g., single-paragraph subsections) into their parent node. Each parent keeps at least one child.
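A bottom-up thinning pass can be sketched as follows. This is a hypothetical model, not the BuildPass implementation: the Node struct is illustrative, only leaf children are treated as merge candidates (an assumption), content is always merged (the merge_content = true case), and when every child qualifies the first one is retained to satisfy the at-least-one-child rule.

```rust
/// Hypothetical tree node used only for this sketch.
#[derive(Debug, Clone)]
pub struct Node {
    pub title: String,
    pub content: String,
    pub tokens: usize,
    pub children: Vec<Node>,
}

/// Absorb small leaf children into their parent, keeping at least one child.
pub fn thin(node: &mut Node, threshold: usize) {
    // Thin subtrees first so merges propagate bottom-up.
    for child in node.children.iter_mut() {
        thin(child, threshold);
    }
    let mut kept = Vec::new();
    let mut absorbed = Vec::new();
    for child in node.children.drain(..) {
        if child.children.is_empty() && child.tokens < threshold {
            absorbed.push(child);
        } else {
            kept.push(child);
        }
    }
    // Every parent keeps at least one child: retain the first candidate if needed.
    if kept.is_empty() && !absorbed.is_empty() {
        kept.push(absorbed.remove(0));
    }
    // Append absorbed content (in document order) to the parent.
    for child in absorbed {
        node.content.push_str("\n\n");
        node.content.push_str(&child.content);
        node.tokens += child.tokens;
    }
    node.children = kept;
}
```

With a 500-token threshold, a parent whose children are 40, 900, and 30 tokens keeps only the 900-token child and absorbs the other two into its own content.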

SplitConfig

Controls how oversized leaf nodes are split:

```rust
let default_split = SplitConfig::default(); // enabled, 4000 tokens, pattern split on
let no_split = SplitConfig::disabled(); // no splitting
let custom_split = SplitConfig::with_max_tokens(2000) // custom threshold
    .with_pattern_split(true);
```
| Field | Default | Description |
|---|---|---|
| `enabled` | true | Whether splitting is active |
| `max_tokens_per_node` | 4000 | Nodes exceeding this are split |
| `pattern_split` | true | Use natural break points (headings, paragraphs) |
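The split behavior can be sketched as packing paragraphs into chunks under a token budget. This is a hypothetical model: token counts are approximated by whitespace-separated words (real tokenizers differ), only blank-line paragraph breaks are treated as natural break points (headings are ignored here), and an oversized paragraph, or the whole text when pattern_split is false, falls back to fixed-size word chunks.

```rust
/// Rough token estimate: whitespace-separated words (an assumption).
pub fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical splitter for one oversized leaf node's content.
pub fn split_content(text: &str, max_tokens: usize, pattern_split: bool) -> Vec<String> {
    let max_tokens = max_tokens.max(1); // guard: chunks(0) would panic
    if approx_tokens(text) <= max_tokens {
        return vec![text.to_string()];
    }
    // Natural break points: blank-line-separated paragraphs.
    let pieces: Vec<&str> = if pattern_split {
        text.split("\n\n").filter(|p| !p.trim().is_empty()).collect()
    } else {
        vec![text]
    };
    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in pieces {
        if approx_tokens(piece) > max_tokens {
            // Flush the pending chunk, then hard-split this piece by words.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
            }
            let words: Vec<&str> = piece.split_whitespace().collect();
            for w in words.chunks(max_tokens) {
                chunks.push(w.join(" "));
            }
        } else if approx_tokens(&current) + approx_tokens(piece) > max_tokens {
            // Adding this paragraph would overflow: start a new chunk with it.
            chunks.push(std::mem::replace(&mut current, piece.to_string()));
        } else {
            if !current.is_empty() {
                current.push_str("\n\n");
            }
            current.push_str(piece);
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Breaking at paragraph boundaries keeps each chunk readable on its own, which is the motivation for pattern_split being on by default.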

OptimizationConfig

Controls final tree optimization in the OptimizePass:

```rust
OptimizationConfig::new()
    .with_max_depth(15)
    .with_max_children(20)
```
| Field | Default | Description |
|---|---|---|
| `enabled` | true | Whether optimization is active |
| `max_depth` | None | Flatten the tree if its depth exceeds this |
| `max_children` | None | Group children if their count exceeds this |
| `merge_leaf_threshold` | 0 | Merge adjacent leaf siblings below this token count |
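The merge_leaf_threshold rule can be sketched as a single left-to-right pass over a node's leaf children. This is a hypothetical model (the Leaf type and merge_small_leaves function are illustrative): a small leaf is folded into the previous sibling only while that sibling's running total is still below the threshold, and a threshold of 0 disables merging, matching the default.

```rust
/// Hypothetical leaf sibling for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Leaf {
    pub content: String,
    pub tokens: usize,
}

/// Merge adjacent small leaf siblings; threshold 0 means no merging.
pub fn merge_small_leaves(leaves: Vec<Leaf>, threshold: usize) -> Vec<Leaf> {
    let mut out: Vec<Leaf> = Vec::new();
    for leaf in leaves {
        let small = leaf.tokens < threshold;
        // Merge only if this leaf and the previous (possibly merged) one are both small.
        let merge = small && out.last().map_or(false, |prev| prev.tokens < threshold);
        if merge {
            let prev = out.last_mut().unwrap(); // safe: `merge` implies a previous leaf
            prev.content.push('\n');
            prev.content.push_str(&leaf.content);
            prev.tokens += leaf.tokens;
        } else {
            out.push(leaf);
        }
    }
    out
}
```

Only adjacent siblings are merged, so document order is preserved; a large leaf acts as a barrier between small runs on either side.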

Logic Fingerprint

PipelineOptions::logic_fingerprint() computes a hash of the entire configuration. This is used for:

  • Incremental compilation: detect when pipeline configuration has changed
  • Checkpoint validation: reject stale checkpoints after config changes
  • Content fingerprinting: stored alongside documents for change detection
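The fingerprint-and-compare idea can be sketched by deriving Hash on the output-affecting settings and hashing them with the standard library's DefaultHasher. This is a hypothetical model: LogicConfig, its fields, and the free functions are illustrative, not the library's logic_fingerprint() implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of settings that change pipeline output.
#[derive(Hash, Clone)]
pub struct LogicConfig {
    pub summary_mode: String,
    pub thinning_threshold: Option<usize>,
    pub max_tokens_per_node: usize,
    pub processing_version: u32,
}

/// Hash every field into a single u64 fingerprint.
pub fn logic_fingerprint(cfg: &LogicConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    cfg.hash(&mut hasher);
    hasher.finish()
}

/// A checkpoint is reusable only if it was written under the same fingerprint.
pub fn checkpoint_is_valid(stored_fingerprint: u64, current: &LogicConfig) -> bool {
    stored_fingerprint == logic_fingerprint(current)
}
```

One design caveat: DefaultHasher's algorithm is not guaranteed to stay the same across Rust releases, so a fingerprint that is persisted next to documents or checkpoints should use a hash with a fixed, documented algorithm instead.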