Checkpoint and Resume

Checkpointing lets the pipeline resume where it left off after an interruption such as a crash, a timeout, or a killed process. This matters most for large documents, where LLM-enhanced compilation can take minutes.

How It Works

When PipelineOptions::checkpoint_dir is set, the orchestrator saves state to disk after each execution group completes:

Group 0: [ParsePass] → save checkpoint
Group 1: [BuildPass] → save checkpoint
Group 2: [ValidatePass, SplitPass] → save checkpoint
Group 3: [EnhancePass] → save checkpoint ← expensive LLM calls
...

On restart, the orchestrator loads the checkpoint and skips already-completed passes.
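
The skip-and-save loop described above can be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual code: `ExecutionGroup`, `run_groups`, and the `run_pass` and `save` callbacks are hypothetical names, and passes are reduced to their names.

```rust
/// A group of passes that runs as a unit (hypothetical type; passes are
/// represented by name only).
pub struct ExecutionGroup {
    pub passes: Vec<&'static str>,
}

/// Runs every pass whose name is not yet in `completed` and calls `save`
/// once after each group that did any work. Returns the names of the passes
/// that actually ran, in order.
pub fn run_groups(
    groups: &[ExecutionGroup],
    completed: &mut Vec<String>,
    mut run_pass: impl FnMut(&str),
    mut save: impl FnMut(&[String]),
) -> Vec<String> {
    let mut executed = Vec::new();
    for group in groups {
        // Passes recorded in a loaded checkpoint are skipped.
        let pending: Vec<&'static str> = group
            .passes
            .iter()
            .copied()
            .filter(|p| !completed.iter().any(|c| c.as_str() == *p))
            .collect();
        if pending.is_empty() {
            continue; // the whole group finished before the interruption
        }
        for pass in pending {
            run_pass(pass);
            executed.push(pass.to_string());
            completed.push(pass.to_string());
        }
        // Persist progress once the group is done, mirroring
        // "save checkpoint" in the listing above.
        save(completed.as_slice());
    }
    executed
}
```

If a checkpoint already lists ParsePass and BuildPass, a call to `run_groups` with the first three groups runs only ValidatePass and SplitPass and saves once.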

What's Stored

Each checkpoint contains:

pub struct PipelineCheckpoint {
    pub doc_id: String,
    pub source_hash: String,            // SHA-256 of source content
    pub processing_version: u32,        // Algorithm version
    pub config_fingerprint: String,     // Hash of PipelineOptions
    pub completed_stages: Vec<String>,  // Names of completed passes
    pub context_data: CheckpointContextData,
    pub timestamp: DateTime<Utc>,
}

pub struct CheckpointContextData {
    pub raw_nodes: Vec<RawNode>,        // From ParsePass
    pub tree: Option<DocumentTree>,     // From BuildPass
    pub metrics: IndexMetrics,          // Cumulative metrics
    pub page_count: Option<usize>,
    pub line_count: Option<usize>,
    pub description: Option<String>,
}

Validation

Before resuming, the checkpoint is validated against the current input:

| Check | Purpose |
| --- | --- |
| source_hash matches | Source content hasn't changed |
| processing_version matches | Algorithm hasn't been upgraded |
| config_fingerprint matches | Pipeline options haven't changed |

If any check fails, the checkpoint is discarded and the pipeline starts fresh.
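
As a rough sketch of these three checks, the function below compares a stored checkpoint against the current input and reports which check failed. `CheckpointHeader`, `Mismatch`, and `validate` are hypothetical names introduced for illustration. The library's own `is_valid_for_resume` may have a different signature and return type.

```rust
/// Simplified stand-in for PipelineCheckpoint holding only the fields that
/// take part in validation (hypothetical type).
pub struct CheckpointHeader {
    pub source_hash: String,
    pub processing_version: u32,
    pub config_fingerprint: String,
}

/// Why a checkpoint cannot be reused (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    SourceChanged,
    VersionChanged,
    ConfigChanged,
}

/// Returns Ok(()) when all three checks pass, otherwise the first failure.
pub fn validate(
    checkpoint: &CheckpointHeader,
    source_hash: &str,
    processing_version: u32,
    config_fingerprint: &str,
) -> Result<(), Mismatch> {
    if checkpoint.source_hash != source_hash {
        return Err(Mismatch::SourceChanged);
    }
    if checkpoint.processing_version != processing_version {
        return Err(Mismatch::VersionChanged);
    }
    if checkpoint.config_fingerprint != config_fingerprint {
        return Err(Mismatch::ConfigChanged);
    }
    Ok(())
}
```

Returning the reason instead of a plain `bool` is a design choice of this sketch. It makes it easy to log why a checkpoint was discarded before the pipeline starts fresh.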

Lifecycle

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Start     │────▶│     Load     │────▶│    Valid?    │
│   Pipeline   │     │  Checkpoint  │     │              │
└──────────────┘     └──────────────┘     └──┬───────┬───┘
                                             │       │
                                         Yes │    No │
                                             │       │
                                 ┌───────────▼─┐   ┌─▼───────────┐
                                 │ Resume from │   │ Start fresh │
                                 │  completed  │   │             │
                                 │   stages    │   │             │
                                 └──────┬──────┘   └─────────────┘
                                        │
                           ┌────────────▼─────────────┐
                           │ Execute remaining passes │
                           │  Save after each group   │
                           └────────────┬─────────────┘
                                        │
                           ┌────────────▼─────────────┐
                           │ All complete?            │
                           │ → Clear checkpoint file  │
                           └──────────────────────────┘

Configuration

let options = PipelineOptions::default()
    .with_checkpoint_dir("./workspace/checkpoints");

Checkpoints are stored as individual JSON files in the checkpoint directory, one per document (keyed by doc_id). On successful completion, the checkpoint file is deleted.
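
The one-file-per-document layout can be illustrated with the standard library alone. Everything in this sketch is an assumption made for illustration: the `FileCheckpointStore` type, the `<doc_id>.json` file name, and passing the payload as an already-serialized JSON string. The real on-disk naming and serialization may differ.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical store that keeps one file per document under `dir`.
pub struct FileCheckpointStore {
    dir: PathBuf,
}

impl FileCheckpointStore {
    pub fn new(dir: impl Into<PathBuf>) -> Self {
        Self { dir: dir.into() }
    }

    /// Assumed naming scheme: `<checkpoint_dir>/<doc_id>.json`.
    pub fn path_for(&self, doc_id: &str) -> PathBuf {
        self.dir.join(format!("{doc_id}.json"))
    }

    /// Writes the serialized checkpoint, creating the directory if needed.
    pub fn save(&self, doc_id: &str, json: &str) -> io::Result<()> {
        fs::create_dir_all(&self.dir)?;
        fs::write(self.path_for(doc_id), json)
    }

    /// Returns the stored payload, or None if no readable checkpoint exists.
    pub fn load(&self, doc_id: &str) -> Option<String> {
        fs::read_to_string(self.path_for(doc_id)).ok()
    }

    /// Deletes the checkpoint; a missing file is not treated as an error.
    pub fn clear(&self, doc_id: &str) -> io::Result<()> {
        match fs::remove_file(self.path_for(doc_id)) {
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
            other => other,
        }
    }
}
```

Because the file name is derived directly from `doc_id` in this sketch, an identifier containing path separators would need sanitizing before use.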

CheckpointManager API

let manager = CheckpointManager::new("./checkpoints");

// Save checkpoint
manager.save(&doc_id, &checkpoint)?;

// Load checkpoint
let checkpoint = manager.load(&doc_id);

// Check if valid for resume
let valid = CheckpointManager::is_valid_for_resume(
    &checkpoint,
    &source_hash,
    processing_version,
    &config_fingerprint,
);

// Clear after successful completion
manager.clear(&doc_id)?;