Welcome to Vectorless

Vectorless is a document understanding engine for AI. It compiles documents into structured trees of meaning, then dispatches multiple agents to reason through headings, sections, and paragraphs — evaluating how each part relates to the whole. The problem it solves is not "where to look", but "what does this mean in context". Every answer is a reasoning act, not a retrieval result.

How It Works

  1. Parse — Documents (Markdown, PDF) are parsed into hierarchical semantic trees, preserving structure and relationships between sections.
  2. Compile — Trees are stored with metadata, keywords, and summaries. The pipeline resolves cross-references ("see Section 2.1") and expands keywords with LLM-generated synonyms for improved recall. Incremental compiling skips unchanged files via content fingerprinting.
  3. Ask — An LLM-powered agent navigates the tree to find the most relevant sections. The Orchestrator coordinates multi-document queries, dispatching Workers that use ls, cd, cat, find, and grep commands to explore the tree and collect evidence.
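The three steps above can be illustrated with a small, self-contained sketch. This is not Vectorless's actual implementation: `Node`, `parse_markdown`, `fingerprint`, `needs_compile`, `ls`, `cd`, and `grep` are hypothetical names introduced only for illustration, and the use of SHA-256 is an assumption (the text says only "content fingerprinting"). It shows how Markdown headings might become a tree, how unchanged files might be skipped, and what shell-like navigation over that tree might look like.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One heading plus the text directly under it (hypothetical structure)."""
    title: str
    level: int
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def parse_markdown(source: str) -> Node:
    """Build a heading tree from ATX-style Markdown headings ('#', '##', ...)."""
    root = Node(title="/", level=0)
    stack = [root]
    for line in source.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            level = len(match.group(1))
            node = Node(title=match.group(2).strip(), level=level)
            # Pop back to the nearest ancestor with a shallower level.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            # Body text belongs to the most recent heading.
            stack[-1].text += line.strip() + "\n"
    return root


def fingerprint(source: str) -> str:
    """Content fingerprint; SHA-256 is an assumption, not the documented choice."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def needs_compile(path: str, source: str, cache: dict) -> bool:
    """Return False when the stored fingerprint for `path` matches the content."""
    digest = fingerprint(source)
    if cache.get(path) == digest:
        return False
    cache[path] = digest
    return True


def ls(node: Node) -> list:
    """List the titles of a node's direct children."""
    return [child.title for child in node.children]


def cd(node: Node, title: str) -> Node:
    """Move to the child with the given title."""
    for child in node.children:
        if child.title == title:
            return child
    raise KeyError(title)


def grep(node: Node, pattern: str, path: str = "") -> list:
    """Return the paths of all sections whose own text matches `pattern`."""
    here = f"{path}/{node.title}" if node.level else ""
    hits = [here] if re.search(pattern, node.text, re.IGNORECASE) else []
    for child in node.children:
        hits.extend(grep(child, pattern, here))
    return hits


doc = """# Report
Intro text.
## Revenue
Total revenue was 10 units.
## Costs
See Revenue for context.
"""

cache = {}
tree = parse_markdown(doc)
print(needs_compile("report.md", doc, cache))  # True: first time seen
print(needs_compile("report.md", doc, cache))  # False: content unchanged
print(ls(cd(tree, "Report")))                  # ['Revenue', 'Costs']
print(grep(tree, "total revenue"))             # ['/Report/Revenue']
```

In this sketch a Worker would be an LLM choosing which of these commands to run next. The real engine also stores keywords and summaries and resolves cross-references, all of which are omitted here.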

Quick Start

import asyncio
from vectorless import Engine

async def main():
    engine = Engine(
        api_key="sk-...",
        model="gpt-4o",
    )

    # Compile a document
    result = await engine.compile(path="./report.pdf")
    doc_id = result.doc_id

    # Ask a question
    response = await engine.ask("What is the total revenue?", doc_ids=[doc_id])
    print(response.single().content)

asyncio.run(main())

Using a Custom Endpoint

engine = Engine(
    api_key="sk-...",
    model="gpt-4o",
    endpoint="https://api.your-provider.com/v1",
)

From Environment Variables

engine = Engine.from_env()

From Config File

engine = Engine.from_config_file("./config.toml")