Welcome to Vectorless

Vectorless is a document understanding engine for AI. It compiles documents into structured trees of meaning, then dispatches multiple agents to reason through headings, sections, and paragraphs — evaluating how each part relates to the whole. The problem it solves is not "where to look", but "what does this mean in context". Every answer is a reasoning act, not a retrieval result.

How It Works

Parse — Documents (Markdown, PDF) are parsed into hierarchical semantic trees, preserving structure and relationships between sections.
Compile — Trees are stored with metadata, keywords, and summaries. The pipeline resolves cross-references ("see Section 2.1") and expands keywords with LLM-generated synonyms for improved recall. Incremental compiling skips unchanged files via content fingerprinting.
Ask — An LLM-powered agent navigates the tree to find the most relevant sections. The Orchestrator coordinates multi-document queries, dispatching Workers that use ls, cd, cat, find, and grep commands to explore the tree and collect evidence.
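To make the three stages concrete, here is a toy sketch of the same ideas in plain Python. It is not Vectorless's implementation. Every name in it (`Node`, `parse_markdown`, `needs_compile`, `ls`, `cd`, `cat`, `grep`) is hypothetical and chosen only for illustration, and SHA-256 is an assumed choice of fingerprint. It builds a heading tree from Markdown, skips recompilation when the content hash is unchanged, and offers a subset of the shell-style commands for walking the tree.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One heading plus the text that sits directly under it."""
    title: str
    level: int
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def parse_markdown(source: str) -> Node:
    """Build a heading tree from ATX-style headings (#, ##, ...).

    Toy parser: it does not special-case fenced code, so a '#' comment
    inside a code fence would be treated as a heading.
    """
    root = Node(title="/", level=0)
    stack = [root]  # path from the root to the most recent heading
    for line in source.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match is None:
            stack[-1].text += line + "\n"
            continue
        node = Node(title=match.group(2).strip(), level=len(match.group(1)))
        # Pop until the top of the stack is a shallower heading (the parent).
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def needs_compile(path: str, source: str, seen: dict[str, str]) -> bool:
    """Return True, and record the new fingerprint, if the content changed."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if seen.get(path) == digest:
        return False
    seen[path] = digest
    return True


def ls(node: Node) -> list[str]:
    """List the titles of the node's direct children."""
    return [child.title for child in node.children]


def cd(node: Node, title: str) -> Node:
    """Step into the child with the given title."""
    for child in node.children:
        if child.title == title:
            return child
    raise KeyError(title)


def cat(node: Node) -> str:
    """Return the text directly under this node."""
    return node.text.strip()


def grep(node: Node, pattern: str, path: str = "") -> list[str]:
    """Return the paths of all nodes whose own text matches the pattern."""
    here = f"{path}/{node.title}" if node.level else ""
    hits = [here or "/"] if re.search(pattern, node.text) else []
    for child in node.children:
        hits.extend(grep(child, pattern, here))
    return hits


if __name__ == "__main__":
    doc = "# Report\nIntro.\n## Revenue\nTotal revenue was 10.\n## Costs\nSee Revenue.\n"
    fingerprints: dict[str, str] = {}
    if needs_compile("report.md", doc, fingerprints):
        tree = parse_markdown(doc)
        print(ls(cd(tree, "Report")))
        print(grep(tree, r"Total"))
```

A worker in this picture is just a loop that issues these commands and keeps what it reads as evidence. The real engine additionally stores summaries, keywords, and resolved cross-references on each node.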

Quick Start

import asyncio
from vectorless import Engine

async def main():
    engine = Engine(
        api_key="sk-...",
        model="gpt-4o",
    )

    # Compile a document
    result = await engine.compile(path="./report.pdf")
    doc_id = result.doc_id

    # Ask a question
    response = await engine.ask("What is the total revenue?", doc_ids=[doc_id])
    print(response.single().content)

asyncio.run(main())
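
Because `ask` takes a list of `doc_ids`, the same two calls extend to several documents. The helper below is a sketch, not part of the library: `compile_and_ask` is a hypothetical name, and it assumes that one engine tolerates concurrent `compile` calls. If it does not, replace the `gather` with a plain loop. It uses only the calls shown in the Quick Start.

```python
import asyncio


async def compile_and_ask(engine, paths, question):
    """Compile every path, then ask one question across all resulting documents."""
    # Assumption: compile() calls on the same engine may run concurrently.
    results = await asyncio.gather(*(engine.compile(path=p) for p in paths))
    # gather() returns results in the same order as the input paths.
    doc_ids = [result.doc_id for result in results]
    return await engine.ask(question, doc_ids=doc_ids)
```

With the `engine` from the Quick Start, this would be called as `await compile_and_ask(engine, ["./report.pdf", "./notes.md"], "What is the total revenue?")`, where `./notes.md` is a placeholder file name. The helper returns the response unchanged because the shape of a multi-document response is not shown here.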

Using a Custom Endpoint

from vectorless import Engine

# Point the engine at any compatible API endpoint
engine = Engine(
    api_key="sk-...",
    model="gpt-4o",
    endpoint="https://api.your-provider.com/v1",
)

From Environment Variables

engine = Engine.from_env()

From Config File

engine = Engine.from_config_file("./config.toml")
