AI-Powered Schema Extraction

Turn Documents Into Structured Data

Extract any metadata you need from your documents using AI. Define custom schemas and let Iris pull out product specs, invoice details, contract terms, or any structured information buried in unstructured text.

Your Documents Are Full of Data Your AI Can't Use

Every document contains structured information that could make your AI searches precise and powerful. But without extraction, it's all just text.

Documents Hide Critical Information
Important data like part numbers, prices, and specifications are buried in paragraphs of text. Your AI can't filter or search by these values.
Search Returns Too Much Noise
When you search for "contracts expiring in Q3", you get every document mentioning Q3. There's no way to filter by actual expiration dates.
No Structured Filtering
You can't query documents by invoice amount, product type, or author. Every search is plain text matching with no business context.

Real Example: Invoice Processing

Without Metadata Extraction

Query: "invoices over $50,000"
Results: Any text semantically similar to the phrase "invoices over $50,000", whatever the actual amount

With Metadata Extraction

Query: invoice_amount > 50000
Results: Only actual invoices with amounts exceeding $50,000
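
To make the contrast concrete, here is a minimal sketch of a numeric filter over records that have already been extracted. The record layout, the `doc_type` field, and the sample values are assumptions made for illustration; this is not Iris's actual query API.

```python
# Hypothetical extracted records; field names and values are illustrative only.
documents = [
    {"title": "Invoice INV-2024-0152", "doc_type": "invoice", "invoice_amount": 75000},
    {"title": "Invoice for office supplies", "doc_type": "invoice", "invoice_amount": 1200},
    {"title": "Memo about invoices over $50,000", "doc_type": "memo", "invoice_amount": None},
]


def filter_by_amount(docs, minimum):
    """Return documents whose extracted invoice_amount exceeds `minimum`."""
    return [
        d for d in docs
        if d.get("invoice_amount") is not None and d["invoice_amount"] > minimum
    ]


large_invoices = filter_by_amount(documents, 50000)
print([d["title"] for d in large_invoices])
```

The memo mentions the phrase but carries no amount, so a comparison on the extracted value leaves it out.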

Extract Any Data You Need

Define what matters to your business. Iris will find it in every document.

AI-Powered Extraction
Iris analyzes your documents and extracts structured data based on your custom schemas. No regex, no rules, just AI understanding.
Visual Schema Editor
Build extraction schemas with our drag-and-drop editor. Add fields, set types, and preview results without writing JSON.
Two-Level Extraction
Extract document-level metadata (author, date, type) and section-level details (prices, specs, items) in a single pass.

Example: Extract What Your Business Needs

INVOICE INV-2024-0152

From:
Acme Corp
123 Tech Street
San Francisco, CA 94105

To:
Your Company Inc
456 Business Ave
New York, NY 10001

Date: Feb 15, 2024
Due: Mar 15, 2024

Item              Qty   Price     Total
Server Rack 42U   5     $15,000   $75,000

Total: $75,000.00

Extracted Data (JSON)

{
  "document_metadata": {
    "invoice_number": "INV-2024-0152",
    "vendor_name": "Acme Corp",
    "total_amount": 75000,
    "due_date": "2024-03-15"
  },
  "sections": {
    "line_items": [
      {
        "product_name": "Server Rack 42U",
        "quantity": 5,
        "unit_price": 15000
      }
    ]
  }
}
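
Once the data is structured, simple consistency checks become possible. The sketch below loads the JSON above and confirms that the line items add up to the document total. The helper name `line_items_total` is hypothetical, and the check itself is only an illustration of what downstream code could do.

```python
import json

# The extracted output shown above, embedded as a string for the example.
extracted = json.loads("""
{
  "document_metadata": {
    "invoice_number": "INV-2024-0152",
    "vendor_name": "Acme Corp",
    "total_amount": 75000,
    "due_date": "2024-03-15"
  },
  "sections": {
    "line_items": [
      {"product_name": "Server Rack 42U", "quantity": 5, "unit_price": 15000}
    ]
  }
}
""")


def line_items_total(data):
    """Sum quantity * unit_price over all extracted line items."""
    return sum(
        item["quantity"] * item["unit_price"]
        for item in data["sections"]["line_items"]
    )


computed = line_items_total(extracted)
matches = computed == extracted["document_metadata"]["total_amount"]
print(computed, matches)  # 5 * 15000 = 75000, which equals total_amount
```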

From Documents to Structured Data in 4 Steps

No complex rules or regex. Just tell us what you want to extract.

01. Define Your Schema

Use our visual editor to define what data you want to extract. Start from templates or let AI suggest a schema.

02. Test on Real Documents

Upload sample documents and see extracted results instantly. Refine your schema until it captures exactly what you need.

03. Add to Pipeline

Enable metadata extraction in your RAG pipeline. Every document will be analyzed and enriched automatically.

04. Query with Precision

Filter and search using extracted metadata. Find invoices by amount, contracts by date, or specs by product.

Three Ways to Create Schemas

Start from Templates

Pre-built schemas for invoices, receipts, contracts, and more

AI-Generated Schemas

Upload a document and let Iris suggest a schema to start from

Custom Fields

Define any field type: strings, numbers, arrays, or nested objects

Schema Builder

Document Metadata
  invoice_number: string
  total_amount: number
Section Metadata
  product_name: string
  unit_price: number

Live Preview

Upload a document to see extracted metadata in real time...
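
For readers who prefer code to a visual editor, the two-level schema in the builder could be written down roughly as follows. This is a sketch under stated assumptions: the `SCHEMA` dictionary layout and the `validate` helper are hypothetical and do not represent the format Iris stores or exports.

```python
# Hypothetical schema layout: level -> field name -> accepted Python type(s).
SCHEMA = {
    "document_metadata": {"invoice_number": str, "total_amount": (int, float)},
    "section_metadata": {"product_name": str, "unit_price": (int, float)},
}


def validate(level, record):
    """Return a list of problems found in `record` for the given schema level."""
    problems = []
    for field, expected in SCHEMA[level].items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


ok = validate("document_metadata", {"invoice_number": "INV-2024-0152", "total_amount": 75000})
bad = validate("section_metadata", {"product_name": "Server Rack 42U", "unit_price": "15,000"})
print(ok)
print(bad)
```

Typed fields are what make numeric filters possible later: a price kept as the string "15,000" cannot be compared against a threshold.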

Unlock the Value in Your Documents

Turn unstructured text into actionable business data

Search & Retrieval

  • Filter documents by exact values, not just keywords
  • Combine semantic search with metadata filters
  • Query numeric ranges (amounts, dates, quantities)
  • Find documents by any extracted field
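
As an illustration of combining semantic search with a metadata filter, the sketch below answers "contracts expiring in Q3" by filtering on an extracted date and then ordering by relevance. The `expiration_date` field, the `score` values (standing in for a similarity score from a vector search), and the sample records are all hypothetical.

```python
from datetime import date

# Hypothetical records: `score` stands in for a semantic-similarity score,
# `expiration_date` for a date pulled out by metadata extraction.
contracts = [
    {"title": "Hosting agreement", "expiration_date": "2024-08-31", "score": 0.71},
    {"title": "Q3 planning memo", "expiration_date": None, "score": 0.93},
    {"title": "Support contract", "expiration_date": "2024-07-15", "score": 0.88},
    {"title": "Office lease", "expiration_date": "2025-01-31", "score": 0.80},
]


def expiring_between(docs, start, end):
    """Keep documents expiring within [start, end], most relevant first."""
    hits = [
        d for d in docs
        if d["expiration_date"] is not None
        and start <= date.fromisoformat(d["expiration_date"]) <= end
    ]
    return sorted(hits, key=lambda d: d["score"], reverse=True)


q3 = expiring_between(contracts, date(2024, 7, 1), date(2024, 9, 30))
print([d["title"] for d in q3])
```

The planning memo has the highest similarity score but no expiration date, so the date filter drops it before ranking.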

Business Intelligence

  • Aggregate data across thousands of documents
  • Track trends in contracts, invoices, or reports
  • Export structured data for analysis
  • Build dashboards from unstructured sources
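
Aggregation follows the same pattern. A minimal sketch, assuming extracted invoice records are available as plain dictionaries (the vendors other than Acme Corp and all amounts other than 75000 are invented for the example):

```python
from collections import defaultdict

# Hypothetical extracted invoice metadata.
invoices = [
    {"vendor_name": "Acme Corp", "total_amount": 75000},
    {"vendor_name": "Acme Corp", "total_amount": 5000},
    {"vendor_name": "Example Supplies", "total_amount": 12000},
]


def total_by_vendor(records):
    """Sum total_amount per vendor across all records."""
    totals = defaultdict(int)
    for record in records:
        totals[record["vendor_name"]] += record["total_amount"]
    return dict(totals)


totals = total_by_vendor(invoices)
print(totals)
```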

Automation

  • Route documents based on extracted values
  • Trigger workflows from metadata conditions
  • Auto-categorize incoming documents
  • Flag documents that need attention
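
Routing on extracted values can be as simple as a few ordered rules. In this sketch the queue names and the 50000 threshold are assumptions chosen for illustration, not built-in Iris behavior.

```python
def route(metadata):
    """Pick a queue from extracted metadata using simple ordered rules."""
    amount = metadata.get("total_amount")
    if amount is None:
        return "manual_review"     # no amount extracted: flag for attention
    if amount > 50000:
        return "finance_approval"  # large invoices need sign-off
    return "auto_process"


print(route({"invoice_number": "INV-2024-0152", "total_amount": 75000}))
print(route({"total_amount": 1200}))
print(route({"vendor_name": "Acme Corp"}))
```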

Start Extracting in Minutes

Your first schema is free. See how much value is hidden in your documents.