Technical Overview

For IT managers and technical evaluators

This page explains how LuxonLink works under the hood, without marketing language.

Document-Aware Indexing

The Problem

Most RAG (retrieval-augmented generation) systems chunk documents blindly at fixed character counts (e.g., every 500 characters). This breaks sentences mid-thought, splits tables, and loses context. When the system retrieves these fragments, the answers built from them are often incomplete or inaccurate.

LuxonLink's Approach: Parent/Child Chunking

We use a hierarchical approach:

  1. Parent chunks: Larger context blocks (~2400 characters) that respect document structure (sections, headings, paragraphs).
  2. Child chunks: Smaller, searchable units (~800 characters) extracted from parents, with references back to their parent.
  3. Vector search: Queries match against child chunks (precise, fast).
  4. Context retrieval: When a child chunk matches, we return its parent chunk to the LLM for context.

Result: The LLM gets full context without truncation, leading to more accurate answers.
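The parent/child steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not LuxonLink's implementation: paragraphs separated by blank lines stand in for document structure (a real parser would also use headings, sections, and tables), parents are packed greedily from whole paragraphs up to the ~2400-character target, and children are packed from whole words up to ~800 characters. The names `chunk_document`, `Parent`, `Child`, and `pack` are hypothetical.

```python
from dataclasses import dataclass

PARENT_TARGET = 2400  # approximate parent size from the text above
CHILD_TARGET = 800    # approximate child size from the text above


@dataclass
class Parent:
    parent_id: int
    text: str


@dataclass
class Child:
    parent_id: int  # reference back to the parent chunk
    text: str


def pack(units: list[str], limit: int, sep: str) -> list[str]:
    """Greedily join whole units into groups of at most `limit` characters.

    A single unit longer than `limit` becomes its own group instead of being cut.
    """
    groups: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current}{sep}{unit}" if current else unit
        if current and len(candidate) > limit:
            groups.append(current)
            current = unit
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


def chunk_document(text: str) -> tuple[list[Parent], list[Child]]:
    # Assumed structural unit: paragraphs separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    parents = [Parent(i, t) for i, t in enumerate(pack(paragraphs, PARENT_TARGET, "\n\n"))]
    children: list[Child] = []
    for parent in parents:
        # Children are built from whole words (whitespace normalized to single
        # spaces), so no word is split across two children.
        for piece in pack(parent.text.split(), CHILD_TARGET, " "):
            children.append(Child(parent.parent_id, piece))
    return parents, children
```

Because a paragraph is never split across parents, an oversized paragraph becomes a parent of its own rather than being cut mid-thought; the trade-off is that structure is preserved at the cost of occasionally exceeding the size target.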

Example: If a policy section is 3000 characters, we create one parent (the full section) and 3-4 child chunks. When a query matches child #2, we send the entire parent to the LLM for answer generation. For the product-level overview, see the LuxonLink internal AI knowledge base software page.
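The child-to-parent lookup in this example can be sketched as follows. It is an illustration under assumptions: a brute-force cosine-similarity scan stands in for the vector database (which this page does not name), embeddings are plain lists of floats, and `retrieve_parents` is a hypothetical name. Children are ranked, the top N are kept, and each distinct parent is returned once.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_parents(query_vec: Sequence[float],
                     child_vecs: list[Sequence[float]],
                     child_parent: list[int],
                     parents: dict[int, str],
                     top_n: int = 3) -> list[str]:
    """Rank child chunks against the query, then return their parents' text.

    child_parent[i] is the parent_id of child i; parents maps parent_id -> text.
    """
    ranked = sorted(range(len(child_vecs)),
                    key=lambda i: cosine(query_vec, child_vecs[i]),
                    reverse=True)[:top_n]
    seen: set[int] = set()
    context: list[str] = []
    for i in ranked:
        pid = child_parent[i]
        if pid not in seen:  # several matching children can share one parent
            seen.add(pid)
            context.append(parents[pid])
    return context
```

Searching the small children keeps matching precise, while de-duplicating by parent avoids sending the same section to the LLM twice when two of its children both rank highly.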

Metadata & Citations

Every Chunk Has Metadata

Each indexed chunk carries metadata that enables accurate citations:

  • source_file: Original document name (e.g., "HR_Handbook_2026.pdf")
  • source_path: Department folder (e.g., "HR/Policies/")
  • page_number: Exact page in PDF (if applicable)
  • section_title: Heading or context (if detected)
  • doc_hash: File hash for tracking versions
  • indexed_at: Timestamp of indexing
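One way to model this metadata is a small immutable record. The field names come from the list above; the SHA-256 algorithm for `doc_hash` and the helper names `file_hash` and `make_metadata` are assumptions for illustration, since the page only says "file hash".

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ChunkMetadata:
    source_file: str              # original document name
    source_path: str              # department folder
    page_number: Optional[int]    # PDF page; None for formats without pages
    section_title: Optional[str]  # detected heading; None if not detected
    doc_hash: str                 # file hash used to track versions
    indexed_at: datetime          # when the chunk was indexed (UTC)


def file_hash(data: bytes) -> str:
    # SHA-256 is an assumed choice of hash function.
    return hashlib.sha256(data).hexdigest()


def make_metadata(name: str, folder: str, data: bytes,
                  page: Optional[int] = None,
                  section: Optional[str] = None) -> ChunkMetadata:
    return ChunkMetadata(
        source_file=name,
        source_path=folder,
        page_number=page,
        section_title=section,
        doc_hash=file_hash(data),
        indexed_at=datetime.now(timezone.utc),
    )
```

Any change to the file's bytes produces a different `doc_hash`, which is what lets a content change be detected without comparing the documents themselves.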

How Citations Work

When LuxonLink generates an answer:

  1. Vector search retrieves top-N matching chunks with their metadata.
  2. The LLM generates an answer using the chunk content.
  3. We extract the metadata and format it as a clickable citation.
  4. Users click the citation and see the exact source (PDF page, DOCX section, etc.).
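Step 3 can be sketched as a pure function from chunk metadata to a citation label and link. This is a hypothetical sketch: the `/files/` route is invented for illustration, and the `#page=N` fragment assumes the PDF viewer honors that common page open parameter. Citations are de-duplicated so that several chunks from the same page produce a single link.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Citation:
    label: str  # text shown to the user
    href: str   # link that opens the source document


def format_citation(source_path: str, source_file: str,
                    page_number: Optional[int] = None,
                    section_title: Optional[str] = None) -> Citation:
    label = source_file
    # Hypothetical route for serving stored files; quote() percent-encodes spaces etc.
    href = "/files/" + quote(source_path + source_file)
    if page_number is not None:
        label += f", page {page_number}"
        href += f"#page={page_number}"  # assumes the viewer supports #page=N
    elif section_title:
        label += f', section "{section_title}"'
    return Citation(label, href)


def cite_chunks(chunks: list[dict]) -> list[Citation]:
    """Turn the metadata of the top-N retrieved chunks into unique citations."""
    seen: set[str] = set()
    citations: list[Citation] = []
    for meta in chunks:
        c = format_citation(meta["source_path"], meta["source_file"],
                            meta.get("page_number"), meta.get("section_title"))
        if c.href not in seen:
            seen.add(c.href)
            citations.append(c)
    return citations
```

With the metadata from the PTO example, `format_citation("HR/Policies/", "HR_Handbook_2026.pdf", page_number=12)` yields the label `HR_Handbook_2026.pdf, page 12`.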

No guessing. Every answer is traceable to its source document and location.

Example: A user asks "What's our PTO policy?" LuxonLink returns: "You accrue 15 days annually" with a citation: HR_Handbook_2026.pdf, page 12. Clicking opens the PDF to page 12.

Conditional OCR

Why Not OCR Everything?

Running OCR on every document is slow and wastes resources. Most PDFs already contain searchable text.

LuxonLink's OCR Logic

  1. When a PDF is ingested, we extract text directly.
  2. If the extracted text has fewer than ~50 words, we flag it as likely scanned.
  3. Only then do we apply OCR to extract text from images.
  4. The extracted text is indexed like any other content.
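The decision in steps 2 and 3 reduces to a word count. A minimal sketch, assuming the direct text extraction has already happened and that `run_ocr` is a placeholder for whatever OCR engine is used (the page does not name one); `likely_scanned` and `text_for_indexing` are hypothetical names.

```python
from typing import Callable

MIN_WORDS = 50  # the "fewer than ~50 words" threshold described above


def likely_scanned(extracted_text: str, min_words: int = MIN_WORDS) -> bool:
    """Flag a PDF as likely scanned when direct extraction yields too few words."""
    return len(extracted_text.split()) < min_words


def text_for_indexing(extracted_text: str,
                      run_ocr: Callable[[], str]) -> tuple[str, bool]:
    """Return (text_to_index, used_ocr); run_ocr is only called for likely scans."""
    if likely_scanned(extracted_text):
        return run_ocr(), True
    return extracted_text, False
```

Passing OCR in as a callable makes the cost explicit: the expensive step never runs for native-text PDFs, which is the point of making OCR conditional.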

Result: Fast indexing for most documents, accurate text extraction when needed.

Example: A scanned, signed contract PDF is uploaded. Direct extraction yields fewer than 10 words, so LuxonLink runs OCR, extracts the contract text, and indexes it. A native-text policy PDF skips OCR entirely.

Updates, Deletions & Re-Indexing

How We Track Changes

LuxonLink monitors your document sources (file shares, cloud storage) for changes:

  • File added: New file is indexed automatically.
  • File modified: We detect the change via the file hash, remove the old chunks from search, and re-index the new version.
  • File deleted: All chunks from that document are removed from the index.
  • File moved: Treated as a delete + add (metadata updated).
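Change detection by file hash can be sketched as a comparison of two snapshots of a source, each mapping a file path to its `doc_hash`. This assumes a polling model (the page says LuxonLink monitors sources but not whether it polls or receives change events), and `diff_snapshots` is a hypothetical name. Moves are reported alongside, not instead of, the delete and add, matching the rule above.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {path: doc_hash} snapshots of a monitored source."""
    added = [p for p in new if p not in old]
    deleted = [p for p in old if p not in new]
    # Same path, different hash: the content changed.
    modified = [p for p in new if p in old and new[p] != old[p]]
    # A move is a delete plus an add whose content hash is identical.
    deleted_by_hash = {old[p]: p for p in deleted}
    moved = [(deleted_by_hash[new[p]], p) for p in added if new[p] in deleted_by_hash]
    return {"added": added, "modified": modified, "deleted": deleted, "moved": moved}
```

Each list maps onto one action from the list above: index `added`, re-index `modified`, remove chunks for `deleted`, and for `moved` the delete and add already cover re-indexing with the new `source_path`.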

No manual refresh required. Your knowledge base stays current automatically.

Version Control

When a document is updated, we:

  1. Store the new version with a new doc_hash.
  2. Mark old chunks as outdated (not deleted immediately for audit purposes).
  3. Queries always match against the latest version.
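The version rules above can be sketched with an in-memory store. This is an illustration only: `ChunkStore` and `StoredChunk` are hypothetical, and a production index would more likely keep the `outdated` flag alongside each vector so the search itself can filter on it.

```python
from dataclasses import dataclass, field


@dataclass
class StoredChunk:
    source_file: str
    doc_hash: str
    text: str
    outdated: bool = False  # retained for audit, excluded from search


@dataclass
class ChunkStore:
    chunks: list[StoredChunk] = field(default_factory=list)

    def index_version(self, source_file: str, doc_hash: str, texts: list[str]) -> None:
        """Index a (possibly new) version of a file."""
        if any(c.source_file == source_file and c.doc_hash == doc_hash and not c.outdated
               for c in self.chunks):
            return  # this exact version is already the current one
        # Mark, rather than delete, every chunk from earlier versions of the file.
        for c in self.chunks:
            if c.source_file == source_file:
                c.outdated = True
        self.chunks.extend(StoredChunk(source_file, doc_hash, t) for t in texts)

    def searchable(self) -> list[StoredChunk]:
        """Only chunks from the latest version of each file are visible to queries."""
        return [c for c in self.chunks if not c.outdated]
```

Keeping outdated chunks in the store while excluding them from `searchable()` is what lets audit review see what an earlier answer was based on without that content leaking into new answers.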

Result: Answers reflect current policy, not outdated documents.

Example: HR updates the PTO policy from 10 days to 15 days. LuxonLink detects the file change, marks the old chunks as outdated, and indexes the new version. Future queries now return "15 days".

Security & Privacy

No Training on Your Data

xillix uses OpenAI's embedding and generation APIs. Your documents are processed to generate embeddings and answers, but per OpenAI's API terms, OpenAI does not train its models on customer data sent through the API. xillix manages API key access as part of the deployment.

We store encrypted copies of your documents for indexing and retrieval. For synced sources, the original files remain in your source storage as the source of truth.

Department Isolation

Each LuxonLink bot is scoped to a single department:

  • HR bot sees only HR documents.
  • IT bot sees only IT documents.
  • Sales bot sees only Sales documents.

Role-based access control (RBAC) ensures users can only query bots they have permission to access. This is enforced at the application layer before any query reaches the vector database.
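The ordering described here, permission check first and department-scoped search second, can be sketched as follows. The role model (a user's permissions as a set of department names), the `Bot` type, and the `search` callable are assumptions for illustration, not LuxonLink's API.

```python
from dataclasses import dataclass
from typing import Callable


class AccessDenied(Exception):
    pass


@dataclass(frozen=True)
class Bot:
    name: str
    department: str


def query_bot(user_departments: set[str], bot: Bot, question: str,
              search: Callable[..., list[str]]) -> list[str]:
    # Application-layer check: runs before anything reaches the vector database.
    if bot.department not in user_departments:
        raise AccessDenied(f"user may not query the {bot.name} bot")
    # The search is always scoped to the bot's own department.
    return search(question, department=bot.department)
```

Because the denial happens before `search` is called, an unauthorized query never produces a vector lookup at all, rather than one whose results are filtered afterwards.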

Audit Trails

Every query is logged with:

  • User ID
  • Timestamp
  • Query text
  • Retrieved chunks (for compliance review)
  • Answer generated
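A log entry with these fields might look like the following sketch. Writing one JSON object per line (JSON Lines) is an assumed storage format, and `audit_line` and `who_retrieved` are hypothetical helpers; the second shows the kind of question a compliance review might ask of the log.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class AuditRecord:
    user_id: str
    timestamp: str               # ISO 8601, UTC
    query_text: str
    retrieved_chunks: list[str]  # IDs of the chunks retrieved for this query
    answer: str


def audit_line(user_id: str, query_text: str, chunk_ids: Iterable[str],
               answer: str, now: Optional[datetime] = None) -> str:
    """Serialize one query as a single JSON line."""
    now = now or datetime.now(timezone.utc)
    record = AuditRecord(user_id, now.isoformat(), query_text, list(chunk_ids), answer)
    return json.dumps(asdict(record))


def who_retrieved(lines: Iterable[str], chunk_id: str) -> list[str]:
    """Return the user IDs whose queries retrieved a given chunk."""
    users = []
    for line in lines:
        rec = json.loads(line)
        if chunk_id in rec["retrieved_chunks"]:
            users.append(rec["user_id"])
    return users
```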

This allows compliance teams to review what information was accessed and by whom.

Summary: Your data stays yours. No training, no leakage across departments, full audit logs.

Hosting & Infrastructure

Fully Managed by xillix

xillix is a SaaS platform that handles:

  • Infrastructure provisioning and scaling
  • Database backups and disaster recovery
  • Security patching and updates
  • Monitoring and uptime

Customers do not deploy code or manage infrastructure. You access LuxonLink via web browser.

Enterprise: Dedicated Instances

For customers with strict data isolation requirements, xillix provides dedicated instances:

  • Single-tenant infrastructure (your data never shares compute/storage with other customers)
  • Still hosted and managed by xillix
  • Custom security controls (enhanced encryption, network isolation, etc.)

You still do not deploy or manage code. xillix operates your dedicated instance on your behalf.

Bottom line: xillix is SaaS. You don't deploy containers, configure servers, or manage databases. We do that.