Technical Overview
For IT managers and technical evaluators
This page explains how LuxonLink works under the hood—without marketing language.
Document-Aware Indexing
The Problem
Many retrieval-augmented generation (RAG) systems split documents blindly at fixed character counts (e.g., every 500 characters). This breaks sentences mid-thought, splits tables, and loses context. When the system retrieves these fragments, answers are incomplete or inaccurate.
LuxonLink's Approach: Parent/Child Chunking
We use a hierarchical approach:
- Parent chunks: Larger context blocks (~2400 characters) that respect document structure (sections, headings, paragraphs).
- Child chunks: Smaller, searchable units (~800 characters) extracted from parents, with references back to their parent.
- Vector search: Queries match against child chunks (precise, fast).
- Context retrieval: When a child chunk matches, we return its parent chunk to the LLM for context.
Result: The LLM gets full context without truncation, leading to more accurate answers.
Example: If a policy section is 3000 characters, we create one parent (the full section) and 3 to 4 child chunks. When a query matches child #2, we send the entire parent to the LLM for answer generation. For the product-level overview, see the LuxonLink internal AI knowledge base software page.
Metadata & Citations
Every Chunk Has Metadata
Each indexed chunk carries metadata that enables accurate citations:
- source_file: Original document name (e.g., "HR_Handbook_2026.pdf")
- source_path: Department folder (e.g., "HR/Policies/")
- page_number: Exact page in the PDF (if applicable)
- section_title: Heading or context (if detected)
- doc_hash: File hash for tracking versions
- indexed_at: Timestamp of indexing
How Citations Work
When LuxonLink generates an answer:
- Vector search retrieves top-N matching chunks with their metadata.
- The LLM generates an answer using the chunk content.
- We extract the metadata and format it as a clickable citation.
- Users click the citation and see the exact source (PDF page, DOCX section, etc.).
No guessing. Every answer is traceable to its source document and location.
Example: A user asks "What's our PTO policy?" LuxonLink returns: "You accrue 15 days annually" with a citation: HR_Handbook_2026.pdf, page 12. Clicking opens the PDF to page 12.
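Turning metadata into a citation label and link can be sketched as below. The base URL and the path layout are hypothetical; the `#page=N` fragment is the standard PDF open parameter that most browser PDF viewers honor, which is one plausible way to "open the PDF to page 12". Duplicate citations from several top-N chunks of the same page are collapsed, keeping retrieval order.

```python
from urllib.parse import quote


def format_citation(meta: dict, base_url: str) -> tuple[str, str]:
    """Return (label, link) for one retrieved chunk's metadata."""
    label = meta["source_file"]
    page = meta.get("page_number")
    if page is not None:
        label += f", page {page}"
    elif meta.get("section_title"):
        label += f', section "{meta["section_title"]}"'
    # quote() keeps "/" by default, so the folder structure survives in the URL.
    link = f"{base_url}/{quote(meta['source_path'] + meta['source_file'])}"
    if page is not None:
        link += f"#page={page}"  # PDF open parameter: jump to that page
    return label, link


def cite_all(chunks: list[dict], base_url: str) -> list[tuple[str, str]]:
    """De-duplicate citations across the top-N chunks, preserving retrieval order."""
    seen, out = set(), []
    for meta in chunks:
        citation = format_citation(meta, base_url)
        if citation not in seen:
            seen.add(citation)
            out.append(citation)
    return out
```

Usage: `format_citation({"source_file": "HR_Handbook_2026.pdf", "source_path": "HR/Policies/", "page_number": 12}, "https://files.example.com")` yields the label `HR_Handbook_2026.pdf, page 12` and a link ending in `#page=12`.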
Conditional OCR
Why Not OCR Everything?
Running OCR on every document is slow and wastes resources. Most PDFs already contain searchable text.
LuxonLink's OCR Logic
- When a PDF is ingested, we extract text directly.
- If the extracted text has fewer than ~50 words, we flag it as likely scanned.
- Only then do we apply OCR to extract text from images.
- The extracted text is indexed like any other content.
Result: Fast indexing for most documents, accurate text extraction when needed.
Example: A scanned, signed contract PDF is uploaded. Direct extraction yields fewer than 10 words, well under the threshold, so LuxonLink runs OCR, extracts the contract text, and indexes it. A native-text policy PDF skips OCR entirely.
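The conditional OCR decision reduces to a word-count check before any OCR runs. In this sketch the text extractor and OCR engine are passed in as callables because the overview does not name them; the threshold is applied to the whole document's extracted text, whereas a real pipeline might check per page.

```python
from typing import Callable

MIN_WORDS = 50  # the overview's "~50 words" threshold


def looks_scanned(text: str, min_words: int = MIN_WORDS) -> bool:
    """Too little extractable text suggests the PDF is an image scan."""
    return len(text.split()) < min_words


def extract_pdf_text(pdf: bytes,
                     extract_text: Callable[[bytes], str],
                     run_ocr: Callable[[bytes], str]) -> tuple[str, bool]:
    """Return (text, used_ocr). The expensive OCR path runs only when needed."""
    text = extract_text(pdf)
    if looks_scanned(text):
        return run_ocr(pdf), True
    return text, False
```

Keeping OCR behind this check means the common case (native-text PDFs) never pays the OCR cost.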
Updates, Deletions & Re-Indexing
How We Track Changes
LuxonLink monitors your document sources (file shares, cloud storage) for changes:
- File added: New file is indexed automatically.
- File modified: We detect the change via the file hash and replace the old chunks with chunks from the new version.
- File deleted: All chunks from that document are removed from the index.
- File moved: Treated as a delete + add (metadata updated).
No manual refresh required. Your knowledge base stays current automatically.
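The four cases above can be derived by comparing two snapshots of a source, each mapping file path to file hash. This is a sketch under the assumption of snapshot comparison; the overview does not say whether LuxonLink polls sources or receives change notifications. Note how a move naturally appears as a delete plus an add.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Compare {path: file_hash} snapshots and return (action, path) pairs.
    A moved file shows up as a delete of the old path plus an add of the new one."""
    actions = []
    for path in sorted(old.keys() - new.keys()):
        actions.append(("delete", path))   # remove all chunks for this document
    for path in sorted(new.keys() - old.keys()):
        actions.append(("add", path))      # index the new file
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            actions.append(("reindex", path))  # hash changed: replace its chunks
    return actions
```

Unchanged files produce no action, so only files whose hash differs are re-indexed.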
Version Control
When a document is updated, we:
- Store the new version with a new doc_hash.
- Mark old chunks as outdated (not deleted immediately, for audit purposes).
- Queries always match against the latest version.
Result: Answers reflect current policy, not outdated documents.
Example: HR updates the PTO policy from 10 days to 15 days. LuxonLink detects the file change, removes the old chunks from search, and indexes the new version. Future queries now return "15 days".
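The version-control behavior (keep old chunks for audit, but search only the latest version) can be sketched with an `outdated` flag. `VersionedIndex` and its substring search are hypothetical stand-ins for the real vector index; the point is the filtering rule, not the matching.

```python
from dataclasses import dataclass, field


@dataclass
class StoredChunk:
    doc: str
    doc_hash: str
    text: str
    outdated: bool = False


@dataclass
class VersionedIndex:
    chunks: list[StoredChunk] = field(default_factory=list)

    def upsert(self, doc: str, doc_hash: str, texts: list[str]) -> None:
        """Mark the document's current chunks outdated (kept for audit), then add the new version."""
        for chunk in self.chunks:
            if chunk.doc == doc and not chunk.outdated:
                chunk.outdated = True
        self.chunks.extend(StoredChunk(doc, doc_hash, t) for t in texts)

    def search(self, term: str) -> list[str]:
        """Only chunks from the latest version are eligible to match."""
        return [c.text for c in self.chunks
                if not c.outdated and term.lower() in c.text.lower()]

    def history(self, doc: str) -> list[str]:
        """All versions ever indexed for a document, oldest first."""
        return [c.doc_hash for c in self.chunks if c.doc == doc]
```

Replaying the PTO example: after upserting a "10 days" version and then a "15 days" version of the handbook, a search for "pto" returns only the 15-day text, while `history` still lists both hashes.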
Security & Privacy
No Training on Your Data
xillix uses OpenAI's embedding and generation APIs. Your documents are processed to generate embeddings and answers, but under OpenAI's API terms, OpenAI does not train its models on data submitted through the API. xillix manages API key access as part of the deployment.
We store encrypted copies of your documents for indexing and retrieval. For synced sources, the original files remain in your source storage as the source of truth.
Department Isolation
Each LuxonLink bot is scoped to a single department:
- HR bot sees only HR documents.
- IT bot sees only IT documents.
- Sales bot sees only Sales documents.
Role-based access control (RBAC) ensures users can only query bots they have permission to access. This is enforced at the application layer before any query reaches the vector database.
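The ordering matters: the permission check happens before any search call, and the search is scoped to the bot's department. In this sketch the grants table and the `search` callable are hypothetical; the overview does not describe how roles are stored.

```python
from typing import Callable


class AccessDenied(Exception):
    """Raised before any search runs when the user lacks access to the bot."""


# Hypothetical grants table: user ID -> departments (bots) the user may query.
GRANTS: dict[str, set[str]] = {"alice": {"HR"}, "bob": {"IT", "Sales"}}


def query_bot(user: str, bot: str, question: str,
              search: Callable[..., list[str]]) -> list[str]:
    """Application-layer check first; the search itself is scoped to the bot's department."""
    if bot not in GRANTS.get(user, set()):
        raise AccessDenied(f"{user} may not query the {bot} bot")
    return search(question, department=bot)
```

Because the check raises before `search` is called, a denied request never touches the vector database at all.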
Audit Trails
Every query is logged with:
- User ID
- Timestamp
- Query text
- Retrieved chunks (for compliance review)
- Answer generated
This allows compliance teams to review what information was accessed and by whom.
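An audit record with the fields listed above could be serialized as one JSON object per line (JSON Lines). This sketch keeps the log in memory for illustration; the record layout, the use of chunk IDs for "retrieved chunks", and the `AuditLog` class are assumptions, and a real deployment would write to durable, append-only storage.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    user_id: str
    query: str
    retrieved_chunks: list[str]  # IDs of the chunks retrieval returned
    answer: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class AuditLog:
    """Append-only log of JSON lines (in memory here; persisted in practice)."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def append(self, record: AuditRecord) -> None:
        self._lines.append(json.dumps(asdict(record)))

    def accessed_by(self, user_id: str) -> list[dict]:
        """Compliance view: everything a given user asked and was shown."""
        return [r for r in map(json.loads, self._lines) if r["user_id"] == user_id]
```

Logging the retrieved chunk IDs, not just the answer, is what lets a reviewer see which source passages a user was actually exposed to.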
Summary: Your data stays yours. No training, no leakage across departments, full audit logs.
Hosting & Infrastructure
Fully Managed by xillix
xillix is a SaaS platform. xillix handles:
- Infrastructure provisioning and scaling
- Database backups and disaster recovery
- Security patching and updates
- Monitoring and uptime
Customers do not deploy code or manage infrastructure. You access LuxonLink via web browser.
Enterprise: Dedicated Instances
For customers with strict data isolation requirements, xillix provides dedicated instances:
- Single-tenant infrastructure (your data never shares compute/storage with other customers)
- Still hosted and managed by xillix
- Custom security controls (enhanced encryption, network isolation, etc.)
You still do not deploy or manage code. xillix operates your dedicated instance on your behalf.
Bottom line: xillix is SaaS. You don't deploy containers, configure servers, or manage databases. We do that.