OmniWhale-RAG | Soham Deshmukh

Overview

OmniWhale-RAG is a multimodal RAG (Retrieval-Augmented Generation) system designed to run efficiently on Intel Lunar Lake hardware while remaining scalable to enterprise server clusters. Built with a "Write Once, Run Anywhere" philosophy, it handles both text and visual data in a unified vector space.
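To make the "unified vector space" idea concrete, here is a minimal sketch in plain Python. It assumes text and image items have already been embedded into vectors of the same dimension. The `Item` class, the item names, and the 3-dimensional toy vectors are all hypothetical stand-ins for real embeddings, and this is not the project's actual retrieval code.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: str
    modality: str        # "text" or "image"
    vector: List[float]  # toy embedding; real embeddings have far more dimensions


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: List[float], items: List[Item], k: int = 2) -> List[Item]:
    """Rank all items by similarity to the query, regardless of modality."""
    ranked = sorted(items, key=lambda it: cosine(query, it.vector), reverse=True)
    return ranked[:k]


# Hypothetical index mixing text chunks and images in one space.
INDEX = [
    Item("report.pdf#p3", "text", [0.9, 0.1, 0.0]),
    Item("revenue_chart.png", "image", [0.8, 0.2, 0.1]),
    Item("team_photo.jpg", "image", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    for hit in search([1.0, 0.0, 0.0], INDEX):
        print(hit.item_id, hit.modality)
```

Because every item lives in the same space, a single query can return a PDF passage and a chart image side by side without a separate index per modality.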

Key Features

Multimodal Intelligence: Processes PDFs, images, charts, and text in a unified vector space
Local-First Architecture: Runs entirely offline on Intel Lunar Lake with NPU/GPU acceleration
Scalable Design: Moves from Milvus Lite to Milvus Cluster without code changes
Hardware Optimized: Leverages Intel Arc 130V GPU and NPU for efficient AI inference
Modern Stack: FastAPI backend + Next.js 14 frontend with TypeScript and ShadCN UI
Privacy-Focused: All processing happens locally - your data never leaves your machine
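One way the Milvus Lite to Milvus Cluster switch can work without code changes is to keep the connection URI in configuration. The sketch below assumes this approach. The environment variable `OMNIWHALE_MILVUS_URI`, the default file path, and the helper names are hypothetical and not taken from the project. `MilvusClient(uri=...)` is the pymilvus client constructor, which treats a local `.db` file path as an embedded Milvus Lite database and an `http://` address as a remote server.

```python
import os
from typing import Mapping, Optional

# Hypothetical default: a local file, which pymilvus opens with Milvus Lite.
DEFAULT_LOCAL_DB = "./omniwhale_demo.db"


def resolve_milvus_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured Milvus URI, falling back to the local file."""
    env = os.environ if env is None else env
    return env.get("OMNIWHALE_MILVUS_URI", DEFAULT_LOCAL_DB)


def is_embedded(uri: str) -> bool:
    """True when the URI points at a local Milvus Lite database file."""
    return uri.endswith(".db")


def make_client(uri: str):
    """Create a client; the calling code is identical for both deployments."""
    # Imported lazily so the helpers above work without pymilvus installed.
    from pymilvus import MilvusClient
    return MilvusClient(uri=uri)


if __name__ == "__main__":
    uri = resolve_milvus_uri()
    print("embedded" if is_embedded(uri) else "remote", uri)
```

On a laptop the variable stays unset and the local file is used; on a server, setting it to something like `http://milvus:19530` points the same application code at a cluster.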

Technical Implementation

Backend: FastAPI, LlamaIndex, Ollama, Milvus, Nomic-Embed
Frontend: Next.js 14, TypeScript, ShadCN UI, Tailwind CSS
Hardware: Intel Core Ultra 5 (Lunar Lake), Arc 130V Graphics, NPU acceleration, 4-bit Quantization
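As a rough illustration of why 4-bit quantization matters on laptop-class hardware, the arithmetic below estimates weight storage as parameters × bits per weight ÷ 8 bytes. The 7-billion-parameter model size is an assumed example, not a figure from this project, and the estimate ignores quantization scale factors, the KV cache, and activations.

```python
def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Approximate storage for model weights alone, in bytes."""
    return n_params * bits_per_weight // 8


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


if __name__ == "__main__":
    n_params = 7_000_000_000  # assumed example model size
    for bits in (16, 4):
        print(f"{bits}-bit weights: {to_gb(weight_bytes(n_params, bits)):.1f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight storage by a factor of four, which is what makes running a model of this size locally plausible.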

Impact

OmniWhale-RAG lets organizations process multimodal documents locally while keeping a path to cluster-scale deployment. It is well suited to healthcare, legal, and research environments that require privacy-first AI with hardware-optimized performance.