INTELLIPAL, Hybrid RAG System

Hybrid AI chatbot delivering reliable protocol access anytime, with or without connectivity.

In collaboration with: HTX

Introducing INTELLIPAL, a Hybrid RAG System

INTELLIPAL is a RAG-based intelligence application that empowers officers operating on the ground by decentralizing information access, with or without internet connectivity. By augmenting slow legacy systems with a unified retrieval interface, it reduces reliance on supervisor consultations and keyword search for routine protocol verification. This shift to a fast, natural-language application minimizes operational friction and accelerates frontline decision-making, transforming traditional information-search workflows into a more agile and effective digital system.

Team members

Lee Jun Hui Ryan (ISTD), Cai Junjie (ISTD), Gay Shin Lee (ISTD), Ho Xiaoyang (ISTD), Luvena Liethanti (ESD), Shah Pankti Amish (ISTD), Sun Sitong (ISTD)

Instructors:

  • Fredy Tantri

Writing Instructors:

  • Bernard Tan

  • Susan Wong

Imagine yourself as a rookie officer...

Let's Identify the Problem Statement

Ground Response Force (GRF) officers currently lack a reliable way to access critical Standard Operating Procedures (SOPs) and legal reference materials while deployed in the field.

The existing knowledge base is hindered by keyword-dependent search mechanisms and a reliance on stable internet connectivity, which is often unavailable in operational “blackspots”.

Introducing...

Take a Closer Look at Our Product

System Architecture

1. Indexing Pipeline
Large Language Models (LLMs) can produce outdated or hallucinated outputs due to overreliance on parametric memory; Retrieval-Augmented Generation (RAG) mitigates this by grounding responses in retrieved, domain-specific documents, improving trustworthiness and traceability. Documents are split into semantically coherent chunks and embedded for effective retrieval, while a KNN-clustering-based partial loading approach reduces RAM usage for resource-constrained mobile environments.
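The indexing steps above can be sketched as follows. This is a minimal, illustrative Python version, not the production pipeline: the toy hash-based embedding stands in for a real sentence-transformer model, and the k-means routine stands in for the KNN-clustering step that lets the device load only one cluster of vectors into RAM at a time. All function names and parameters here are hypothetical.

```python
# Sketch of the indexing pipeline: chunk -> embed -> cluster.
# The embedding is a deterministic toy (NOT semantic); a real system
# would call an embedding model here.
import math
import random
import zlib

def chunk(text, size=40, overlap=10):
    """Split text into overlapping character chunks of at most `size`."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(piece, dim=8):
    """Toy unit-norm embedding seeded by a stable checksum of the chunk."""
    rng = random.Random(zlib.crc32(piece.encode("utf-8")))
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def kmeans(vectors, k=2, iters=10):
    """Minimal k-means; returns centroids and a cluster id per vector.
    At query time, only the partition nearest the query's centroid
    would be loaded from disk, reducing peak RAM usage."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign
```

At query time, the query vector is compared against the k centroids first, and only the chunks assigned to the winning cluster are brought into memory for full similarity search.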
2. Edge Device
Our edge system uses a C++ interoperability layer to connect mobile environments with a low-latency llama.cpp engine, combining a unified schema and C++-compatible vector database to enable optimized dot-product similarity search for millisecond-scale retrieval. It employs hybrid offline-online orchestration (Online inference API + local inference engine) for reliability in all environments, and is built on a model-agnostic, database-agnostic architecture for scalable, future-proof integration of new LLMs and vector stores.
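The two core behaviours described here, dot-product similarity search and hybrid offline-online orchestration, can be sketched in a few lines. This is a simplified Python illustration of the logic (the actual system runs through a C++ interoperability layer and llama.cpp); `query_online` and `query_local` are hypothetical stand-ins for the online inference API and the local engine.

```python
# Brute-force dot-product retrieval over unit-norm vectors
# (equivalent to cosine similarity), plus a hybrid orchestrator
# that prefers the online API and falls back to local inference.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, index, k=3):
    """index: list of (chunk_id, vector) pairs, vectors assumed unit-norm.
    Returns the ids of the k most similar chunks."""
    scored = sorted(index, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [cid for cid, _ in scored[:k]]

def answer(question, query_online, query_local):
    """Hybrid orchestration: try the online inference API first;
    on any failure (no connectivity, timeout), fall back to the
    local on-device engine so the officer always gets an answer."""
    try:
        return query_online(question)
    except Exception:
        return query_local(question)
```

Because the two backends sit behind the same interface, swapping in a different LLM or vector store only changes the callables passed in, which is the model-agnostic, database-agnostic property the architecture aims for.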
3. UI/UX
This project applied a human-centred design (HCD) process grounded in Don Norman’s principles of discoverability, feedback, and affordance. It began with interviews with SPF personnel to identify friction points in SOP retrieval, followed by iterative design and A/B testing to evaluate search affordances, result hierarchy, and navigation CTAs against task-completion metrics. Insights informed low-fidelity wireframes for early validation, before progressing to a high-fidelity prototype aligned with SGDS v2 design tokens, WCAG 2.1 AA accessibility standards, and Samsung Galaxy S20 One UI constraints.

Product Evaluation

  • 96% ↓ — Time to Reach Target Content

  • 6.3 s → 0.9 s — Time to Retrieve Confidence Score

  • 36× — System Throughput Increase

Our evaluation shows that, after several rounds of technical improvement, the system is fast, reliable, and ready for frontline use. We rigorously tested the prototype against four metrics: Faithfulness, Answer Relevance, Context Precision, and Context Recall. The results show that the search pipeline is highly accurate, consistently retrieving the right documents without fabricating information.
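For the two retrieval metrics, a minimal sketch of their usual definitions is shown below. This assumes the project follows the standard formulation (fraction of retrieved chunks that are relevant, and fraction of relevant chunks that were retrieved); the source does not spell out the exact variant used.

```python
# Standard set-based definitions of the two retrieval metrics,
# assuming relevance labels per chunk id are available.

def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)
```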

We learned that while shrinking the model and compressing the retrieved text speeds things up, it can hurt the factual correctness of the final answers. To solve this, we found that 4-bit or 8-bit quantized models combined with strict noise filtering provide the best balance between speed and accuracy. Across four major updates, we massively improved generation speed, jumping from 2 to 73 tokens per second, and dropped the initial wait time to just 1.8 seconds. Combined with faster database search methods such as partial loading, the final system delivers instant, trustworthy guidance to officers directly on their devices.

Impacts

Economic Impact

It optimizes manpower by reducing administrative bottlenecks.

Social Impact

Our system increases public safety through faster response times.

Environmental Impact

It pushes users toward a truly paperless, digital-first frontline.

User Feedback

Acknowledgements

This Capstone project would not have been possible without the invaluable guidance, support, and collaboration of several key individuals and institutions.

We extend our sincere gratitude to our SUTD Capstone Mentors, Dr. Fredy Tantri, Prof. Kenny Choo, and Geraldine Quek for their unwavering support, rigorous feedback, and technical direction, which were instrumental in shaping the project’s architecture and methodology.  

Our deepest appreciation goes to our industry partner, HTX. We extend special thanks to Shisheng Huang, Calista Choy, Prasanth Karthikeyan, and Justin Yeo for providing us with the critical operational insights, access to Ground Response Force (GRF) personnel, and the real-world problem statement that grounded this research. Their commitment to innovation and willingness to collaborate with our team was essential.
