Local Chat AI
Privacy-first AI chatbot running entirely on local hardware. It utilizes a custom database for specialized, context-aware knowledge.
Overview
The chatbot currently runs the Qwen 2.5 32B model. It gives me complete control over my conversations and integrates my own documents for personalized, context-aware responses. I originally built it as an uncensored assistant for Capture The Flag (CTF) practice. I feed it writeups and step-by-step methodologies from completed boxes, and these form a custom knowledge base that lets the AI recall previous attack vectors and guide me through future challenges.
Key Features
- Local-First: All processing happens on local hardware; no cloud services required.
- Document Integration: RAG (Retrieval-Augmented Generation) for context-aware responses.
- Docker Containerized: Easy deployment and management.
- Multiple Models: Support for various LLMs through Ollama.
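To illustrate the retrieval step behind the Document Integration feature, here is a minimal sketch of RAG in plain Python. It is not how AnythingLLM works internally: a real setup uses learned embeddings and a vector store, while this toy version ranks chunks by word-count cosine similarity. The `WRITEUPS` snippets and all function names are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Prepend the retrieved chunks to the question as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical writeup snippets standing in for embedded documents.
WRITEUPS = [
    "Box A: found an exposed FTP service with anonymous login enabled.",
    "Box B: SQL injection in the login form gave access to the admin panel.",
]
```

Calling `build_prompt("How did I exploit SQL injection in a login form?", WRITEUPS)` places the Box B snippet ahead of the question, which is the text the model would then receive.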
Technology Stack
- AnythingLLM: The interface and orchestration layer
- Ollama: Local LLM runtime
- Docker: Containerization for easy deployment
- Tailscale & SSH: Secure networking and tunneling
- RAG: Document embedding for contextual responses
Architecture & Implementation
The main challenge was networking: I wanted an architecture that would not exhaust my personal server's resources. My solution was a distributed setup that offloads inference to the university's Hopper server, whose RTX 3090 runs Ollama as the main compute node.
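A quick way to check that the compute node answers is to send one prompt to Ollama's HTTP API. The sketch below assumes Ollama is reachable on its default port 11434 at `localhost` and that the model was pulled under the tag `qwen2.5:32b`; both are assumptions, so adjust them to the actual setup. In normal use AnythingLLM makes these calls itself.

```python
import json
import urllib.request

# Assumptions: Ollama's default port and the "qwen2.5:32b" model tag.
OLLAMA_URL = "http://localhost:11434"
MODEL = "qwen2.5:32b"


def build_generate_request(prompt: str, base_url: str = OLLAMA_URL,
                           model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text."""
    request = build_generate_request(prompt)
    # A 32B model can take a while to load, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask("Reply with the single word: ready")` returns the model's text.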
Because I couldn't freely open ports for a direct connection between my personal server and Hopper, I set up a persistent background SSH tunnel between the two machines. Both sit on the same network, so the tunnel gives me a stable, encrypted channel without exposing any additional ports.
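One way to keep such a tunnel alive is a small supervisor that restarts `ssh` whenever it exits. This is a sketch under assumptions, not the exact setup used here: it assumes the personal server opens the tunnel, that Ollama listens on its default port 11434 on the remote side, and that key-based SSH login is configured so no password prompt appears. The host string is a placeholder.

```python
import subprocess
import time


def tunnel_command(remote: str, local_port: int = 11434,
                   remote_port: int = 11434) -> list[str]:
    """Build an ssh command forwarding local_port to remote_port on the remote host."""
    return [
        "ssh",
        "-N",                              # no remote command, forwarding only
        "-o", "ExitOnForwardFailure=yes",  # exit if the forward cannot be set up
        "-o", "ServerAliveInterval=30",    # send keepalives to detect a dead link
        "-o", "ServerAliveCountMax=3",     # give up after three missed keepalives
        "-L", f"{local_port}:localhost:{remote_port}",
        remote,
    ]


def keep_tunnel_open(remote: str, retry_delay: float = 5.0) -> None:
    """Restart the tunnel whenever ssh exits; runs until interrupted."""
    while True:
        subprocess.run(tunnel_command(remote))
        time.sleep(retry_delay)
```

Running `keep_tunnel_open("user@hopper.example.edu")` (placeholder address) under a service manager would make Ollama appear at `localhost:11434` on the personal server.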
With the backend established, I run AnythingLLM via Docker on my personal server and use Tailscale to reach its web interface from my laptop. To feed the AI my CTF writeups, I set up an rsync pipeline from my laptop to a dedicated directory on the server. I simply drop files into that directory, and they are automatically ingested into AnythingLLM's knowledge base for the AI to reference.
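The drop-directory workflow can be sketched in two pieces: an rsync call on the laptop and a check on the server for files that have not been processed yet. The paths, host name, and function names below are hypothetical, and the actual hand-off to AnythingLLM is left out because the ingestion mechanism is not described here.

```python
from pathlib import Path


def rsync_command(local_dir: str, remote: str, remote_dir: str) -> list[str]:
    """Build an rsync command that copies local_dir's contents into remote_dir."""
    return [
        "rsync",
        "-az",                        # archive mode, compress in transit
        f"{local_dir.rstrip('/')}/",  # trailing slash: copy contents, not the dir
        f"{remote}:{remote_dir}",
    ]


def new_files(drop_dir: Path, seen: set[str]) -> list[Path]:
    """Return files in drop_dir not yet recorded in seen, and record them."""
    fresh = sorted(
        p for p in drop_dir.iterdir()
        if p.is_file() and p.name not in seen
    )
    seen.update(p.name for p in fresh)
    return fresh
```

On the laptop, `subprocess.run(rsync_command("writeups", "user@server", "/srv/drop"))` would push the files; on the server, a periodic call to `new_files` yields only the writeups that still need ingesting.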