Local Chat AI

Privacy-first AI chatbot running entirely on self-hosted hardware. It draws on a custom document knowledge base for specialized, context-aware answers.

AnythingLLM · Ollama · Docker · AI

Overview

This project is a privacy-first AI chatbot running entirely on self-hosted hardware, currently serving the Qwen 2.5 32B model. It gives me complete control over conversations and integrates my own documents for personalized, context-aware responses. I originally built it as an uncensored assistant for Capture The Flag (CTF) practice. I feed it detailed writeups and step-by-step methodologies from boxes I have completed; these form a custom knowledge base that the model draws on to recall earlier attack vectors and guide me through new challenges.

Key Features

  • Local-First: All processing happens on local hardware; no cloud services required.
  • Document Integration: RAG (Retrieval-Augmented Generation) for context-aware responses.
  • Docker Containerized: Easy deployment and management.
  • Multiple Models: Support for a range of LLMs served through Ollama.
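To make the Ollama side concrete, here is a minimal sketch of the kind of request a client sends to Ollama's HTTP API. It assumes Ollama's default port 11434 is reachable at localhost and that the Qwen model is available under the tag `qwen2.5:32b`; both are assumptions about my setup, and in practice AnythingLLM issues these calls itself. This only illustrates what the backend speaks.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port and is reachable at localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "qwen2.5:32b",
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5:32b") -> str:
    """Send the request and return the generated text from the JSON 'response' field."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

Switching models is just a matter of passing a different tag, for example `generate("Explain this nmap output", model="llama3.1:8b")`, provided that model has already been pulled on the Ollama host.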

Technology Stack

  • AnythingLLM: The interface and orchestration layer
  • Ollama: Local LLM runtime
  • Docker: Containerization for easy deployment
  • Tailscale & SSH: Secure networking and tunneling
  • RAG: Document embedding for contextual responses
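The RAG step can be illustrated with a deliberately tiny sketch: split the writeups into chunks, score each chunk against the question, and paste the best matches into the prompt. This is a toy stand-in, not AnythingLLM's implementation; it swaps a real embedding model and vector database for word counts and cosine similarity, and the function names are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real pipeline uses an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Prepend the retrieved writeup chunks to the user's question."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Use the following notes to answer.\n\n{context}\n\nQuestion: {query}"
```

A question about SQL injection then pulls in the writeup chunk that mentions SQL injection, so the model answers from my own notes rather than from its general training alone.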

Architecture & Implementation

This project posed a networking challenge: I wanted an architecture that would not exhaust my personal server's resources. My solution was a distributed setup that offloads the heavy lifting to the university's Hopper server, whose RTX 3090 runs Ollama as the main compute node.
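A quick back-of-the-envelope calculation shows why a 24 GB card like the RTX 3090 is a reasonable host for a 32B model. This is a rough sketch that assumes a nominal 32 billion parameters and counts weight memory only; it ignores the KV cache, activations and runtime overhead, and real quantization formats often use slightly more than their nominal bits per weight.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


RTX_3090_VRAM_GB = 24  # the RTX 3090 has 24 GB of VRAM
N_PARAMS = 32e9        # assumption: nominal parameter count for a "32B" model

if __name__ == "__main__":
    for bits in (16, 8, 4):
        need = weight_memory_gb(N_PARAMS, bits)
        fits = need < RTX_3090_VRAM_GB
        print(f"{bits:>2}-bit weights: ~{need:.1f} GB, fits in 24 GB: {fits}")
```

At 16 bits the weights alone need about 64 GB, and even 8 bits needs about 32 GB; only around 4-bit quantization (about 16 GB) leaves headroom on a single 3090, which is why quantized weights matter for this setup.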

Since I couldn't freely open ports to connect my personal server directly to the Hopper server, I engineered a workaround: a persistent SSH tunnel running in the background between the two machines. Because both machines sit on the same network, the tunnel provides a stable, encrypted communication channel.
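Here is a hedged sketch of what such a tunnel could look like, assuming a local port forward started from the personal server so that its port 11434 reaches Ollama on Hopper. The hostname, user and port numbers are hypothetical, and the restart loop is a simple stand-in for tools such as autossh or a systemd unit; the direction of the forward in my actual setup may differ.

```python
from __future__ import annotations

import shlex
import subprocess
import time
from typing import Optional


def ssh_tunnel_command(remote_host: str, local_port: int = 11434,
                       remote_port: int = 11434, user: Optional[str] = None) -> list[str]:
    """Build an ssh command that forwards local_port to remote_port on the remote host."""
    target = f"{user}@{remote_host}" if user else remote_host
    return [
        "ssh",
        "-N",                                          # no remote command; just hold the forward open
        "-L", f"{local_port}:localhost:{remote_port}",  # local port -> Ollama on the remote host
        "-o", "ServerAliveInterval=30",                # send keepalives so a dead link is detected
        "-o", "ServerAliveCountMax=3",                 # give up after 3 missed keepalives
        "-o", "ExitOnForwardFailure=yes",              # exit if the port cannot be bound
        target,
    ]


def run_forever(cmd: list[str], retry_delay: float = 10.0) -> None:
    """Restart the tunnel whenever ssh exits (stand-in for autossh or systemd)."""
    while True:
        subprocess.run(cmd)
        time.sleep(retry_delay)


if __name__ == "__main__":
    # Hypothetical host and user; prints the command instead of running it.
    print(shlex.join(ssh_tunnel_command("hopper.example.edu", user="student")))
```

One caveat with this shape: `-L` binds to the host's loopback by default, so if AnythingLLM runs inside Docker, the container may not be able to reach it without host networking or binding the forward to an address the container can see.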

With the backend established, I run AnythingLLM via Docker on my personal server and use Tailscale to make its web interface reachable from my laptop, linking all three machines. To feed the AI my CTF writeups, I set up an rsync pipeline from my laptop to the server: I drop files into a dedicated directory, and they are synced and automatically ingested into AnythingLLM's knowledge base for the AI to reference.
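The drop-folder idea can be sketched as two pieces: the rsync command that pushes writeups from the laptop, and a polling loop on the server that notices new files. This is a minimal sketch under assumptions: the paths are hypothetical, and `ingest` is a placeholder callback standing in for however files actually reach AnythingLLM's knowledge base.

```python
from __future__ import annotations

import tempfile
import time
from pathlib import Path
from typing import Callable, Optional


def rsync_command(src_dir: str, dest: str) -> list[str]:
    """Archive-mode, compressed sync; the trailing slash copies the directory's contents."""
    return ["rsync", "-az", src_dir.rstrip("/") + "/", dest]


def scan_new_files(drop_dir: Path, seen: set[str]) -> list[Path]:
    """Return files not seen before, skipping dotfiles.

    rsync writes into a hidden temporary file and renames it when the transfer
    completes, so skipping names that start with '.' avoids half-written files.
    """
    new = []
    for path in sorted(drop_dir.rglob("*")):
        if path.is_file() and not path.name.startswith(".") and str(path) not in seen:
            seen.add(str(path))
            new.append(path)
    return new


def watch(drop_dir: Path, ingest: Callable[[Path], None],
          interval: float = 30.0, iterations: Optional[int] = None) -> None:
    """Poll drop_dir and hand each new file to ingest (a hypothetical hook)."""
    seen: set[str] = set()
    count = 0
    while iterations is None or count < iterations:
        for path in scan_new_files(drop_dir, seen):
            ingest(path)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)


if __name__ == "__main__":
    # Self-contained demo in a temporary directory instead of the real drop folder.
    demo = Path(tempfile.mkdtemp())
    (demo / "box1-writeup.md").write_text("nmap, gobuster, SQLi, privesc")
    watch(demo, lambda p: print("would ingest:", p.name), interval=0, iterations=1)
```

On the laptop side, something like `rsync_command("~/ctf/writeups", "me@server:/srv/anythingllm-drop")` (hypothetical paths) would produce the sync command to run by hand or from a scheduler.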