Problem
Most AI chat tools depend on hosted APIs. This project runs chat, persistence, and inference entirely on localhost with a one-command dev stack.
Frontend work
- TanStack Start chat UI with session sidebar, SSE streaming, and optimistic messages
- Three-column layout with mobile drawers and react-virtual for long transcripts
- OpenAPI codegen into shared TypeScript contracts
- Roleplay template management route with CRUD forms
Backend & system work
- FastAPI app layered into routes, services, and repositories, with Alembic migrations
- PostgreSQL + pgvector for chat and embedding metadata; MinIO for image binaries
- Ollama client with model allowlist and optional CrewAI orchestration backends
- Docker Compose stack: Ollama, PostgreSQL, MinIO, API, and model pull jobs
Key decisions
- Docker Compose brings up the full backend stack with one `bun run local:start` command
- Shared `@local/shared` package of OpenAPI-generated TypeScript contracts keeps frontend request shapes aligned with the backend API
- SSE streaming with optimistic UI instead of waiting for full completion responses
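The SSE streaming decision can be illustrated with a small incremental parser. This is a minimal sketch, not the project's actual client: the names `createSseParser`, `feed`, and `SseEvent` are hypothetical, and it assumes the server sends standard `event:` / `data:` fields with frames separated by a blank line. Lone `\r` line endings are not handled.

```typescript
// Hypothetical shape for one parsed server-sent event.
interface SseEvent {
  event: string; // defaults to "message" when no event: field is present
  data: string;  // multiple data: lines are joined with "\n"
}

// Returns a feed() function that accepts decoded text chunks and
// emits only the events whose frames are complete so far.
function createSseParser(): (chunk: string) => SseEvent[] {
  let buffer = "";

  return function feed(chunk: string): SseEvent[] {
    // Normalize CRLF on the whole buffer so a "\r\n" split across chunks still matches.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];

    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const frame = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);

      let event = "message";
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith(":")) continue; // comment or keep-alive line
        if (line.startsWith("event:")) {
          event = line.slice(6).trim();
        } else if (line.startsWith("data:")) {
          // Drop the single optional space after the colon.
          dataLines.push(line.slice(5).replace(/^ /, ""));
        }
      }
      if (dataLines.length > 0) {
        events.push({ event, data: dataLines.join("\n") });
      }
      boundary = buffer.indexOf("\n\n");
    }
    return events;
  };
}
```

In a browser client, each chunk would typically come from a `fetch` response body decoded with `TextDecoder` in streaming mode. Partial frames stay in the buffer until their terminating blank line arrives.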
Challenges
- SSE client parsing with optimistic pending/streaming/failed message states
- Orchestration trace storage separate from user-visible chat messages
- Local-only security tradeoffs documented (no auth in default configuration)
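The pending/streaming/failed message states mentioned above could be modeled as a small reducer. This is a sketch under assumptions: the type names, the action names, and the extra `complete` status are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical lifecycle for an optimistically rendered assistant message.
type MessageStatus = "pending" | "streaming" | "complete" | "failed";

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
  status: MessageStatus;
}

// Hypothetical actions derived from the request lifecycle and SSE events.
type StreamAction =
  | { type: "submitted"; id: string }           // request sent, nothing received yet
  | { type: "token"; id: string; text: string } // a streamed text fragment arrived
  | { type: "done"; id: string }                // stream closed normally
  | { type: "error"; id: string };              // network or server failure

function reduceMessages(messages: ChatMessage[], action: StreamAction): ChatMessage[] {
  // Apply a patch to the message targeted by the action; leave the rest untouched.
  const update = (patch: (m: ChatMessage) => ChatMessage): ChatMessage[] =>
    messages.map((m) => (m.id === action.id ? patch(m) : m));

  switch (action.type) {
    case "submitted":
      // Optimistically add an empty placeholder before any bytes arrive.
      return [...messages, { id: action.id, role: "assistant", content: "", status: "pending" }];
    case "token": {
      const { text } = action;
      return update((m) => ({ ...m, content: m.content + text, status: "streaming" }));
    }
    case "done":
      return update((m) => ({ ...m, status: "complete" }));
    case "error":
      // Keep any partial content so the UI can offer a retry next to it.
      return update((m) => ({ ...m, status: "failed" }));
  }
}
```

Keeping the transitions in a pure function makes them testable with Vitest without rendering any components. A real client may also want to ignore tokens that arrive after a message has already failed.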
Tech stack
Frontend
- TanStack Start
- React 19
- TanStack Router
- TanStack Query
- Vite
- Tailwind CSS
Backend
- FastAPI
- SQLAlchemy
- Alembic
- Uvicorn
Data
- PostgreSQL
- pgvector
- MinIO
Infrastructure
- Docker Compose
- Bun workspaces
Testing
- Vitest
- Playwright
- pytest
AI / ML
- Ollama
- CrewAI (optional)