Reference Architecture: Private Research Assistant¶
A single Apple Silicon Mac setup for privacy-first research and analysis.
Overview¶
This architecture runs entirely on a single Mac with Apple Silicon, keeping all data local. The privacy router ensures no information leaves the machine. Ideal for processing sensitive documents, internal research, or regulated data analysis.
Hardware Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| Mac | M1 Pro, 16 GB | M2 Pro/Max, 32 GB+ |
| Storage | 20 GB free | 50 GB free (for models) |
| macOS | 14.0+ | 15.0+ |
Architecture¶
┌────────────────────────────────┐
│ macOS (Apple Silicon) │
│ │
│ ┌──────────────────────────┐ │
│ │ harombe chat │ │
│ │ Privacy mode: local-only│ │
│ └──────────┬───────────────┘ │
│ │ │
│ ┌──────────▼───────────────┐ │
│ │ Ollama (Metal GPU) │ │
│ │ llama3.1:8b │ │
│ │ nomic-embed-text │ │
│ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────┐ │
│ │ ChromaDB (local) │ │
│ │ Semantic memory + RAG │ │
│ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────┐ │
│ │ SQLite (WAL mode) │ │
│ │ Conversations + Audit │ │
│ └──────────────────────────┘ │
└────────────────────────────────┘
Setup¶
1. Install Ollama and Models¶
# Install Ollama
brew install ollama
# Pull models
ollama pull llama3.1:8b
ollama pull nomic-embed-text
2. Install Harombe¶
3. Configure¶
# harombe.yaml
model:
name: llama3.1:8b
temperature: 0.3
ollama:
host: http://localhost:11434
privacy:
mode: local-only
pii_detection: true
sensitivity_threshold: 0.3
memory:
enabled: true
backend: sqlite
embedding:
model: nomic-embed-text
provider: ollama
tools:
shell: true
filesystem: true
web_search: false # No network access
confirm_dangerous: true
voice:
enabled: true
stt:
model: tiny # Fast, runs on CPU
tts:
engine: piper
model: en_US-lessac-medium
agent:
system_prompt: |
You are a private research assistant. All processing happens locally.
Never suggest uploading data to external services.
When analyzing documents, summarize key findings and cite sources.
max_steps: 15
Usage¶
# Interactive research session
harombe chat
# With voice (push-to-talk)
harombe chat --voice
# Example prompts
You> Summarize the key findings from quarterly-report.pdf
You> Search my previous conversations about project X
You> Analyze the CSV data in sales/ and find trends
Key Features¶
- Zero network calls: Privacy mode
local-onlyensures nothing leaves the machine - PII detection: Catches and warns about personally identifiable information
- Semantic memory: Past conversations are searchable via RAG
- Voice interface: Whisper STT + Piper TTS, fully local
- Audit trail: All interactions logged in local SQLite
Performance Tips¶
- Use
llama3.1:8bfor a good speed/quality balance on 16 GB machines - On 32 GB+ machines, upgrade to
llama3.1:70b-q4_0for better reasoning - Keep ChromaDB collection sizes under 100K documents for fast retrieval
- Use
tinyWhisper model for real-time voice;smallfor better accuracy