Self-Hosted AI Assistant: Complete Setup Guide
Self-hosting an AI assistant sounds straightforward until you're three hours deep in CUDA drivers at 11pm, wondering why your model only answers in Portuguese. I've been there. This guide covers both paths honestly: full DIY with Ollama, and managed self-hosting where someone else handles the infrastructure.
Neither path is right for everyone. By the end of this you'll know exactly which one fits your situation.
Why bother self-hosting at all?
Before diving into the how, it's worth being clear on the why. Self-hosted AI has two real advantages over services like ChatGPT:
- Your data never leaves your infrastructure. Every token you process stays on hardware you control. No third-party server logs, no training data opt-outs to manage, no wondering what happened to that email you pasted in.
- No per-query costs. Once you have the server running, inference is free. Heavy users — people running hundreds of prompts per day — often save money compared to API pricing.
The tradeoff is setup complexity, ongoing maintenance, and the fact that consumer-grade hardware runs smaller models. A local Mistral-7B is genuinely good, but it's not GPT-4. That gap is closing fast, but it's still real.
Path 1: Full DIY with Ollama
Ollama is the best tool currently available for running LLMs locally. It handles model downloads and quantization, and exposes an OpenAI-compatible API endpoint. Here's the full setup process.
Step 1: Install Ollama
On macOS or Linux, installation is a single command:
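At the time of writing, the official install script looks like this — verify the current command on ollama.com before piping a script into your shell:

```shell
# Download and run Ollama's official install script.
# To review it first instead: curl -fsSL https://ollama.com/install.sh -o install.sh
curl -fsSL https://ollama.com/install.sh | sh
```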
On Windows, download the installer from ollama.com. After installation, Ollama runs as a background service and exposes a local API at http://localhost:11434.
Step 2: Pull a model
Model selection is the most consequential decision. Here's how to think about it:
- 8B parameter models (Llama 3, Mistral 7B) — Run well on a modern laptop with 16GB RAM. Fast, capable for most tasks. Pull with: ollama pull llama3
- 13B–34B models — Need 32GB+ RAM or a dedicated GPU. Noticeably better reasoning. Good for complex analysis or code generation.
- 70B+ models — Need a beefy GPU (A100/H100 territory) or very fast RAM. These match or exceed GPT-4 on many benchmarks. Not practical for most home setups.
For most users starting out: llama3:8b for general use, or mistral:7b-instruct if you want something tuned for following instructions.
Step 3: Test the API
Once a model is pulled, verify it's working:
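A minimal smoke test against the local API, assuming you pulled llama3 in the previous step (swap in whichever model name ollama list reports):

```shell
# Send a single non-streaming prompt to the local Ollama API.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```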
You should get a JSON response with a response field. If you do, your local LLM is running.
Step 4: Connect a frontend
Ollama's API is OpenAI-compatible, which means most AI tools work with it. Options range from Open WebUI (a full ChatGPT-like web interface) to building your own Telegram bot (covered in the next post). For a quick test, Open WebUI is the fastest path:
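One way to launch it, assuming Docker is installed and Ollama is listening on its default port — this mirrors the quick-start command in Open WebUI's documentation, which is worth checking for the current image tag:

```shell
# Run Open WebUI in Docker, pointed at the host's Ollama instance.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```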
Visit http://localhost:3000 and you have a working AI chat interface backed by your local model.
Realistic setup time: If everything goes smoothly — no CUDA issues, no model download interruptions, no Docker network problems — plan for 3–5 hours for a first-time setup. Experienced Linux users can do it in under an hour. Windows users often hit driver issues that add time. Budget a full evening.
The ongoing maintenance reality
Setup is one-time; maintenance is forever. Running your own AI infrastructure means:
- Keeping Ollama updated as new versions release
- Managing disk space as models accumulate (7B models are ~4GB each)
- Handling OS updates that occasionally break GPU drivers
- Ensuring the service restarts automatically after reboots
- Monitoring for crashes or performance degradation
None of this is insurmountable. But it's real work, and it compounds over time.
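On Linux, where the installer registers Ollama as a systemd service, the chores above mostly reduce to a handful of commands. A sketch — unit and model names assume a default install:

```shell
# See which models are on disk and how much space each takes.
ollama list

# Remove a model you no longer use to reclaim disk space.
ollama rm llama3

# Make sure the service restarts after reboots, and check its health.
sudo systemctl enable ollama
systemctl status ollama
```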
Path 2: Managed self-hosting
Managed self-hosting means the infrastructure is still yours — a real server running under your account, with your data — but someone else handles the setup, maintenance, and operations. This is what GetMyPersonalAI does.
The difference in experience is stark:
- You pay, provide your Telegram username, and get a bot token back within 60 seconds
- The AI runs on a dedicated EC2 instance — not shared infrastructure, not someone else's server — provisioned specifically for you
- Updates, monitoring, and crash recovery happen automatically
- You never touch the command line
The tradeoff is control. With full DIY, you choose the exact model, configuration, and can modify anything. With managed self-hosting, you're working within the parameters of what's offered. For most users, that's fine — the out-of-box configuration is well-tuned.
Cost comparison
Let's be concrete about money. These are real numbers, not marketing estimates.
| Setup | Monthly Cost | Setup Time | Maintenance | Data Location |
|---|---|---|---|---|
| ChatGPT Plus | $20/mo | 2 minutes | None | OpenAI servers |
| DIY: local laptop | $0 (hardware you own) | 3–5 hours | 1–2 hrs/month | Your laptop |
| DIY: cloud GPU instance | $40–80/mo (g4dn.xlarge or similar) | 5–8 hours | 2–4 hrs/month | Your cloud instance |
| GetMyPersonalAI | $19.99/mo | 60 seconds | None | Your EC2 instance |
The DIY local laptop path is technically cheapest if you already own the hardware. But it only works while your computer is on, and your model quality is limited by your RAM. A 7B model running on a MacBook Pro is good, not great.
The DIY cloud path gives you a 24/7 server with a larger model. But once you factor in an EC2 instance with enough RAM to run a real model — a g4dn.xlarge or similar — you're at $40–80/mo before counting your time. And there's real time involved: initial setup, debugging, monthly maintenance. If your time is worth anything, the cost advantage shrinks fast.
Which path is right for you?
Here's the honest decision framework:
Choose full DIY if:
- You enjoy system administration and find this stuff genuinely interesting
- You need to customize the model configuration deeply (fine-tuning, custom system prompts, specific model architectures)
- You're already running a home server or NAS and this fits naturally into your setup
- You have a spare GPU sitting around doing nothing
Choose managed self-hosting if:
- You want private AI but don't want to become a Linux sysadmin
- You need it running 24/7 without babysitting it
- Your time has real value and you'd rather spend it on things that aren't infrastructure
- You want the data isolation benefits without the complexity cost
The goal of self-hosting isn't self-hosting. The goal is private AI that works reliably. If DIY gets you there, great. If managed self-hosting gets you there with less friction, that's a legitimate choice — and one that costs about the same as a ChatGPT subscription.
Skip the setup. Get private AI in 60 seconds.
GetMyPersonalAI deploys your own EC2-hosted AI assistant — no server setup, no maintenance. Private by architecture, not policy.
Start your $1 trial