Self-Hosted AI Assistant: Complete Setup Guide
Self-hosting an AI assistant sounds straightforward until you're three hours deep in CUDA drivers at 11pm, wondering why your model only answers in Portuguese. I've been there. This guide covers both paths honestly: full DIY with Ollama, and managed self-hosting where someone else handles the infrastructure.
Neither path is right for everyone. By the end of this you'll know exactly which one fits your situation.
Why bother self-hosting at all?
Before diving into the how, it's worth being clear on the why. Self-hosted AI has two real advantages over services like ChatGPT:
- Your data never leaves your infrastructure. Every token you process stays on hardware you control. No third-party server logs, no training data opt-outs to manage, no wondering what happened to that email you pasted in.
- No per-query costs. Once you have the server running, inference is free. Heavy users — people running hundreds of prompts per day — often save money compared to API pricing.
The tradeoff is setup complexity, ongoing maintenance, and the fact that consumer-grade hardware runs smaller models. A local Mistral-7B is genuinely good, but it's not GPT-4. That gap is closing fast, but it's still real.
Path 1: Full DIY with Ollama
Ollama is the best tool currently available for running LLMs locally. It handles model downloads and quantization, and exposes an OpenAI-compatible API endpoint. Here's the full setup process.
Step 1: Install Ollama
On macOS or Linux, installation is a single command:
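At the time of writing, the official install script looks like this — verify the current command on ollama.com before piping a script into your shell:

```shell
# Download and run Ollama's official install script.
# To review it first instead: curl -fsSL https://ollama.com/install.sh -o install.sh
curl -fsSL https://ollama.com/install.sh | sh
```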
On Windows, download the installer from ollama.com. After installation, Ollama runs as a background service and exposes a local API at http://localhost:11434.
Step 2: Pull a model
Model selection is the most consequential decision. Here's how to think about it:
- 8B parameter models (Llama 3, Mistral 7B) — Run well on a modern laptop with 16GB RAM. Fast, capable for most tasks. Pull with: ollama pull llama3
- 13B–34B models — Need 32GB+ RAM or a dedicated GPU. Noticeably better reasoning. Good for complex analysis or code generation.
- 70B+ models — Need a beefy GPU (A100/H100 territory) or very fast RAM. These match or exceed GPT-4 on many benchmarks. Not practical for most home setups.
For most users starting out: llama3:8b for general use, or mistral:7b-instruct if you want something tuned for following instructions.
Step 3: Test the API
Once a model is pulled, verify it's working:
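A minimal smoke test against the local API, assuming you pulled llama3 in the previous step (swap in whichever model name ollama list reports):

```shell
# Send a single non-streaming prompt to the local Ollama API.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```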
You should get a JSON response with a response field. If you do, your local LLM is running.
Step 4: Connect a frontend
Ollama's API is OpenAI-compatible, which means most AI tools work with it. Options range from Open WebUI (a full ChatGPT-like web interface) to building your own Telegram bot (covered in the next post). For a quick test, Open WebUI is the fastest path:
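One way to launch it, assuming Docker is installed and Ollama is listening on its default port — this mirrors the quick-start command in Open WebUI's documentation, which is worth checking for the current image tag:

```shell
# Run Open WebUI in Docker, pointed at the host's Ollama instance.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```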
Visit http://localhost:3000 and you have a working AI chat interface backed by your local model.
Realistic setup time: If everything goes smoothly — no CUDA issues, no model download interruptions, no Docker network problems — plan for 3–5 hours for a first-time setup. Experienced Linux users can do it in under an hour. Windows users often hit driver issues that add time. Budget a full evening.
The ongoing maintenance reality
Setup is one-time; maintenance is forever. Running your own AI infrastructure means:
- Keeping Ollama updated as new versions release
- Managing disk space as models accumulate (7B models are ~4GB each)
- Handling OS updates that occasionally break GPU drivers
- Ensuring the service restarts automatically after reboots
- Monitoring for crashes or performance degradation
None of this is insurmountable. But it's real work, and it compounds over time.
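On Linux, where the installer registers Ollama as a systemd service, the chores above mostly reduce to a handful of commands. A sketch — unit and model names assume a default install:

```shell
# See which models are on disk and how much space each takes.
ollama list

# Remove a model you no longer use to reclaim disk space.
ollama rm llama3

# Make sure the service restarts after reboots, and check its health.
sudo systemctl enable ollama
systemctl status ollama
```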
Path 2: Managed self-hosting
Managed self-hosting means the infrastructure is still yours — a real server running under your account, with your data — but someone else handles the setup, maintenance, and operations. This is what GetMyPersonalAI does.
The difference in experience is stark:
- You pay, provide your Telegram username, and get a bot token back within 60 seconds
- The AI runs on a dedicated EC2 instance — not shared infrastructure, not someone else's server — provisioned specifically for you
- Updates, monitoring, and crash recovery happen automatically
- You never touch the command line
The tradeoff is control. With full DIY, you choose the exact model, configuration, and can modify anything. With managed self-hosting, you're working within the parameters of what's offered. For most users, that's fine — the out-of-box configuration is well-tuned.
Cost comparison
Let's be concrete about money. These are real numbers, not marketing estimates.
| Setup | Monthly Cost | Setup Time | Maintenance | Data Location |
|---|---|---|---|---|
| ChatGPT Plus | $20/mo | 2 minutes | None | OpenAI servers |
| DIY: local laptop | $0 (hardware you own) | 3–5 hours | 1–2 hrs/month | Your laptop |
| DIY: cloud GPU instance | $40–80/mo (g4dn.xlarge or similar) | 5–8 hours | 2–4 hrs/month | Your cloud instance |
| GetMyPersonalAI | $19.99/mo | 60 seconds | None | Your EC2 instance |
The DIY local laptop path is technically cheapest if you already own the hardware. But it only works while your computer is on, and your model quality is limited by your RAM. A 7B model running on a MacBook Pro is good, not great.
The DIY cloud path gives you a 24/7 server with a larger model. But once you factor in an EC2 instance with enough RAM to run a real model — a g4dn.xlarge or similar — you're at $40–80/mo before counting your time. And there's real time involved: initial setup, debugging, monthly maintenance. If your time is worth anything, the cost advantage shrinks fast.
Which path is right for you?
Here's the honest decision framework:
Choose full DIY if:
- You enjoy system administration and find this stuff genuinely interesting
- You need to customize the model configuration deeply (fine-tuning, custom system prompts, specific model architectures)
- You're already running a home server or NAS and this fits naturally into your setup
- You have a spare GPU sitting around doing nothing
Choose managed self-hosting if:
- You want private AI but don't want to become a Linux sysadmin
- You need it running 24/7 without babysitting it
- Your time has real value and you'd rather spend it on things that aren't infrastructure
- You want the data isolation benefits without the complexity cost
The goal of self-hosting isn't self-hosting. The goal is private AI that works reliably. If DIY gets you there, great. If managed self-hosting gets you there with less friction, that's a legitimate choice — and one that costs about the same as a ChatGPT subscription.
Skip the setup. Get private AI in 60 seconds.
GetMyPersonalAI deploys your own EC2-hosted AI assistant — no server setup, no maintenance. Private by architecture, not policy.
Start your $1 trial