
The Fortress
Cloud AI fails. Power grids fail. Your client's deadline doesn't care about either one. So I built Krull AI, a complete offline solution.
The Brief
Krull AI is a self-hosted workstation that runs local language models, offline maps, and Wikipedia on your own hardware, with a filter pipeline that forces every response through reference checking. This article explains why running a local AI backup is sound professional practice, grounded in cloud outage data, power grid reliability research, and a Stanford study on retrieval-augmented generation.
- What is Krull AI?
- Krull AI is an open-source, self-hosted AI workstation that bundles local language models, offline maps, Wikipedia, and developer documentation into a single Docker-based stack. It runs on your own hardware with no cloud accounts, no API keys, and no internet required after initial setup.
- Why would I run a local AI model instead of using cloud AI?
- Cloud AI services experience frequent outages. Claude had a major outage on April 7, 2026. ChatGPT was down for 15.5 hours on June 10, 2025. Meanwhile, US power outages jumped 29% between 2018 and 2024. A local backup means your work continues when any of those services fail.
- Does Krull AI work with Claude Code skills and workflows?
- Yes. Krull plugs in at the model layer through a LiteLLM gateway, so your existing Claude Code skills, hooks, CLAUDE.md files, and plan mode all keep working unchanged. A krull-claude launcher command handles the configuration automatically.
- Can Krull AI run on Apple Silicon Macs?
- Alpha macOS support just shipped. Because Docker Desktop cannot pass Apple's Metal GPU to containers, Krull detects macOS and points its services at a native Ollama installation running on the host, giving full Metal acceleration. Mac users are invited to test and report issues.
- How does Krull AI reduce hallucinations in local models?
- Krull runs a filter pipeline on every request that injects Wikipedia articles, developer documentation, web search results, and anti-hallucination rules before the model sees the prompt. A Stanford study found that even commercial RAG-based legal tools hallucinate 17-33% of the time, but that's measurably better than ungrounded models, and the referenced sources make errors verifiable.
Claude Code went down on April 7th.
Anthropic's status page listed it as a "major outage." Thousands of users hit Downdetector. The fix took about ninety minutes.1 If you didn't notice, it's because you weren't the one with a client deliverable due that morning.
I noticed. I've been noticing for a while.
The Drive to Seaside
I live in the Pacific Northwest. It is beautiful here in ways that rearrange your priorities. It also loses power and internet with a regularity that would unsettle anyone whose income depends on a stable connection.
Last year, during a storm, I was in the middle of a major website rollout for a client. The power went out. The internet followed. I drove three towns over to Seaside, Oregon, rented an overlook on the beach, and finished the work from there. The rental was expensive. I lost money on the deal.
But I didn't lose face. And by "face" I don't mean mine. I mean my client's. They never had to explain to their stakeholders why the update didn't ship. They never had to have that conversation at all. That's the job. Your failures stay invisible to the people you serve.
Three towns over. One deadline. The overlook was gorgeous, at least.
The Seaside drive taught me something I should have already known. If your work depends on someone else's infrastructure, your client's face depends on it too.
The Numbers Got Worse
I pulled up the data expecting to find a regional problem. I found a national one.
Oak Ridge National Laboratory published an analysis in March 2026. Major US power outages jumped 29% between 2018 and 2024, and the total cost to customers hit $121 billion in 2024 alone.2
That's just the power grid. Cisco's ThousandEyes reported that US internet outages jumped 284% between November and December of 2025.3
And the cloud AI services themselves? ChatGPT went dark for 15.5 hours on June 10, 2025. OpenAI's own help center published instructions for how to request an Apple App Store refund.4 Claude has had multiple incidents in recent weeks, including the April 7 outage that's still fresh.
The infrastructure we depend on is getting less reliable, not more. The power grid and the ISP fail on their own schedules. The AI providers fail on theirs. Your client's deadline doesn't negotiate with any of them.
What If the Backup Is Better?
Here's where the story takes a turn I didn't expect.
I'd been thinking about a local AI stack as a spare tire. Something you're glad to have but hope you never use. Then I read a Stanford study that complicated the whole frame.
Researchers at Stanford's RegLab tested the commercial AI legal tools that LexisNexis and Thomson Reuters market as "hallucination-free." They found that those tools hallucinate between 17% and 33% of the time.5 Somewhere between "hallucination-free" and "wrong a third of the time," a press release went unretracted. Retrieval-augmented generation, the technique of feeding a model verified reference material before it answers, measurably reduces errors compared to running a model on pure memory. But "reduced" is not "eliminated."
The interesting part isn't the failure rate. It's the mechanism. When a model is forced to check a source before answering, you can see what it was given and evaluate whether the answer tracks. You can verify the work. A frontier cloud model answering from what it memorized during training gives you no such handle. It sounds confident. It might be wrong. And you can't tell the difference without doing the research yourself.
That reframed the whole project. A local model forced to check its references isn't the second-best version of cloud AI. For verifiable, defensible work, it might be the better tool.
Krull
So I built one. Krull AI is an open-source, self-hosted workstation that runs entirely on your hardware. One command to start. Three doors on the homepage: an AI chat running local language models, offline maps with NOAA nautical and FAA aeronautical charts, and a knowledge base serving Wikipedia, Stack Exchange, developer documentation, and Project Gutenberg from local ZIM files.
The part I'm most particular about is the filter pipeline. Every request, whether it comes from the browser chat or from Claude Code, passes through the same set of filters before the model sees it. Truth Guard injects anti-hallucination rules. Kiwix Lookup pulls relevant articles from the offline knowledge base. Auto Web Search queries a local SearXNG instance for current results. Context Manager auto-compacts when the conversation gets long. The model never answers from vibes alone. It always has references in hand.
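The pipeline described above can be sketched in a few lines. The filter names come from Krull's documentation; every function body here is a hypothetical stand-in for illustration, not Krull's actual implementation.

```python
# Minimal sketch of a pre-model filter pipeline: each filter mutates the
# request before the model ever sees the prompt. All internals are
# illustrative assumptions.

TRUTH_GUARD_RULES = (
    "Answer only from the provided references. "
    "If the references do not cover the question, say so."
)

def truth_guard(request):
    """Prepend anti-hallucination rules to the system context."""
    request["system"].insert(0, TRUTH_GUARD_RULES)
    return request

def kiwix_lookup(request, search=lambda q: []):
    """Inject relevant offline articles (the ZIM search is a stand-in callable)."""
    for article in search(request["prompt"]):
        request["system"].append(f"[offline reference] {article}")
    return request

def context_manager(request, max_turns=20):
    """Auto-compact long conversations by keeping only the recent turns."""
    request["history"] = request["history"][-max_turns:]
    return request

def run_pipeline(request, filters):
    """Apply each filter in order; the model receives the grounded result."""
    for f in filters:
        request = f(request)
    return request

request = {"prompt": "When was NOAA formed?", "system": [], "history": []}
grounded = run_pipeline(request, [truth_guard, kiwix_lookup, context_manager])
```

The point of the shape is auditability: whatever the model answers, the injected references sit in `grounded["system"]`, so you can check what it was given.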
Your existing Claude Code skills, hooks, plan mode, and CLAUDE.md files all keep working. Krull plugs in at the model layer. The harness stays yours.
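For a sense of how little the harness has to change, here is a hypothetical sketch of what a krull-claude launcher could do, assuming Claude Code honors the `ANTHROPIC_BASE_URL` environment variable and the gateway listens on LiteLLM's default port 4000. This is an illustration of the model-layer swap, not Krull's actual launcher.

```python
# Point Claude Code's API base at a local LiteLLM gateway, then run the
# normal CLI. Everything else (skills, hooks, CLAUDE.md) is untouched.
import os
import subprocess

def gateway_env(gateway="http://localhost:4000"):
    """Build an environment that routes Claude Code through the local gateway."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = gateway  # requests go to the gateway, not the cloud
    env["ANTHROPIC_API_KEY"] = "local"   # placeholder; the gateway handles auth, if any
    return env

def launch_claude(gateway="http://localhost:4000"):
    """Start the stock claude CLI against the local model stack."""
    return subprocess.run(["claude"], env=gateway_env(gateway))
```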
Wikipedia, Stack Exchange, and Project Gutenberg. On the shelf. Not on a server you don't own.
I wrote previously about the island problem: how AI without an internet connection gets stranded. Krull is the answer I was circling. The filter pipeline keeps the model grounded even when the network is down, because the references are already on the machine.
The Mac Question
Then I tried to make it work on a Mac.
The M-series Macs have enough unified memory to run a serious model. But Docker Desktop on macOS runs containers in a Linux VM. Apple's Virtualization.framework doesn't expose the GPU to those containers. So containerized Ollama on a Mac is CPU-only, which is useless for anything beyond a toy model.
Apple Silicon GPUs, Docker, and Ollama. Pick two. Krull picked the two that work together. The latest update detects macOS and skips the containerized Ollama entirely. Instead, it points everything at a native Ollama installation on the host. Full Metal acceleration.
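The platform check amounts to a few lines. The URLs below are the usual defaults (Ollama's port 11434, Docker Desktop's `host.docker.internal` alias for reaching the host), not necessarily Krull's exact configuration.

```python
# Sketch of the macOS workaround: on Darwin, skip the containerized Ollama
# (no GPU inside the Linux VM) and target a native install on the host.
import platform

def ollama_base_url(system=None):
    """Pick the Ollama endpoint based on the host OS."""
    system = system or platform.system()
    if system == "Darwin":
        # Containers reach the host's native, Metal-accelerated Ollama
        # through Docker Desktop's built-in host alias.
        return "http://host.docker.internal:11434"
    # Elsewhere, use the containerized Ollama service on the compose network.
    return "http://ollama:11434"
```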
It works on paper. I haven't tested it on real Mac hardware yet, and I'm not going to pretend otherwise. If you own an M-series Mac, clone the repo, run it, and tell me what breaks. The hardware panel won't detect your GPU yet. The model pull flow might surprise you. I need Mac users to find the edges I can't see from a Linux workstation.
Because a backup you haven't tested is not a backup. That's the whole point of a fortress. Help me test this one.
References
1. Tweedie, S. and Griffiths, B. (2026). "Claude Suffered a 'Major Outage.' Anthropic Says It's Fixed." Business Insider.
2. Bhusal, N., et al. (2026). "Analysis shows power outages cost US electricity customers billions." Oak Ridge National Laboratory.
3. Cisco ThousandEyes. (2026). "Looking Ahead: 2026's Biggest Outage Risks." ThousandEyes Internet Report.
4. OpenAI. (2025). "June 10th Service Disruption FAQ." OpenAI Help Center.
5. Magesh, V., et al. (2025). "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools." Journal of Empirical Legal Studies. Stanford RegLab.