CanIRun.ai Finally Answers the Question Every Local AI Enthusiast Has Been Googling for Two Years

There is a ritual that every person interested in running AI locally has performed at least once. You read about some exciting new open-source model — maybe it is Llama 3.3 70B or DeepSeek R1 — and your first thought is not "what can it do?" but "can my laptop handle this without catching fire?"

What follows is a predictable forty-five minute journey through Reddit threads, GitHub README files, and increasingly desperate Google searches like "RTX 3060 run 70B model quantized" and "how much RAM for DeepSeek R1 actually." You piece together conflicting information from seventeen sources, do some napkin math that you are only 60 percent confident in, and either commit to a download that takes three hours or give up entirely.

CanIRun.ai, which hit the front page of Hacker News this week with 762 upvotes, is here to replace that entire ritual with a single web page.

What It Actually Does

The concept is beautifully simple. You visit canirun.ai, browse a catalog of popular open-source AI models, and the site tells you whether your machine can run each one. It factors in VRAM requirements, RAM needs, context window sizes, and quantization options to give you a straightforward answer.

The model catalog is impressively comprehensive. At the time I checked, it covers everything from tiny models like Qwen 3.5 0.8B (which needs just 0.5 GB and could probably run on a smart toaster) all the way up to monsters like Kimi K2, a 1-trillion-parameter MoE model that requires 512 GB of storage. That is not a typo. Five hundred and twelve gigabytes.

[Image: hardware specifications and AI model compatibility checker interface]

The Model Lineup — And What Caught My Eye

My colleague Tom — who has spent the last four months building what he calls an "AI homelab" out of a used workstation he bought for $1,200 on eBay — went through the catalog with me over coffee. A few things stood out to both of us.

The sweet spot for most enthusiasts appears to be the 7B to 32B range. Models like Llama 3.1 8B (4.1 GB, 128K context), Phi-4 14B (7.2 GB, 16K context), and Qwen 3 32B (16.4 GB, 128K context) represent the realistic territory for anyone with a decent GPU. Tom runs a Qwen 2.5 Coder 32B on his dual RTX 3090 setup, and he confirmed the 16.4 GB VRAM estimate is accurate — it fits in a single 3090 with Q4 quantization.

What I found particularly useful is that CanIRun.ai lists the quantized sizes rather than raw parameter counts alone. This matters because a 70B model in FP16 needs around 140 GB of VRAM (good luck), but in Q4 quantization, it drops to roughly 35.9 GB — suddenly within reach of a consumer workstation with two high-end GPUs or a single professional card.
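The arithmetic behind those numbers is simple enough to sketch. This is a back-of-napkin estimate of weight memory only (parameters times bits per weight); real usage is higher because of the KV cache, activations, and framework overhead, and actual Q4 quants vary slightly in effective bits per weight.

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough VRAM needed just for model weights, in decimal GB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B model:
fp16_gb = estimate_weight_vram_gb(70, 16)  # 140.0 GB -- matches the "good luck" figure
q4_gb = estimate_weight_vram_gb(70, 4)     # 35.0 GB -- close to the 35.9 GB catalog entry
```

The small gap between the naive 35 GB estimate and the catalog's 35.9 GB comes from quantization overhead (scales and other metadata stored alongside the 4-bit weights).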

The Models You Probably Should Not Try to Run

The catalog includes some entries that serve more as aspirational targets than practical suggestions. DeepSeek R1 at 671B parameters needs 343.7 GB of storage in its quantized form. DeepSeek V3.2 at 685B needs 350.9 GB. And the aforementioned Kimi K2 at 1 trillion parameters requires you to basically turn your home into a small data center.

Sandra — who manages our team budget and has vetoed more hardware purchases than I care to count — looked at the Kimi K2 requirements and said, and I quote: "So you need approximately eleven of those expensive GPUs you keep asking me for." She is not wrong. At current RTX 4090 prices, you are looking at roughly $17,600 in GPU costs alone, before you even think about the motherboard, CPU, power supply, and cooling required to actually run the thing.

That said, the inclusion of these models is not pointless. It gives people a reference point. When someone asks "why does this API cost $0.15 per thousand tokens," you can point them at the hardware required to run the model locally and the economics suddenly make more sense.

New Entries Worth Watching

A few recent additions to the catalog deserve attention. GPT-OSS 20B (10.8 GB) is OpenAI's first open-weight MoE model, released about seven months ago, and features configurable reasoning — meaning you can dial the thinking depth up or down depending on your task. The 120B version (59.9 GB) scored 52.6% on SWE-bench, which is genuinely competitive.

Devstral 2 123B from Mistral AI is a dense (not MoE) coding model that scored 72.2% on SWE-bench Verified — the highest I have seen from an open model. At 63 GB quantized, it is a stretch for consumer hardware, but teams pooling GPU resources could make it work.

And then there is Qwen 3.5, which dropped just a month ago in sizes ranging from 0.8B to 9B. The 0.8B variant at 0.5 GB is genuinely exciting for embedded and edge use cases — think Raspberry Pi, IoT devices, or offline mobile applications.

What CanIRun.ai Gets Right

The strength of this tool is in what it does not try to do. It does not benchmark your system. It does not try to install anything. It does not ask you to create an account or hand over your email. It is just a clean, fast catalog that maps hardware requirements to models and gives you a binary answer.

The information architecture is solid too. Each model card shows the provider, parameter count, VRAM requirement, context window, and a brief description. You can quickly scan the page and mentally filter for models that fit your hardware. No clicking through five pages of marketing copy to find the actual specs — they are right there.

Derek, who has been building a local AI chatbot for his elderly mother (the reasoning being that she keeps accidentally subscribing to things when she uses web-based AI tools), found the site immediately useful. "I spent two weeks figuring out which model could run on the M1 MacBook Air I gave her," he told me. "This would have taken me five minutes."

What Could Be Better

My main criticism is the lack of hardware input. Currently, you browse the catalog and mentally match requirements against what you know about your machine. A feature that lets you input your GPU model, VRAM, and RAM — then filters the catalog to show only compatible models — would transform this from a useful reference into an essential tool.
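To make the idea concrete, here is a minimal sketch of what that filter could look like. The schema and field names are my assumptions for illustration, not CanIRun.ai's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    vram_gb: float  # quantized VRAM requirement, as listed in the catalog

def runnable(catalog: list[Model], my_vram_gb: float) -> list[Model]:
    """Return only the models that fit within the given VRAM budget."""
    return [m for m in catalog if m.vram_gb <= my_vram_gb]

catalog = [
    Model("Llama 3.1 8B", 4.1),
    Model("Phi-4 14B", 7.2),
    Model("Qwen 3 32B", 16.4),
]

# On a 12 GB card, only the first two fit:
fits = runnable(catalog, my_vram_gb=12)
```

A real implementation would also need to account for partial GPU offloading (splitting layers between VRAM and system RAM), which is why a simple threshold is only a starting point.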

I would also like to see performance estimates. Knowing that a model can run on my hardware is step one. Knowing whether it will produce tokens at 15 per second or 150 per second is step two, and often the deciding factor between a pleasant experience and something that makes you question your life choices while waiting for a response.
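There is even a well-known rule of thumb the site could use: single-stream token generation is usually memory-bandwidth bound, so decode speed is roughly memory bandwidth divided by the quantized model size. This sketch ignores batching, KV-cache reads, and compute limits, so treat it as an order-of-magnitude estimate only.

```python
def rough_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Order-of-magnitude decode speed: every token requires reading all weights once."""
    return bandwidth_gb_s / model_size_gb

# An RTX 3090 (~936 GB/s memory bandwidth) running a 16.4 GB Q4 32B model:
speed = rough_tokens_per_sec(936, 16.4)  # ~57 tokens/sec
```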

Finally, the catalog currently focuses on text-generation models. As multimodal local AI gains traction (Qwen 3.5 is explicitly listed as multimodal), including VRAM overhead estimates for vision processing would be a welcome addition.

The Bigger Picture

CanIRun.ai exists because running AI locally has gone from a niche hobby to a legitimate consideration for privacy-conscious individuals, regulated industries, and anyone who has received one too many "we updated our terms of service" emails from cloud AI providers. The question "can I run this locally?" is now being asked by sysadmins, CTOs, hobbyists, and apparently Derek's mother.

This tool makes that question easier to answer. It is not perfect — it lacks system detection, performance estimates, and hardware filtering. But it is free, fast, and comprehensive enough to replace those forty-five-minute Reddit spelunking sessions. For version one, that is a solid foundation.

You can try it at canirun.ai — no signup required, no download necessary. Just your browser, your curiosity, and maybe a healthy dose of GPU envy when you see what the 1T models require.

Once you know what you can run, check out the best AI code assistants for 2026 and why AI agents are not just chatbots anymore.
