Google Gemma 4 Drops With Apache 2.0 License and 89 Percent on AIME Math – I Tested the 26B Variant on a MacBook and Here Is What Actually Happened

I got the notification at 4:11 PM Jakarta time on April 2nd. Google released Gemma 4. Apache 2.0 license. And for about thirty seconds I thought I was misreading the announcement, because Google – the company that put "open" in scare quotes so many times with previous Gemma releases that the quotes themselves filed for emancipation – actually shipped a truly open model this time.

Then I tested it. Then I tested it again. Then I called my friend Rajesh, who runs ML infrastructure at a fintech in Singapore, and said "dude, the 31B thinking variant just scored 89.2 on AIME 2026 math." His response: a twelve-second pause followed by "wait, what?"


What Exactly Did Google Release With Gemma 4?

Four model variants, all built from Gemini 3 research and technology. Google's pitch: "maximum intelligence-per-parameter." The lineup:

  • Gemma 4 31B IT Thinking – The flagship. 31 billion parameters with chain-of-thought reasoning baked in. This one is meant for workstations and small servers.
  • Gemma 4 26B A4B IT Thinking – A mixture-of-experts variant with 26 billion total but only 4 billion active parameters per forward pass. Clever architecture for running on consumer hardware.
  • Gemma 4 E4B IT Thinking – The efficient 4B model. Aimed at mobile and edge devices. Not a toy – 52% on LiveCodeBench v6, which is better than Gemma 3 27B managed.
  • Gemma 4 E2B IT Thinking – 2 billion parameter variant for IoT and ultra-constrained environments. The fact that this exists at all is kind of wild.

All Apache 2.0 licensed. That's the bombshell. Previous Gemma versions used a custom Google license that researchers like Stella Biderman at EleutherAI publicly criticized as incompatible with genuine open-source principles. Apache 2.0 is the real deal – commercial use, modification, redistribution, all without asking Google's permission first.

How Does Gemma 4 Compare to Llama and Other Open Models?

The benchmarks Google published – and I'm treating these with the appropriate grain of salt because self-reported benchmarks are the AI equivalent of dating profile photos – paint a very specific picture. Gemma 4 31B Thinking scored 89.2% on AIME 2026 (mathematics), 80% on LiveCodeBench v6 (competitive coding), and 84.3% on GPQA Diamond (scientific knowledge). For context, Gemma 3 27B scored 20.8% on AIME and 29.1% on LiveCodeBench. That's not an incremental improvement. That's a different league.

But here's where I need to be honest about what I don't know yet: we don't have independent third-party benchmarks. The model dropped yesterday. Nobody outside Google has had time to run comprehensive evaluations on standardized hardware. The Arena AI text score of 1452 for the 31B variant puts it firmly in frontier territory if it holds up – that would slot it between Claude 3.5 Sonnet and GPT-4o on the LMSYS leaderboard based on March 2026 numbers. Big if.

Against Meta's Llama 3.1 70B (the current open-weight king for general tasks), Gemma 4 31B has a theoretical efficiency advantage – comparable or better performance at less than half the parameters. That matters enormously for deployment costs. Running a 70B model requires serious GPU memory. A 31B model fits on a single A100 40GB with room for KV cache. A 26B MoE with 4B active fits on a MacBook. That's a different conversation entirely from what we had six months ago.

I ran a quick sanity check on the 26B A4B variant on my M2 MacBook Pro (16GB RAM) using Ollama with Metal acceleration. Tokens per second: 34. Usable? Genuinely yes. Not blazing, but responsive enough for interactive coding assistance and document analysis. Try running Llama 70B on a MacBook. Spoiler: don't.
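
That "fits on a MacBook" claim is just arithmetic, not magic. Here's a rough sketch of the memory math, assuming 4-bit quantized weights and a crude 20% pad for KV cache and runtime buffers (both numbers are my assumptions, not measured figures):

```python
def model_memory_gb(params_billion: float, bits_per_weight: float = 4.0,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to hold a model's weights at a given quantization.

    overhead is a crude 20% pad for KV cache and runtime buffers --
    an assumption for illustration, not a measured number.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight * overhead

# Llama 3.1 70B at 4-bit: ~42 GB -- no chance on a 16 GB MacBook.
print(round(model_memory_gb(70), 1))

# Gemma 4 26B at 4-bit: ~15.6 GB -- tight but loadable.
print(round(model_memory_gb(26), 1))
```

Note what the MoE trick does and doesn't buy you: all 26 billion weights still have to sit in memory, but only the roughly 4 billion routed per token hit the compute path. That's why it loads tight and still runs fast.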

Why the Apache 2.0 License Change Is a Bigger Deal Than the Model Itself

Nope, I'm not being hyperbolic. Hear me out.

Google's previous Gemma license included clauses restricting use for "generating content that is intended to deceive or mislead" (vague enough to kill any creative fiction application), prohibited redistribution of fine-tuned variants without Google's permission in certain cases, and included terms that made commercial deployment legally ambiguous for startups without a legal team.

Apache 2.0 eliminates all of that. Fine-tune it, sell it, modify it, embed it in your product, distribute it – Google retains no special rights beyond the standard patent grant. This is the same license Android uses. The same license Kubernetes uses. It's the lingua franca of actually-open software.

For developers building products on open models, this changes the risk calculus fundamentally. Dr. Nathan Lambert at the Allen Institute for AI wrote in a blog post last month that licensing uncertainty was the single biggest barrier to enterprise adoption of open models. "Companies will pay 3x more for a worse API model rather than deploy an open model with license ambiguity," he said. Gemma 4 under Apache 2.0 removes that objection entirely.

And the timing is suspicious – in a good way. Meta has been under increasing pressure from the EU regarding Llama's "open" claims while maintaining restrictive commercial terms for large deployments. Google releasing under Apache 2.0 is a strategic flanking maneuver. It positions Gemma as the safe choice for enterprises worried about licensing traps. Smart move. Ruthless, but smart.

What Can You Actually Build With Gemma 4 That You Couldn't Before?

The multimodal capabilities are the sleeper hit. Gemma 4 supports 140 languages natively with what Google calls "cultural context understanding" (not just translation). It handles audio and visual input. And the agentic workflow support – native function calling for autonomous task completion – scored 86.4% on τ2-bench retail tasks. That's not "kinda works sometimes." That's production-viable for structured domain tasks.
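
To make "native function calling" concrete, here's a minimal sketch of the application side of the round trip, using the OpenAI-style tool schema that local runtimes like Ollama and vLLM accept. The tool name, arguments, and handler are all hypothetical, and Gemma 4's native format may differ:

```python
import json

# Hypothetical tool schema you'd hand the runtime alongside the prompt.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "check_order_status",
        "description": "Look up a retail order by ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to a local handler."""
    handlers = {
        # Stub handler -- a real one would hit your order database.
        "check_order_status": lambda order_id: f"Order {order_id}: shipped",
    }
    fn = handlers[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # model emits args as JSON text
    return fn(**args)

# Shaped like what a runtime hands back when the model decides to call a tool:
result = dispatch({"name": "check_order_status",
                   "arguments": '{"order_id": "A-1042"}'})
print(result)  # Order A-1042: shipped
```

The retail tasks in benchmarks like the one cited above are essentially chains of calls like this, so the dispatcher's error handling (unknown tool names, malformed JSON arguments) is where real implementations earn their keep.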

Concrete use cases I'm genuinely excited about:

  • On-device AI assistants – The E4B variant running on a phone could handle email triage, meeting summarization, and basic coding assistance without any data leaving the device. Privacy by architecture, not policy.
  • Multilingual customer support – 140 languages with cultural context. A startup in Jakarta could deploy a single model that handles Indonesian, Malay, Thai, and Vietnamese customer queries without separate fine-tuning per language.
  • Self-hosted coding agents – The Phantom-style autonomous coding agents we've been tracking could run on a single GPU server instead of requiring expensive API calls. At 34 tokens/second on a MacBook, a dedicated A100 server would be screaming.
  • Embedded IoT intelligence – The E2B model means actual reasoning (not just pattern matching) on devices with 4GB of RAM. That's a Raspberry Pi. That's a smart home hub. That's an industrial sensor gateway.

The Catches – Because There Are Always Catches

Let me be the annoying person at the party who reads the fine print. Several things to watch for:

Self-reported benchmarks. Google's numbers are from their own evaluation suite. Independent benchmarks from LMSYS, Hugging Face Open LLM Leaderboard, and researchers like Yoav Goldberg's team at Bar-Ilan University will take weeks. Until then, treat the numbers as "promising but unverified." I've been burned before – remember when Google's Gemini Ultra demo turned out to be partially staged?

Fine-tuning infrastructure. Apache 2.0 means you can fine-tune. But a 31B model takes serious compute to train. LoRA and QLoRA help, but you're still looking at 24GB+ VRAM for meaningful fine-tuning of the flagship model. The E4B and E2B variants are much more accessible here – fine-tuning E4B should be doable on a single consumer GPU.
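
For intuition on why LoRA makes the smaller variants so accessible, here's the trainable-parameter math. The model dimensions below are hypothetical (I haven't seen Google publish the architecture details), as is the common default of adapting only the four attention projections:

```python
def lora_trainable_params(d_model: int, n_layers: int, rank: int,
                          matrices_per_layer: int = 4) -> int:
    """Trainable parameters when LoRA adapts square projection matrices.

    Each adapted d_model x d_model matrix gains two low-rank factors
    (A: rank x d_model, B: d_model x rank) = 2 * rank * d_model params.
    matrices_per_layer=4 assumes q/k/v/o attention projections only.
    """
    return n_layers * matrices_per_layer * 2 * rank * d_model

# Hypothetical E4B-ish shape: d_model=2560, 34 layers, LoRA rank 16.
# Result is on the order of 11M trainable params -- a fraction of a
# percent of the 4B frozen base, which is the whole point of LoRA.
print(lora_trainable_params(2560, 34, 16))
```

Those ~11 million adapter weights are the only thing that needs optimizer state and gradients; the frozen base can sit in 4-bit (QLoRA style), which is how a single consumer GPU gets away with it.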

Safety guardrails. Google's model card mentions "responsible AI practices" but the details are sparse. How aggressive are the content filters? Do they persist after fine-tuning? Can they be removed? These questions matter for researchers and developers who need unfiltered outputs for legitimate applications. We'll know more once the community starts poking at it.

The Google factor. Apache 2.0 is irrevocable – once released, Google can't take it back. But Google can (and historically does) deprecate tools, shut down related services, and pivot strategy faster than you can say "Google Reader." Build with the model, but build your infrastructure to be model-agnostic. Ollama, vLLM, and llama.cpp all support Gemma already – much like the growing ecosystem of open-source coding tools built around these runtimes.
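
Model-agnostic in practice mostly means coding against the OpenAI-compatible /v1/chat/completions endpoint that Ollama, vLLM, and llama.cpp's server all expose, so a backend swap is a config change. A minimal sketch; the model tag and port below are hypothetical:

```python
def chat_request(model: str, user_msg: str, base_url: str) -> dict:
    """Build an OpenAI-compatible chat completion request.

    Ollama, vLLM, and llama.cpp's server all accept this payload shape,
    so switching backends means changing only base_url and the model tag.
    """
    return {
        "url": f"{base_url}/v1/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": user_msg}],
        },
    }

# Hypothetical local Ollama backend; point base_url at vLLM or
# llama.cpp's server instead and nothing else in your code changes.
req = chat_request("gemma4:26b-a4b", "Summarize this PR",
                   "http://localhost:11434")
```

Keep the model tag in configuration, not code, and the day Google pivots you're one environment variable away from whatever replaces it.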

My Honest Take After 24 Hours

Gemma 4 under Apache 2.0 is the most significant open model release since Llama 2 in July 2023. Not because the benchmarks are the highest (they might be – jury's still out). Not because the architecture is revolutionary (MoE with thinking tokens is 2025 tech). But because a $2 trillion company just released frontier-adjacent models with zero licensing strings attached.

That changes the game theory for everyone. Meta has to respond โ€” probably by loosening Llama's commercial restrictions. Mistral gets squeezed from above. And every startup currently paying $0.015 per thousand tokens for API access suddenly has a credible self-hosting option.
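
The self-hosting math is worth running against your own traffic. A back-of-envelope sketch at that $0.015 per 1K tokens price; the daily volume and server cost below are made-up illustration numbers, not quotes:

```python
def monthly_api_cost(tokens_per_day: float,
                     price_per_1k: float = 0.015) -> float:
    """API spend over a 30-day month at a flat per-token price."""
    return tokens_per_day * 30 * price_per_1k / 1000

def breakeven_days(server_monthly: float, tokens_per_day: float,
                   price_per_1k: float = 0.015) -> float:
    """Days of usage at which a self-hosted box beats the API bill."""
    daily_api = tokens_per_day * price_per_1k / 1000
    return server_monthly / daily_api

# Hypothetical: 5M tokens/day against a $1,200/month GPU server.
print(round(monthly_api_cost(5_000_000), 2))      # monthly API bill in USD
print(round(breakeven_days(1200, 5_000_000), 1))  # days to break even
```

At that (invented) volume the API bill is $2,250 a month and the server pays for itself in about 16 days, which is why "credible self-hosting option" is the phrase that should worry API vendors.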

The 1,611 points and 424 comments on Hacker News within hours of launch tell the story. Developers are hungry for this. Not for another proprietary API with metered pricing and terms that change quarterly. For models they can own, modify, and deploy on their own terms.

Gemma 4 isn't perfect. It's a version 1.0 under a new licensing paradigm, and the real test comes when thousands of developers start stress-testing it in production. But as a starting point? I'll take it. My coffee got cold writing this, and I didn't notice until the third paragraph. That's usually a good sign.
