Jun 5, 2026

Open-Source LLM License Guide 2026: MIT, Apache, GPL, Llama

By AIFoss · 14 min read

Tags: licensing, llm, open-source, legal, selfhosted

TL;DR: Most open-weight LLMs are not under traditional open-source licenses. Apache 2.0 and MIT are the only safe bets for unrestricted commercial products. The Meta Llama Community License permits commercial use but contains restrictions — including a full EU exclusion for Llama 4 — that require more than a skim.

	MIT	Apache 2.0	Meta Llama Community	Gemma Terms
License type	Permissive	Permissive + patent grant	Custom / source-available	Proprietary terms
Commercial SaaS	✅	✅	✅ under 700M MAU	✅ with use policy
EU deployment	✅	✅	✅ Llama 3.3 / ❌ Llama 4	✅
OSI-certified open source	✅	✅	❌	❌
Fine-tune and redistribute	✅	✅	✅ name must start “Llama”	✅

Honest take: For any commercial product, pick Apache 2.0 or MIT — Mistral Small 4, Qwen3, or Phi-4. Llama is usable in most US-based contexts but carries conditions that require a real legal read, not a skim.

Why the license matters as much as the benchmark

Developers run evals, check SWE-bench scores, and measure tokens per second. The license file gets a glance at most. That’s the wrong order for anything commercial.

A model that scores 3% better on HumanEval but carries a problematic license is worse for your product than a slightly weaker alternative under Apache 2.0. License problems surface months into a build — when legal reviews the stack before a fundraise, when you scale past a threshold, or when a customer in Berlin asks for your data processing agreement.

Five license families cover virtually every open-weight LLM in production as of June 2026, along with the tooling used to serve them: MIT, Apache 2.0, GPL/AGPL, the Meta Llama Community License, and Google’s Gemma Terms of Use. Each has a different risk profile depending on what you’re building. The real cost comparison between FOSS AI and SaaS AI changes significantly depending on which tier of license risk you’re willing to carry.

MIT: the simplest permissive license

The MIT License is a few short paragraphs. It lets you do anything with the software (run it, modify it, distribute it, sell products built with it) as long as you preserve the copyright and permission notice. No explicit patent grant, no copyleft, no usage restrictions.

Models under MIT in 2026:

DeepSeek V3, V4-Pro, V4-Flash, R1 (all weights on HuggingFace)
Phi-4 and Phi-4 Multimodal (Microsoft)

The practical gap in MIT is what it doesn’t include: no patent grant. If the model creator holds a patent on an architecture technique embedded in the weights, MIT doesn’t automatically license you to use that patent. In practice, model architectures are rarely patent-enforced against downstream users — but Apache 2.0 closes this gap explicitly.

Also worth noting: DeepSeek’s MIT license ships with an Attachment A — a use-restrictions appendix covering illegal and hazardous applications. This is standard for AI models and doesn’t restrict normal commercial or SaaS use.

When MIT is the right call: Solo developer, startup, internal tool, mobile app — anything where you want zero legal overhead. Simplicity is the point.

Apache 2.0: MIT plus patent protection

Apache 2.0 is the license that legal teams at larger companies tend to prefer over MIT. The substantive difference: each contributor grants a royalty-free patent license covering their contributions. If the model creator holds patents that their release necessarily relies on, you are licensed to use them, and that grant terminates for anyone who files a patent suit over the work.

Models under Apache 2.0 in 2026:

Mistral Small 4, Mistral 3 (all current open-weight Mistral releases)
Qwen2.5 (all sizes, 0.5B through 72B) and Qwen3 (all sizes)
Falcon 40B (Technology Innovation Institute)
GPT-NeoX-20B (EleutherAI)

Apache 2.0 allows commercial use, modification, redistribution, private use, and sublicensing. Requirements when you redistribute: preserve copyright notices, include the license text (and any NOTICE file), and state the changes you made. For API-based or self-hosted deployments, where nothing is redistributed, the obligations are minimal.

One nuance with Mistral: earlier models (pre-2025) had a modified license with a revenue threshold — companies earning over $20M/month needed a commercial license. The current Mistral 3 generation and Mistral Small 4 dropped that restriction and moved fully to Apache 2.0. If you’re running an older Mistral model, verify the specific license on HuggingFace before assuming Apache 2.0.

When Apache 2.0 is the right call: Anything commercial where you want explicit patent protection. Larger team, legal review incoming, SaaS product, enterprise deployment. Apache 2.0 is the safe default.

GPL and AGPL: copyleft and the SaaS trap

GPL and AGPL are designed to keep software free — if you distribute a GPL-licensed program, you must release the source of your derivative under the same license. AGPL extends this to network use: if you run modified AGPL software as a service, you must share your modifications with users even if you never “distribute” the code.

No major open-weight LLM is released under GPL or AGPL. But the inference tooling around LLMs often is:

llama.cpp: MIT. Safe.
vLLM: Apache 2.0. Safe for hosted inference.
Text Generation WebUI (oobabooga): AGPL-3.0. If you run a modified fork as a hosted service, you must publish your modifications.
GPT4All desktop app: MIT for the app itself, but bundles components that use AGPL-licensed libraries. If you build a product on top of GPT4All’s internals, audit the full dependency tree.

The SaaS trap with AGPL is subtle. GPL’s traditional loophole let SaaS companies run GPL code on a server without “distributing” it, keeping the source private. AGPL was invented to close that loophole. If your inference layer uses AGPL code, you may be obligated to open-source your entire serving stack.

When this bites you: You fork oobabooga and build a proprietary SaaS on top. You bundle AGPL components into a packaged product. Both are real compliance risks.
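A dependency audit for a Python-based serving stack can be roughed out with the standard library. The sketch below flags installed packages whose declared metadata mentions a GPL-family license. The marker list and the function names `is_copyleft` and `copyleft_packages` are illustrative assumptions, and declared metadata can be wrong or missing, so treat a hit as a prompt for manual review rather than a verdict.

```python
from importlib import metadata

# Substrings treated as GPL-family markers. Illustrative, not exhaustive;
# note that "gpl" also matches LGPL, which has weaker obligations.
COPYLEFT_MARKERS = ("agpl", "affero", "gpl", "general public license")


def is_copyleft(license_field, classifiers):
    """True if the license text or any trove classifier mentions a GPL-family license."""
    haystack = " ".join([license_field or "", *(classifiers or [])]).lower()
    return any(marker in haystack for marker in COPYLEFT_MARKERS)


def copyleft_packages():
    """Return sorted (name, license summary) pairs for installed packages that match."""
    flagged = []
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # broken install with no metadata file
            continue
        license_field = meta.get("License-Expression") or meta.get("License") or ""
        classifiers = meta.get_all("Classifier") or []
        if is_copyleft(license_field, classifiers):
            text = license_field.strip()
            summary = text.splitlines()[0][:60] if text else "see classifiers"
            flagged.append((meta.get("Name") or "unknown", summary))
    return sorted(flagged)


if __name__ == "__main__":
    for name, summary in copyleft_packages():
        print(f"{name}: {summary}")
```

Run it inside the same virtual environment as your inference service. It only sees Python packages, so bundled binaries and vendored code still need a separate look.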

Meta Llama Community License: read the actual text

The Meta Llama Community License is the most widely used non-standard license in the LLM space. Llama 3.3 and the Llama 4 family (Scout, Maverick) are all under it. It looks open — weights are public, commercial use is allowed — but it contains restrictions that don’t exist in Apache 2.0 or MIT.

What it allows:

Commercial use (building products, charging customers, SaaS)
Modification and fine-tuning
Redistribution with attribution
Using model outputs to train other AI models (added in Llama 3.1, continued in Llama 4 with attribution requirements)

What it restricts:

700M MAU threshold: The license is conditional on scale. If your products and your affiliates’ products had more than 700 million monthly active users in the calendar month before the model version’s release date, the community license does not apply and you must request a separate license from Meta, which Meta may grant at its discretion. For the vast majority of products this is theoretical, but the condition is a flag in legal reviews.

Derivative model naming: If you fine-tune a Llama model and release the weights publicly, the derivative model name must start with “Llama.” The rule applies to released weights only: a fine-tune published on HuggingFace must be named “Llama-Something,” while your product name is governed separately. This is a branding requirement, not copyleft: you don’t have to open-source your fine-tune data or code.

No “Llama” in your product name: Your commercial product or SaaS cannot use “Llama” in its name or branding.

EU exclusion (Llama 4 only): The Llama 4 Community License explicitly excludes persons residing in the EU and companies headquartered in the EU from the license grant. This is the most significant new restriction in 2026. If your team is EU-based, or your product’s primary user base is EU, Llama 4 is legally unavailable under the community license. Llama 3.3 does not have this restriction. The practical effect has been a wave of EU teams migrating to Mistral Small 4 and Qwen3.

Not OSI-certified open source: The OSI’s definition requires no restriction based on groups of persons or field of endeavor. The EU exclusion and MAU threshold both fail this test. Use “open-weight” when describing Llama publicly — especially in contracts and procurement contexts.
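The restrictions listed above can be turned into a rough pre-flight check. This is a minimal sketch that encodes only this guide's summary of the license, not the license text itself. The `Deployment` fields, the generation labels, and the function name `llama_license_flags` are all hypothetical, and an empty result is not legal clearance.

```python
from dataclasses import dataclass
from typing import List, Optional

MAU_THRESHOLD = 700_000_000  # figure quoted in the text above


@dataclass
class Deployment:
    llama_generation: str                  # "3.3" or "4" (hypothetical labels)
    monthly_active_users: int
    eu_based: bool                         # EU resident or EU-headquartered company
    product_name: str
    published_weights_name: Optional[str] = None  # None if the fine-tune stays private


def llama_license_flags(d: Deployment) -> List[str]:
    """Return the restrictions from the list above that this deployment trips."""
    flags = []
    if d.monthly_active_users > MAU_THRESHOLD:
        flags.append("MAU threshold: separate license from Meta required")
    if d.llama_generation == "4" and d.eu_based:
        flags.append("EU exclusion: Llama 4 not licensed to EU-based licensees")
    if "llama" in d.product_name.lower():
        flags.append("Branding: product name must not contain 'Llama'")
    name = d.published_weights_name
    if name is not None and not name.startswith("Llama"):
        flags.append("Naming: published weights must start with 'Llama'")
    return flags


if __name__ == "__main__":
    example = Deployment("4", 50_000, True, "Acme Chat", "acme-support-ft")
    for flag in llama_license_flags(example):
        print(flag)
```

A check like this is most useful in a release checklist or CI step, where it forces someone to state the model generation and the company's location explicitly.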

When to use Llama anyway: Llama 3.3 is a solid choice for US-based teams building on capable 70B or 8B models. For EU teams, Llama 3.3 remains legally available — but Meta’s direction with Llama 4 suggests caution about locking in Llama as a long-term architectural dependency.

Google Gemma Terms of Use: “almost open”

Gemma 3 (1B–27B) is not released under any standard open-source license. Google uses custom Gemma Terms of Use that permit commercial use but carry Google-specific restrictions.

What Gemma allows:

Commercial use including SaaS deployment
Fine-tuning and derivative models
No MAU threshold
Redistribution with attribution and downstream policy compliance

What makes Gemma different from MIT/Apache:

You must propagate Google’s prohibited use policy downstream to anyone you distribute to
Google reserves the right to “restrict (remotely or otherwise) usage” if they believe you’re violating their policy — a clause with no equivalent in MIT or Apache 2.0
Not OSI-certified

The “remote restriction” clause is what legal teams flag. It’s unlikely to be invoked against normal commercial use, but it creates a dependency on Google’s continued goodwill that standard licenses don’t have.

When Gemma makes sense: Internal tools, research, hobbyist projects. For any production product where a clean license chain matters, Apache 2.0 alternatives like Qwen3 or Mistral offer better legal standing.

Model-by-model reference table

Model	License	Commercial	SaaS	EU	OSI
DeepSeek V4-Pro, R1	MIT	✅	✅	✅	✅
Phi-4 (Microsoft)	MIT	✅	✅	✅	✅
Mistral Small 4	Apache 2.0	✅	✅	✅	✅
Qwen3-32B	Apache 2.0	✅	✅	✅	✅
Falcon 40B	Apache 2.0	✅	✅	✅	✅
Llama 3.3-70B	Meta Llama Community	✅	✅	✅	❌
Llama 4 Scout / Maverick	Meta Llama Community	✅	✅	❌	❌
Gemma 3 (27B)	Gemma Terms	✅	✅	✅	❌
Falcon 180B	TII Apache-variant	✅ apps	❌ hosted API restricted	✅	❌

Verified against HuggingFace model cards and official license pages, June 2026.

Decision matrix by project type

Project type	Safe choices	Avoid
SaaS product	MIT, Apache 2.0	Falcon 180B (API restrictions), AGPL tooling
EU-based team or product	MIT, Apache 2.0	Llama 4 (explicitly excluded)
Internal enterprise tool	MIT, Apache 2.0, Llama 3.3	Llama 4 if EU headcount
Open-source project	MIT, Apache 2.0	Meta Llama (non-OSI, conditions propagate)
Mobile app	MIT, Apache 2.0 (Phi-4, small Qwen3)	Gemma (remote restriction clause)
Inference API service	MIT, Apache 2.0	Falcon 180B (requires TII consent for hosting)
Fine-tune + publish weights	MIT, Apache 2.0	Llama (derivative must be named “Llama X”)
VC-funded product	Apache 2.0	Anything non-OSI at Series A+

The gotchas: models that look open but aren’t

Fine-tunes inherit the base license. A fine-tune of Llama 3.3 is still under the Llama Community License regardless of what the HuggingFace uploader named it or what license they listed. Always check the base model field in the model card, then look up the license of that base model, rather than trusting the license listed on the fine-tune. There are thousands of Llama fine-tunes on HuggingFace with incorrect or missing license fields.

Quantized GGUF files carry the base license. A GGUF conversion of a Llama model — whether from bartowski, TheBloke, or anyone else — is a derivative of the original weights. The Llama Community License follows those bits wherever they go.
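Because fine-tunes and quantized conversions can sit several steps away from the original weights, it can help to walk the base model chain rather than inspect a single card. The sketch below follows `base_model` links through the same HuggingFace API endpoint this guide queries. The function names, the depth limit, and the choice to follow only the first entry when a card lists several bases are assumptions made for illustration. Cards with a missing `base_model` field end the chain early, so a short chain does not prove a model is a base model.

```python
import json
from urllib.request import urlopen


def fetch_card(model_id):
    """Fetch a model's cardData from the HuggingFace models API."""
    url = f"https://huggingface.co/api/models/{model_id}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp).get("cardData") or {}


def license_chain(model_id, fetch=fetch_card, max_depth=5):
    """Follow base_model links; return [(model_id, license), ...] from fine-tune to root."""
    chain, seen = [], set()
    current = model_id
    while current and current not in seen and len(chain) < max_depth:
        seen.add(current)
        card = fetch(current)
        chain.append((current, card.get("license")))
        base = card.get("base_model")
        if isinstance(base, list):  # some cards list several bases; follow the first
            base = base[0] if base else None
        current = base
    return chain


if __name__ == "__main__":
    # Needs network access. Any entry whose license differs from the
    # first one in the chain deserves a closer look.
    for model, lic in license_chain("Qwen/Qwen3-32B"):
        print(f"{model}: {lic or 'NOT SPECIFIED'}")
```

The `fetch` parameter is injectable so the chain logic can be exercised offline with canned metadata.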

Falcon 180B ≠ Falcon 40B. Falcon 40B is Apache 2.0 with no restrictions. Falcon 180B uses a modified TII license that prohibits offering the model as a hosted API to third parties without TII’s consent. If you’re running an inference service at scale, this is a live restriction, not a hypothetical.

Some Mistral models still carry the old license. Mistral 7B (original), Mixtral 8x7B, and models released before Mistral’s 2025 license reset may still have the $20M/month revenue condition. Check the specific model version on HuggingFace — don’t assume all Mistral models are Apache 2.0 because newer ones are.

Synthetic data from OpenAI breaks the chain. OpenAI’s Terms of Service prohibit using API outputs to train a competing model. If someone fine-tuned an open-weight model on GPT-4 outputs and you pull those weights, you inherit that problem even if the base model is MIT. DeepSeek R1 was trained on data from DeepSeek’s own synthetic data pipeline and released under MIT, making it one of the cleaner options in this regard.

Non-commercial fine-tunes are everywhere. Models like early Alpaca and some WizardLM variants were built on original LLaMA 1 (which had a non-commercial research license) or on OpenAI API outputs. These can still appear in search results and HuggingFace recommendations. Using them in production is a compliance risk regardless of what the uploader’s README says.

How to check a model’s license in two minutes

Before pulling any model into production, run this against the HuggingFace API:

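# Check license and base model for any HuggingFace model
MODEL="Qwen/Qwen3-32B"   # replace with your target
# -f makes curl fail on HTTP errors instead of piping an error page to Python
curl -fsS "https://huggingface.co/api/models/${MODEL}" | python3 -c "
import json, sys
raw = sys.stdin.read()
if not raw:
    sys.exit('No response: check the model ID and your network')
d = json.loads(raw)
card = d.get('cardData') or {}   # cardData can be missing or null
print('License:    ', card.get('license') or 'NOT SPECIFIED, check the Files tab')
print('Base model: ', card.get('base_model') or 'this IS the base model')
"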
# Qwen3-32B output:
# License:     apache-2.0
# Base model:  this IS the base model

If the license field is empty or says “other,” open the FILES tab on the model page and look for a LICENSE or LICENSE.md file. No license file means the model has no explicit grant to use it — treat it as all-rights-reserved until proven otherwise.
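The triage described above can be written down as a small lookup so that every model pulled into a project gets the same treatment. This is a minimal sketch: the identifier sets are assumptions based on common SPDX-style names, the tier labels are invented for illustration, and it expects the license value to be a single string.

```python
# Identifier sets are illustrative assumptions, not a complete list.
PERMISSIVE = {"mit", "apache-2.0"}
COPYLEFT = {"gpl-2.0", "gpl-3.0", "agpl-3.0"}


def license_tier(license_id):
    """Map a model card license value to a coarse review tier."""
    value = (license_id or "").strip().lower()
    if value in ("", "other"):
        return "unknown: read the LICENSE file, else treat as all-rights-reserved"
    if value in PERMISSIVE:
        return "permissive"
    if value in COPYLEFT:
        return "copyleft: check distribution and network-use obligations"
    return "custom: needs legal review"


if __name__ == "__main__":
    for value in ("apache-2.0", "other", None, "some-custom-license"):
        print(repr(value), "->", license_tier(value))
```

Anything that is not explicitly recognized falls through to the custom tier, which errs on the side of review rather than approval.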

For Ollama-served models, the model name maps back to a HuggingFace base. ollama run llama3.3 pulls Meta Llama 3.3 weights — the Llama Community License applies regardless of the Ollama interface.

When in doubt on anything non-MIT/Apache, get a lawyer to review it before committing to it in a customer-facing product. A 30-minute legal consult costs less than re-architecting a product six months in.

For a broader look at how license risk factors into total cost-of-ownership, the open-source AI stack overview covers which tools work well together for different deployment profiles.

FAQ

Q: Is Llama “open source”?
No — not by the OSI definition. The 700M MAU threshold, EU exclusion in Llama 4, and naming requirements for derivative models all disqualify it from OSI certification. The correct term is “open-weight.” This distinction matters in procurement, compliance reviews, and anywhere you need to represent the software’s licensing status formally.

Q: Can an EU startup use Llama 3.3?
Yes. The EU exclusion is specific to Llama 4 and applies to both EU-resident individuals and EU-headquartered companies. Llama 3.3 remains available to EU developers under the standard Llama Community License. Given Meta’s direction with each new release, factoring future model migration into your architecture is reasonable.

Q: Does Apache 2.0 cover training data copyright?
No. Apache 2.0 covers the model weights themselves, not the copyright of any data used to train them. This is a separate legal question. Models trained on openly licensed data — like Falcon, trained on RefinedWeb — carry lower training-data risk than models with undisclosed or proprietary training sets.

Q: What license should a VC-backed company use?
Apache 2.0. VCs and their legal teams recognize it, it includes explicit patent protection, and it has no scale thresholds, revenue conditions, or geographic restrictions. Qwen3, Mistral Small 4, and Falcon 40B are all competitive Apache 2.0 options that won’t create friction in a due diligence process.

Q: If I fine-tune Llama and don’t release the weights, does the naming restriction apply?
No. The naming requirement (“Llama” must be at the start of the model name) only applies when you distribute or make the fine-tuned weights available to others. Keeping the fine-tune private — running it internally, serving it from your own infrastructure — does not trigger this requirement.
