Open-Source AI Security 2026: The OSSRA Wake-Up Call
TL;DR: Black Duck’s 2026 OSSRA report found open source vulnerabilities more than doubled to 581 per codebase, and AI tooling is the sharp edge of that curve. Self-hosted AI stacks carry two supply chains — code dependencies and model weights — and both got hit hard this year. Audit your stack now; the checklist below takes 20 minutes.
What you’ll walk away with:
- A clear picture of why your Ollama / vLLM / Open WebUI stack is riskier than a normal web app
- A copy-paste audit you can run today to find unpatched components and exposed endpoints
- A short list of the exact 2026 CVEs to confirm you’ve patched
Honest take: Self-hosting AI for privacy is the right call, but “I run it locally so it’s safe” is the most expensive assumption in this space. The 2026 CVEs are unauthenticated and remote. Treat your inference server like the public-facing service it often accidentally becomes.
What the OSSRA 2026 Report Actually Found
The Open Source Security and Risk Analysis (OSSRA) report is Black Duck’s annual audit of real codebases. The 2026 edition, published February 25, 2026, analyzed 947 codebases across 17 industries. The headline number: the mean count of open source vulnerabilities per codebase rose 107% year-over-year to 581 vulnerabilities.
The rest of the data is worse than the headline:
- 87% of audited codebases contained at least one vulnerability.
- 78% contained a high-risk vulnerability, and 44% contained a critical-risk one.
- 65% of organizations experienced a software supply chain attack in the past year.
- License conflicts hit 68% of codebases, up from 56% — the largest single-year jump in OSSRA history.
The report ties the spike directly to AI-assisted development. The mean number of files per codebase grew 74% in a year, components grew 30%, and roughly 85% of organizations now use AI coding assistants. More code, pulled in faster, from more dependencies — and the vulnerability count tracks it almost exactly.
That’s the macro picture. The part the report doesn’t spell out, but that matters most for this site’s readers, is what happens when the software being shipped is an AI inference stack.
Why Self-Hosted AI Is the Sharp Edge of This Trend
A normal web app has one supply chain: its package dependencies. A self-hosted AI stack has two.
Supply chain one: the dependency tree. Tools like vLLM, Open WebUI, and the libraries Ollama links against pull in enormous Python and Go dependency graphs — PyTorch, transformers, CUDA bindings, image and video codecs, web frameworks. Every one of those is a line in the OSSRA average. A multimodal serving stack is exactly the “more files, more components” codebase the report describes.
Supply chain two: the model weights. This is the part traditional appsec misses. A GGUF or safetensors file is untrusted input that your inference server parses, and in some configurations executes. Downloading “a model” is functionally the same trust decision as downloading and running a binary from a stranger — except the tooling makes it feel like clicking a link.
Both supply chains produced unauthenticated, remotely exploitable CVEs in the first half of 2026. These aren’t theoretical.
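To make "weights are untrusted input" concrete, here is a minimal sketch of a pre-flight check for a safetensors file. It assumes the published safetensors layout (an 8-byte little-endian header length, then a JSON header whose entries carry `data_offsets`) and reads only that header, never the tensors. The function name `read_safetensors_header` and the 100 MB cap are illustrative choices, not part of any tool mentioned here.

```python
import json
import struct
from pathlib import Path

# Assumed sanity cap for this sketch; pick a limit that suits your models.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def read_safetensors_header(path, max_header_bytes=MAX_HEADER_BYTES):
    """Parse and bounds-check a safetensors header without loading any tensors.

    Raises ValueError if the file looks malformed.
    """
    file_size = Path(path).stat().st_size
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to hold a header length")
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        # Validate the declared length before reading or allocating anything.
        if header_len > max_header_bytes or 8 + header_len > file_size:
            raise ValueError(f"implausible header length: {header_len}")
        header = json.loads(f.read(header_len))  # bad JSON raises ValueError
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")

    data_size = file_size - 8 - header_len
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        try:
            begin, end = info["data_offsets"]
        except (TypeError, KeyError, ValueError) as exc:
            raise ValueError(f"tensor {name!r} has no usable data_offsets") from exc
        if not (isinstance(begin, int) and isinstance(end, int)
                and 0 <= begin <= end <= data_size):
            raise ValueError(f"tensor {name!r} points outside the file")
    return header
```

This is the kind of bounds check a parser has to get right on every field. It does not protect you from a bug inside the inference server's own loader; it only shows why a model file deserves the same suspicion as any other attacker-controlled input.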
The 2026 AI-Stack CVEs That Prove the Point
Here are the confirmed, patched vulnerabilities across the three tools most aifoss.dev readers run. Each one is real, each has a fixed version, and several were exploited or trivially exploitable from the network.
| Tool | CVE | Severity | What it does | Fixed in |
|---|---|---|---|---|
| Ollama | CVE-2026-7482 (“Bleeding Llama”) | CVSS 9.1 | Unauthenticated heap out-of-bounds read in the GGUF loader leaks process memory (API keys, prompts, conversations) via /api/create + /api/push | 0.17.1 |
| vLLM | CVE-2026-22778 | CVSS 9.8 | Unauthenticated RCE via a crafted video URL to a multimodal endpoint (info leak chained with heap overflow) | 0.14.1 |
| vLLM | CVE-2026-27893 | CVSS 8.8 | trust_remote_code bypass — a malicious model repo executes arbitrary code despite the user opting out | See advisory |
| Open WebUI | CVE-2025-64496 | CVSS 7.3 | Account-takeover path via Direct Connections; a malicious “model server” runs JavaScript in the victim’s browser | 0.6.35 |
| Open WebUI | CVE-2026-44566 | High | Path traversal lets an attacker upload files to arbitrary filesystem locations | 0.1.124+ |
Two patterns jump out. First, the highest-severity bugs are in the parsers — the GGUF tensor reader, the video decoder. That’s supply chain two: untrusted model and media input hitting memory-unsafe code paths. Second, trust_remote_code keeps being the foot-gun it always was. Pulling a model from a repository can mean executing that repository’s Python on your box.
For the full incident-response playbook on the Ollama bug specifically, see the Ollama security lockdown guide. What follows is the cross-stack audit.
The 20-Minute Self-Audit
Run this against any machine serving local AI. It checks both supply chains plus the most common misconfiguration — an inference port exposed to a network it shouldn’t be on.
# 1. Confirm tool versions against the fixed releases above.
# Compare each to its GitHub releases page; do not assume "latest" = patched.
ollama --version
python -c "import vllm; print(vllm.__version__)" 2>/dev/null
pip show open-webui 2>/dev/null | grep -i version
# 2. Audit the Python dependency tree for known CVEs.
pip install pip-audit
pip-audit # flags vulnerable transitive deps (PyTorch, transformers, PIL, etc.)
# 3. Find inference ports listening on non-loopback interfaces.
# 11434 = Ollama, 8000 = vLLM default, 8080/3000 = Open WebUI.
ss -tlnp | grep -E '11434|8000|8080|3000'
# A local address of 0.0.0.0, [::], or * means it is reachable beyond localhost. That is the problem.
# 127.0.0.1 and [::1] are loopback and are fine.
# 4. Grep your launch scripts and units for the two classic mistakes.
grep -rIn "OLLAMA_HOST" /etc/systemd ~/.config 2>/dev/null # 0.0.0.0 = exposed
grep -rIn "trust_remote_code" . # True = arbitrary code on model load
What you’re looking for:
- Step 1: any version below the “Fixed in” column. The Bleeding Llama fix landed in Ollama 0.17.1; if you’re on something older you are leaking memory to anyone who can reach the API.
- Step 2: pip-audit output is the OSSRA report in miniature for your own box. Expect a longer list than you’d like, and triage critical/high first.
- Step 3: an inference server bound to 0.0.0.0 with no proxy in front of it is the single most common way these CVEs get exploited. Bind to 127.0.0.1 and put authentication in front if you need remote access.
- Step 4: trust_remote_code=True should be the rare, deliberate exception, never a default you copied from a tutorial.
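Step 1 is easy to get wrong by eye, because a plain string comparison ranks "0.9.3" above "0.17.1". A rough sketch of a numeric comparison against the "Fixed in" column follows. The helper names are hypothetical, the version strings in the usage example are made up, and only the three table entries with a concrete fixed version are included.

```python
import re

# Minimum patched versions, copied from the "Fixed in" column of the CVE table.
FIXED_IN = {
    "ollama": "0.17.1",      # CVE-2026-7482
    "vllm": "0.14.1",        # CVE-2026-22778
    "open-webui": "0.6.35",  # CVE-2025-64496
}


def parse_version(text):
    """Extract the first X.Y.Z triple from a string and return it as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(tool, reported):
    """True if the reported version is at or above the fixed release for that tool."""
    return parse_version(reported) >= parse_version(FIXED_IN[tool])


if __name__ == "__main__":
    # Example inputs only; paste in whatever your own tools print.
    for tool, reported in [("ollama", "ollama version is 0.9.3"), ("vllm", "0.14.1")]:
        status = "patched" if is_patched(tool, reported) else "UPDATE NEEDED"
        print(f"{tool}: {reported!r} -> {status}")
```

The sketch ignores pre-release and build suffixes, so a `.dev` or `rc` build of the fixed version would count as patched. If that matters for your setup, a full version parser such as the `packaging` library is the safer choice.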
A Real Failure, and the Fix
The most common way I’ve seen this go wrong is not an exotic exploit — it’s the boring chain that OSSRA is really about.
A developer wants to try a new model that ships custom modeling code. The Hugging Face model card says “set trust_remote_code=True.” They do, because nothing works otherwise. That model’s repo is later updated (or was malicious from the start) with an __init__ that phones home. Now CVE-2026-27893-style execution isn’t even needed, because the user opted in.
The fix is a policy, not a patch:
- Never run trust_remote_code=True on a model you can’t fully audit. Prefer GGUF/safetensors converted by a trusted source over repos that demand remote code.
- Pin model sources. Record the exact repo and revision hash you downloaded, the same way you pin a dependency version.
- Isolate the inference process. Run it as an unprivileged user, in a container with no host-network access and no credentials it doesn’t strictly need. If the parser does get popped, it leaks an empty room.
That third point is the one that turns a CVSS 9.8 into a contained incident. The OSSRA data says you will be running vulnerable components — the question is what they can reach when one fails.
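The "pin model sources" policy can be as small as a JSON file next to your models. The following is a sketch under my own assumptions: the manifest layout and the function names `pin_model` and `verify_model` are hypothetical, and the repo and revision are whatever you recorded at download time.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte weights never sit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pin_model(path, repo, revision, manifest_path):
    """Record where a model file came from and its hash (hypothetical manifest format)."""
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    manifest[Path(path).name] = {
        "repo": repo,
        "revision": revision,
        "sha256": sha256_file(path),
    }
    manifest_file.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_model(path, manifest_path):
    """True only if the file is pinned and its current hash matches the recorded one."""
    manifest = json.loads(Path(manifest_path).read_text())
    entry = manifest.get(Path(path).name)
    return entry is not None and entry["sha256"] == sha256_file(path)
```

Run `pin_model` once, right after a download you trust, and `verify_model` before every load. A hash only tells you the file has not changed since you pinned it; it says nothing about whether the original was safe.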
When This Doesn’t Apply to You
Not every self-hoster needs to treat this like a SOC problem. If your setup is genuinely a single laptop, Ollama bound to 127.0.0.1, no remote access, models pulled only from the official Ollama library or major first-party repos — your real attack surface is small. The OSSRA numbers describe organizational codebases and internet-facing deployments, not a hobbyist running ollama run offline.
The line to watch is the moment you cross from “local toy” to “shared service”: the first time you set OLLAMA_HOST=0.0.0.0 to reach it from your phone, expose Open WebUI to your home network’s other devices, or stand up vLLM for a team. That’s when the second supply chain and the unauthenticated CVEs start mattering, and that’s when the 20-minute audit earns its keep. For teams making exactly that jump, the self-hosted AI stack for dev teams guide covers the multi-user hardening in depth.
One more honest limitation: scanning catches known vulnerabilities. The Bleeding Llama bug existed for months before it had a CVE number. pip-audit and version checks are necessary, not sufficient — process isolation is what protects you from the bug nobody has found yet.
FAQ
Does running AI tools locally make them safe from these CVEs? No. “Local” only helps if the service is actually bound to localhost and unreachable from any network. Most of the 2026 AI CVEs are unauthenticated and remote — the moment your inference port is reachable, local hosting offers no protection. The exposure usually comes from one config line, not a sophisticated attack.
Is the OSSRA report about AI models or about software? Software. OSSRA audits open source components in codebases — dependencies, licenses, and their vulnerabilities. The 2026 angle is that AI-assisted coding is inflating both the volume of code and the number of components, which drives the vulnerability count up. For self-hosted AI specifically, that dependency risk compounds with the separate risk of untrusted model weights.
What’s the single highest-impact thing to fix first?
Check what your inference ports are bound to (ss -tlnp). An AI server on 0.0.0.0 with no auth in front of it is the exposure that makes most of these CVEs reachable in the first place. Fix the binding before anything else, then patch versions.
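If you would rather not eyeball the `ss` output, the check can be scripted. This sketch assumes the usual iproute2 column layout for `ss -tln`, with the local address in the fourth column, and reuses the port list from the audit. The function name `exposed_listeners` is hypothetical.

```python
import ipaddress

# Ports named in the audit: 11434 = Ollama, 8000 = vLLM, 8080/3000 = Open WebUI.
AI_PORTS = {11434: "Ollama", 8000: "vLLM", 8080: "Open WebUI", 3000: "Open WebUI"}


def exposed_listeners(ss_output, ports=AI_PORTS):
    """Return (tool, local_address) for AI ports listening on a non-loopback address."""
    exposed = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue  # header row or unrelated line
        host, _, port = fields[3].rpartition(":")
        if not port.isdigit() or int(port) not in ports:
            continue
        # Drop IPv6 brackets and any %interface suffix before parsing.
        host = host.strip("[]").split("%")[0]
        try:
            loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            loopback = False  # "*" or anything unparsable: treat as exposed
        if not loopback:
            exposed.append((ports[int(port)], fields[3]))
    return exposed
```

Feed it the text of `ss -tln`, for example via `subprocess.run(["ss", "-tln"], capture_output=True, text=True).stdout`. An empty list means none of the listed ports is listening beyond loopback on that machine. It does not cover services running on non-default ports.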
Is trust_remote_code=True really that dangerous?
Yes. It means loading a model can execute that model repository’s Python on your machine. CVE-2026-27893 was specifically about bypassing the user’s opt-out — but even without a bypass, opting in to a repo you haven’t audited is the same trust decision as running a stranger’s script.
How often should I re-run the audit? Monthly for a stable home setup, and immediately whenever a new high-severity CVE drops for a tool you run. The open-source AI release cadence is fast — see the release cadence breakdown — so patch windows are short and exploit windows open quickly.
Sources
- Black Duck — 2026 OSSRA Report: Open Source Vulnerabilities Double as AI Soars
- Black Duck press release — Open Source Vulnerabilities Have Doubled as AI Accelerates Code Creation (Feb 25, 2026)
- Cyera Research — Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama (CVE-2026-7482)
- SecurityWeek — Critical Bug Could Expose 300,000 Ollama Deployments to Information Theft
- Orca Security — CVE-2026-22778: Critical vLLM RCE & Server Takeover
- RAXE Labs — vLLM Hardcoded trust_remote_code Bypass (CVE-2026-27893)
- Cato Networks — Open WebUI Account Takeover and RCE (CVE-2025-64496)
- GitLab Advisory DB — Open WebUI Arbitrary File Upload and Path Traversal (CVE-2026-44566)