Channel to shell: a blind audit of a 3-million-line AI assistant

We already pulled this open-source project apart by its CVE feed. This time we read it cold - 3.2 million lines, no brief, no advisory list - and the blind pass surfaced 71 issues the advisory feed never will. Nine are Critical, and they are all the same path: an untrusted chat message reaching a terminal, a URL fetch, or a code-exec tool with nothing guarding the gap.

June 15, 2026 · 8 min read · Yeda AI Team

Earlier we took a large, heavily-audited open-source project and checked it the careful way - pinned to one release, every published advisory source-verified. That lens answers one question well: which known vulnerabilities actually reach this checkout. It says nothing about the bugs that were never anyone's CVE. So we ran the other lens over the same repo - a blind, cold read of the whole tree - and it found a different class of problem entirely.

The target is a popular open-source AI assistant: it bridges 22+ chat channels to a model that can run a terminal, fetch URLs, render a canvas, and call tools. 16,665 source files, 3.2 million lines. No brief, no advisory list - YCAudit read it from scratch and ranked 71 findings, nine of them Critical. Every one of those nine traces the same path.

16,665 source files · 3.2M lines of code · 71 findings · 9 Critical

Two lenses on one repo

We said it plainly in the version-range post: blind discovery across a million-plus lines is not reliable for enumerating known-vulnerability exposure. That is still true - for that job, anchor to the advisory feed and read the source. But known-CVE matching has a blind spot of its own: it can only find bugs someone has already filed. The way a system wires untrusted input to dangerous capability is almost never in a CVE feed - it is in the architecture. The blind pass is the lens that sees it.

The one path that matters

Strip the nine Criticals down and they are one sentence: an untrusted message reaches a dangerous capability with nothing in between. The entry point is the same for all of them - inbound messages from 22+ channels are consumed without sanitization, on both the immediate path and the stored one, because chat history is replayed back into the model's context. From there the message reaches one of four sinks, and each sink is missing the guard that would make it safe.

A terminal at the end of a chat message

A coding-agent tool runs model-generated code through a pseudo-terminal on the host - no sandbox, no container, no per-command confirmation. When a prompt-injected message induces a run-terminal call, the resulting command executes on the machine with the privileges of the app's owner. This leg is confirmed to root cause.

A URL fetch with no allowlist

The model picks a URL; the app fetches it and extracts the readable text. No scheme or host allowlist sits in front of the call, so the same path reaches http://169.254.169.254/ (cloud-metadata credentials), file:///etc/passwd, and internal localhost services. Server-side request forgery, driven by the model. Also confirmed to root cause.

A canvas that renders model output as HTML

LLM output is written to the canvas DOM with no confirmed sanitization - a script-injection path into the webview origin. Because chat history is replayed into context, the injection can be stored and re-triggered later. Marked partial: the report withholds a fix until a source-level pass confirms it.
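
If this finding is confirmed, one way to close it is to treat model output as text rather than markup before it reaches the canvas. The sketch below is a minimal illustration under that assumption: `escapeHtml` is a hypothetical helper, not a function from the project, and a canvas that genuinely needs rich output would want an allowlist-based HTML sanitizer instead.

```javascript
// Hypothetical helper: map the five characters that let text break out into markup.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Returns a string that is safe to place inside element content or a quoted attribute.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```

In a browser, assigning to `textContent` instead of `innerHTML` gets the same effect without a helper. Either way, the replayed chat history needs the same treatment as the live message, because the stored copy is rendered through the same path.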

A tool dispatcher that trusts the tool name

The tool layer executes the tool name and arguments the model supplies, with no confirmed allowlist or schema validation. That turns model control of text into model control of whichever OS and network operations the tools expose. Also partial - flagged, with the fix withheld until a source-level pass confirms it.
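
For contrast, here is a minimal sketch of a dispatcher that trusts neither the name nor the arguments. Everything in it is an assumption for illustration: the `TOOLS` registry, the `echo` tool, and the `dispatch` function are hypothetical and do not come from the project.

```javascript
// Hypothetical registry: the only tools that exist are the ones listed here.
const TOOLS = new Map([
  ["echo", {
    // Per-tool argument check; a real system might use a JSON Schema validator.
    validate: (args) => typeof args.text === "string" && args.text.length <= 1000,
    handler: (args) => args.text,
  }],
]);

// Rejects unknown tool names and malformed arguments before any handler runs.
function dispatch(call) {
  // A Map lookup avoids matching inherited keys such as "constructor".
  const tool = TOOLS.get(call?.name);
  if (!tool) return { ok: false, reason: "unknown tool" };

  const args = call.args;
  if (args === null || typeof args !== "object" || !tool.validate(args)) {
    return { ok: false, reason: "invalid arguments" };
  }
  return { ok: true, result: tool.handler(args) };
}
```

The design choice is that the model proposes and the registry decides: a name that is not registered never resolves to code, and arguments that fail validation never reach a handler.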

Two of those legs - the terminal and the URL fetch - were run down to a confirmed root cause in this pass. The other two are marked partial: the pipeline flagged them, then withheld a fix until a source-level investigation confirms the exact line. That is the honest version of an audit - the report tells you which findings it stands behind and which still need a probe, instead of pretending every flag is a fact.

Findings by severity: 9 Critical · 22 High · 29 Medium · 11 Low

The two snippets below are illustrative - renamed and simplified, not the project's source - but the shape of each gap is exact. First, the fetch sink: a model-chosen URL handed straight to the network with no allowlist in front of it.

// the model decides what to fetch; the app just fetches it
const url = recommendation.sourceUrl;          // model-controlled
const page = await fetch(url).then(extractReadable);

// nothing rejects:
//   http://169.254.169.254/...   -> cloud-metadata credentials
//   file:///etc/passwd           -> local file read
//   http://localhost:<port>/...  -> internal-only services
Illustrative — renamed and simplified, not the project's source.
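
One shape the missing guard could take is a check that runs before any network call. This is a sketch under stated assumptions, not the project's fix: `checkFetchTarget` is a hypothetical name, and the scheme and host lists are placeholder policy.

```javascript
// Hypothetical policy: only these schemes and hosts may ever be fetched.
const ALLOWED_SCHEMES = new Set(["https:"]);
const ALLOWED_HOSTS = new Set(["docs.example.com", "api.example.com"]);

// Returns { ok: true, url } or { ok: false, reason } without touching the network.
function checkFetchTarget(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl); // WHATWG URL parser, global in Node and browsers
  } catch {
    return { ok: false, reason: "unparseable URL" };
  }
  if (!ALLOWED_SCHEMES.has(url.protocol)) {
    return { ok: false, reason: `scheme ${url.protocol} not allowed` };
  }
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    return { ok: false, reason: `host ${url.hostname} not allowed` };
  }
  return { ok: true, url };
}
```

With this in front of the fetch, the metadata address and `file:` URLs fail on scheme and a localhost URL fails on host. A hostname check alone does not cover redirects or a permitted name that resolves to an internal address, so those still need handling where the connection is made.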

And the terminal sink: a tool call that turns a chat message into a process on the host, with no sandbox and no confirmation between the two.

// a channel message becomes a tool call becomes a shell
registerTool("run_terminal", ({ command }) => {
  return pty.spawn(command);   // runs on the host as the app owner
});                            // no seccomp, no container, no "are you sure?"
Illustrative — renamed and simplified, not the project's source.
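
A guarded version of the same tool might look like the sketch below. It is hypothetical throughout: `makeGuardedTerminal`, the binary allowlist, and the injected `confirm` and `run` callbacks are assumptions chosen so the policy can be exercised without spawning anything.

```javascript
// Hypothetical policy: the only programs the tool may start.
const ALLOWED_BINARIES = new Set(["git", "ls"]);

// Builds a terminal tool that checks its input and asks before it runs.
// `confirm(argv)` returns true only if a human approved this exact command;
// `run(argv)` is whatever actually executes it (ideally inside a sandbox).
function makeGuardedTerminal({ confirm, run }) {
  return function runTerminal({ argv }) {
    const wellFormed =
      Array.isArray(argv) &&
      argv.length > 0 &&
      argv.every((part) => typeof part === "string");
    if (!wellFormed) {
      // An argv array (not a shell string) leaves no room for "; rm -rf /".
      return { ran: false, reason: "argv must be a non-empty array of strings" };
    }
    if (!ALLOWED_BINARIES.has(argv[0])) {
      return { ran: false, reason: `binary ${argv[0]} not allowed` };
    }
    if (!confirm(argv)) {
      return { ran: false, reason: "user declined" };
    }
    return { ran: true, result: run(argv) };
  };
}
```

The gate only decides whether a command may run. Containment still has to come from where `run` executes it, which is the sandbox or container the finding says is missing.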

The bugs the model shipped

The blind pass also reads the dependency graph, and that is where the generator's fingerprints show up. None of these is a known CVE - they are the artifacts of code assembled fast, across many sessions, by a model that fills in a plausible value and moves on.

A dependency version that does not exist

The manifest pins a package at a version number that was never published - a value the model invented and nobody caught. It is the dependency-graph equivalent of a hallucinated API: it looks specific, it looks deliberate, and it points at nothing.
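
A check for this class of bug is small once the list of published versions is in hand. The sketch below assumes that list has already been fetched from whatever registry the manifest targets; `findUnpublishedPins` and the package names in the usage note are hypothetical.

```javascript
// pins:      { packageName: pinnedVersion }
// published: { packageName: [versions the registry actually lists] }
// Returns every pin whose exact version the registry has never published.
function findUnpublishedPins(pins, published) {
  const missing = [];
  for (const [name, version] of Object.entries(pins)) {
    const known = published[name] ?? []; // unknown package: nothing is published
    if (!known.includes(version)) {
      missing.push({ name, version });
    }
  }
  return missing;
}
```

Called with `{ "left-lib": "4.7.2" }` against a registry listing only `4.6.0`, `4.7.0` and `4.7.1`, it reports `left-lib@4.7.2` as a pin that points at nothing.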

A package name that looks official but is not

A suspected slop-squat: a dependency whose name mimics an official SDK closely enough to pass a glance, sitting on a privileged path. The supply-chain version of a typo nobody re-reads - and exactly the gap a name-confusion attacker farms.
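
One heuristic for surfacing candidates like this is edit distance against a list of known-official names. The sketch below is an assumption about how such a check could work, not a description of the audit's method; `editDistance`, `findLookalikes`, and the threshold of 2 are illustrative choices.

```javascript
// Levenshtein distance using two rolling rows.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Flags dependencies that are close to, but not exactly, an official name.
function findLookalikes(dependencyNames, officialNames, maxDistance = 2) {
  const official = new Set(officialNames);
  const hits = [];
  for (const dep of dependencyNames) {
    if (official.has(dep)) continue; // exact match is the real package
    for (const name of officialNames) {
      const distance = editDistance(dep, name);
      if (distance <= maxDistance) {
        hits.push({ dependency: dep, resembles: name, distance });
      }
    }
  }
  return hits;
}
```

A hit is a prompt for a human to check the publisher, not a verdict. Plausible-sounding names that resemble nothing official will pass this check, so it complements a registry lookup rather than replacing one.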

A beta native module on a critical path

A pre-release C++ addon - the kind that talks to the OS kernel - wired into the terminal-execution path. Beta packages carry no security-stability guarantee and may never get a patch, and advisory tooling does not reliably track CVEs against pre-release versions, so a scanner stays quiet on it.
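
Because a scanner stays quiet here, the pin itself is the signal worth checking. The sketch below assumes the manifest uses Semantic Versioning, where a hyphen after the patch number marks a pre-release; `isPreRelease` and `findPreReleasePins` are hypothetical helpers and the package names in the usage note are invented.

```javascript
// SemVer: MAJOR.MINOR.PATCH, then "-" and dot-separated pre-release identifiers,
// then optional "+build" metadata. Only exact versions are handled, not ranges.
const PRERELEASE =
  /^\d+\.\d+\.\d+-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*(\+[0-9A-Za-z.-]+)?$/;

function isPreRelease(version) {
  return PRERELEASE.test(version);
}

// pins: { packageName: exactVersion }. Returns the pins that are pre-releases.
function findPreReleasePins(pins) {
  return Object.entries(pins)
    .filter(([, version]) => isPreRelease(version))
    .map(([name, version]) => ({ name, version }));
}
```

Run over `{ "native-pty-addon": "1.0.0-beta.3", "http-lib": "4.2.1" }`, it returns only the beta addon. Which of those pins sit on a privileged path is a separate question the manifest cannot answer.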

Seventy minor versions of drift

A cloud SDK on a must-guarantee path running roughly seventy minor versions behind current, next to a framework upgrade whose changed error-propagation semantics can quietly skip past auth and rate-limit middleware. Each was generated in its own session; nothing reconciled them.

No net under any of it

The last stage checks the tests, and this is the part that turns nine Criticals from a backlog into a live exposure. The test suite exists - unit tests, end-to-end tests, the works - but it has zero security-enforcement coverage. Nothing asserts the fetch sink rejects a cloud-metadata URL. Nothing asserts a channel message cannot reach the terminal without a confirmation. Nothing asserts onboarding does not write a plaintext secret to a tracked file. The report ships 15 proposed regression tests to close that gap and a one-word verdict for the suite as it stands: NOT READY.

A guard you add today and a guard a test protects are not the same guard. Without the regression, the next fast change quietly removes the fix and every green check still passes.
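
To make that concrete, here is what one such regression could look like for the plaintext-secret case. It is a sketch, not one of the report's 15 proposed tests: `renderTrackedConfig`, the `env:ASSISTANT_API_KEY` reference, and the test function are all hypothetical.

```javascript
// Hypothetical onboarding step: renders the config that lands in a tracked file.
// It stores a reference to the secret and deliberately ignores the value.
function renderTrackedConfig({ channel, apiKey }) {
  return JSON.stringify({ channel, apiKeyRef: "env:ASSISTANT_API_KEY" }, null, 2);
}

// Regression: fails the moment a later change embeds the key in the output.
function testOnboardingDoesNotLeakSecret() {
  const secret = "sk-test-0123456789";
  const output = renderTrackedConfig({ channel: "demo", apiKey: secret });
  if (output.includes(secret)) {
    throw new Error("plaintext secret written to tracked config");
  }
  return true;
}
```

The test says nothing about how the secret is stored. It pins down the one property that must not regress, which is what keeps the guard in place after the next fast change.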

What blind discovery is actually for

Put the two passes side by side and the division of labor is clean. The advisory lens is for known-vulnerability exposure - and there, version ranges over-report by nearly half, so you verify against source. The blind lens is for the bugs that were never filed: the unguarded path from input to capability, the invented dependency, the missing sandbox. One repo, two questions, two tools. Neither substitutes for the other - and shipping AI-generated code without running both is how a 3-million-line assistant ends up one chat message away from a shell.