Blog | JP Couture

Glassworm: When Source Code Lies to Your Eyes

A few days ago, the team at Aikido Security published a new report on a campaign they’ve been tracking for almost a year (read it here). The threat actor goes by Glassworm, and the technique they use is, I’ll be honest, kind of mind-bending when you first wrap your head around it.

The short version: attackers hide malicious payloads inside invisible Unicode characters embedded directly in source files. The code looks completely harmless when you read it. But it isn’t.

What’s Actually Happening

Here’s the trick.

JavaScript runtimes, and most modern languages, handle the full Unicode character set. That includes thousands of characters that render as absolutely nothing: zero-width spaces, zero-width joiners, variation selectors, characters from Unicode’s private-use areas. All invisible. All valid.

Glassworm uses these characters to encode a full malicious payload inside what appears to be an empty string. Look at this:

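// Decoder: maps each invisible character back to a byte value (0-255).
const s = v => [...v].map(w => (
  w = w.codePointAt(0),
  w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
  w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);

// In the real sample, the template literal holds the invisible payload.
eval(Buffer.from(s(``)).toString('utf-8'));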

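The backtick string passed to s() looks empty in every viewer, but it’s packed with invisible characters. The two ranges the decoder checks are Unicode variation selectors (U+FE00-U+FE0F and U+E0100-U+E01EF), and s() turns each one back into a byte of the malicious payload.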
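To see the mapping at work on something harmless, here is a minimal sketch that round-trips the word "hello" through the same two ranges. Together they hold 16 + 240 = 256 code points, so one invisible character can carry one byte. The helper names toSelector and decodeSelectors and the sample text are my own for illustration, not taken from the campaign, and nothing is passed to eval.

```javascript
// Illustrative only: helper names and the sample text are assumptions.

// Map one byte (0-255) to an invisible variation selector.
const toSelector = (byte) =>
  String.fromCodePoint(byte < 16 ? 0xFE00 + byte : 0xE0100 + (byte - 16));

// Same logic as the decoder above: variation selectors back to byte values.
const decodeSelectors = (text) =>
  [...text]
    .map((ch) => {
      const cp = ch.codePointAt(0);
      if (cp >= 0xFE00 && cp <= 0xFE0F) return cp - 0xFE00;
      if (cp >= 0xE0100 && cp <= 0xE01EF) return cp - 0xE0100 + 16;
      return null; // any other character is ignored
    })
    .filter((n) => n !== null);

// Build a string that renders as nothing but carries a harmless message.
const hidden = [...Buffer.from('hello', 'utf-8')].map(toSelector).join('');

console.log(`[${hidden}]`);      // the brackets appear empty in most viewers
console.log([...hidden].length); // number of hidden code points
console.log(Buffer.from(decodeSelectors(hidden)).toString('utf-8'));
```

The only thing separating this from the real sample is what the bytes spell out and whether the result reaches eval.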

A developer reviewing this PR sees an empty string and a weird little utility function. Nothing jumps out. The review passes. The package ships.

This isn’t theoretical. Aikido documented real-world cases where these techniques were used in npm packages, including React-related ecosystems (example). A GitHub code search for the decoder pattern currently returns at least 151 matching repositories, and many affected repos had already been deleted by the time Aikido published. The campaign ran between March 3 and March 9, 2026, and it’s not limited to GitHub: npm and the VS Code marketplace are also affected. Some of the compromised repos are ones you’d actually recognize, including projects from Wasmer, Reworm, and opencode-bench from anomalyco, the organization behind OpenCode and SST.

The Trojan Source Connection

If you’ve been in security for a while, this might ring a bell. Back in 2021, researchers published the Trojan Source attack, which used Unicode bidirectional control characters to make code look different from how it actually compiles. A comment that renders as a comment could actually close a string and open a code path.

Glassworm is in the same family, but it goes further. Trojan Source makes code read differently from what it actually does. Glassworm makes the payload render as nothing at all, so there’s nothing on screen to look wrong.

Why Your Tooling Won’t Catch It

This is the part that stuck with me.

Most SAST tools scan for dangerous APIs, known vulnerability patterns, dependency issues. They parse code, walk ASTs, match signatures. What they generally don’t do is inspect the Unicode properties of every character in every string. Why would they? Nobody thought to encode malware as invisible characters in source files. Until someone did.
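Inspecting every character is not expensive, though. As a rough sketch of what such a check could look like (the ranges are standard Unicode blocks, but the category names, the choice of what to flag, and the findSuspicious helper are my own illustration, not how any particular SAST tool works):

```javascript
// Hypothetical classifier: category names and flagged ranges are assumptions.
const classify = (cp) => {
  if (cp >= 0x200B && cp <= 0x200D) return 'zero-width';         // ZWSP, ZWNJ, ZWJ
  if (cp === 0x2060 || cp === 0xFEFF) return 'zero-width';       // word joiner, ZWNBSP
  if (cp >= 0xFE00 && cp <= 0xFE0F) return 'variation-selector';
  if (cp >= 0xE0100 && cp <= 0xE01EF) return 'variation-selector';
  if (cp >= 0x202A && cp <= 0x202E) return 'bidi-control';       // embeddings, overrides
  if (cp >= 0x2066 && cp <= 0x2069) return 'bidi-control';       // isolates
  if (cp >= 0xE000 && cp <= 0xF8FF) return 'private-use';
  if (cp >= 0xF0000 && cp <= 0xFFFFD) return 'private-use';
  if (cp >= 0x100000 && cp <= 0x10FFFD) return 'private-use';
  return null;
};

// Return one entry per suspicious code point, with 1-based line and column.
function findSuspicious(source) {
  const hits = [];
  source.split('\n').forEach((line, i) => {
    let col = 1;
    for (const ch of line) {
      const cp = ch.codePointAt(0);
      const kind = classify(cp);
      if (kind) hits.push({ line: i + 1, col, kind, codePoint: cp });
      col += 1; // columns are counted in code points, not UTF-16 units
    }
  });
  return hits;
}

// A zero-width space hidden between "x" and "y" on the second line.
const sample = 'const a = 1;\nconst b = "x\u200By";';
console.log(findSuspicious(sample));
```

A real scanner would need exceptions, for example a U+FEFF byte order mark at the very start of a file, but the core loop really is this small.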

It’s not just static analysis. Even careful, experienced reviewers will look right past a string full of zero-width spaces. The diff looks clean. The logic looks fine. The payload just sits there.

The surrounding changes are realistic too: documentation tweaks, version bumps, small refactors, bug fixes that are stylistically consistent with each target project. That level of project-specific tailoring strongly suggests the attackers are using LLMs to generate convincing cover commits. At 151+ repositories, you can’t handcraft that.

This is what a modern supply chain attack looks like. Not a sloppy package with obvious obfuscation. A polished PR, AI-generated context, and a payload that’s literally invisible to humans. The campaign is already getting broader attention in the security community as it evolves (coverage).

What This Actually Means for Code You Maintain

If you maintain an open source project, you’ve probably merged a PR at some point without scrutinizing every character in every string literal. That’s not negligence, that’s just how code review works at any reasonable pace. Glassworm exploits exactly that.

The attack surface is basically any file in your repository that a contributor can touch. A malicious actor submits a realistic-looking PR: a small refactor, a dependency bump, a doc fix. Buried inside one of the changed files is a string that appears empty but contains an encoded payload. The payload gets decoded and executed at runtime, silently, with whatever privileges your code has.

In past Glassworm incidents, the decoded payload fetched and executed a second-stage script using Solana as a delivery channel, capable of stealing tokens, credentials, and secrets.

The blast radius depends on where the infected file runs. If it’s in a CI pipeline, an attacker exfiltrates your secrets and environment variables. If it’s a library, every downstream project that installs it inherits the infection. If it’s a VS Code extension, it runs with full access to your local machine. If it’s a build script, you’ve handed an attacker a shell on every machine that builds your project.

The scariest part isn’t the payload. It’s the delivery. The commits are tailored to your project’s style, likely AI-generated, and visually indistinguishable from a legitimate contribution. The classic advice (review your PRs carefully, check who’s submitting) isn’t sufficient on its own. You can’t visually review invisible characters. The PR can look great. The diff can look great. And you still got hit.

The question shifts from "did I review this carefully enough?" to "does my toolchain scan for things that are invisible to me?"

So I Built ghostscan

After reading Aikido’s post, I wanted to actually look for this stuff in a codebase. So I built 👻ghostscan. The idea behind ghostscan is simple: scan a repository and make invisible Unicode attacks visible.

It’s a Go CLI tool. You point it at a path, it walks the tree, reads every text file rune by rune, and flags anything suspicious. No network calls. No code execution. Just a focused static scanner that knows what to look for.

ghostscan detects several classes of problems:

Invisible characters: zero-width spaces, joiners, non-joiners, word joiners, the zero-width no-break space. Characters that render as nothing but encode data.

Private-use Unicode: characters in the range U+E000-U+F8FF and in the supplementary private-use planes (U+F0000-U+FFFFD and U+100000-U+10FFFD), which almost never appear in legitimate source code.

Trojan Source bidirectional controls: the full set of bidi override and isolate characters that can make code display differently from how it compiles.

Encoded payload sequences: long runs of invisible or PUA characters that look like they’re encoding something, because they are. The heuristic looks for density: if 12 or more suspicious runes cluster in a 24-character window, that’s not an accident.

Decoder patterns near payloads: eval(, Buffer.from(, atob(, new Function(, and similar patterns. On their own they’re medium severity. If a suspicious payload is found within 25 lines of one in the same file, both are merged into a single high-severity correlation finding.

Mixed-script tokens: identifiers that blend Latin with Cyrillic or Greek, the classic homoglyph substitution attack where pаyload uses a Cyrillic U+0430 instead of a Latin a.

Combining mark tokens: identifiers with hidden Unicode combining marks attached to base characters, creating visually identical but functionally different names.
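The last two classes can be approximated with Unicode property escapes in a regular expression. This is a sketch of the idea only: the tokenizer, the regexes, and the function names are my own assumptions, not ghostscan’s implementation.

```javascript
// Sketch only: regexes and function names are illustrative assumptions.
const LATIN = /\p{Script=Latin}/u;
const LOOKALIKE = /[\p{Script=Cyrillic}\p{Script=Greek}]/u;
const COMBINING = /\p{M}/u;

// Split on anything that is not a letter, mark, digit, underscore, or dollar sign.
const tokens = (source) =>
  source.split(/[^\p{L}\p{M}\p{N}_$]+/u).filter(Boolean);

// A token is mixed-script if it contains Latin plus Cyrillic or Greek letters.
const isMixedScript = (token) => LATIN.test(token) && LOOKALIKE.test(token);

// A token carries a combining mark if any code point is in category M.
const hasCombiningMark = (token) => COMBINING.test(token);

// "payload" with a Cyrillic U+0430, and "name" with a combining acute U+0301.
const source = 'const p\u0430yload = load(); const na\u0301me = 1;';
for (const t of tokens(source)) {
  if (isMixedScript(t)) console.log('mixed-script token:', JSON.stringify(t));
  if (hasCombiningMark(t)) console.log('combining-mark token:', JSON.stringify(t));
}
```

Checks like these will also fire on legitimate non-English identifiers and strings, so in practice they are better treated as something to review than as proof of an attack.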

When ghostscan finds something, it renders the hidden characters explicitly:

[HIGH] Trojan Source bidi control characters detected

Rule: unicode/bidi

Locations:
  line 87 col 14  <U+202E RIGHT-TO-LEFT OVERRIDE>

Or for an encoded payload:

[HIGH] Suspicious encoded payload sequence

Rule: unicode/payload
Location: line 142, col 8

Evidence:
  payload: <U+FE00><U+FE01><U+FE02><U+FE03> ...

The invisible becomes something you can actually look at.
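For a feel of how the density heuristic and the rendering fit together, here is a small sketch. The 12-in-24 thresholds come from the description above; the sliding-window logic, the set of flagged ranges, and the function names are my own guesses at one way to do it, not ghostscan’s code.

```javascript
// Sketch under assumptions: thresholds are from the post, the rest is illustrative.
const WINDOW = 24;
const THRESHOLD = 12;

const isSuspicious = (cp) =>
  (cp >= 0x200B && cp <= 0x200D) || cp === 0x2060 || cp === 0xFEFF ||
  (cp >= 0xFE00 && cp <= 0xFE0F) || (cp >= 0xE0100 && cp <= 0xE01EF) ||
  (cp >= 0xE000 && cp <= 0xF8FF);

// Return the index (in code points) of the first suspicious rune inside the
// first window that reaches the threshold, or -1 if no window is dense enough.
function findDenseWindow(line) {
  const flags = [...line].map((ch) => isSuspicious(ch.codePointAt(0)));
  let count = 0;
  for (let i = 0; i < flags.length; i++) {
    if (flags[i]) count++;
    if (i >= WINDOW && flags[i - WINDOW]) count--; // rune leaving the window
    if (count >= THRESHOLD) {
      const start = Math.max(0, i - WINDOW + 1);
      return flags.indexOf(true, start);
    }
  }
  return -1;
}

// Render a suspicious code point as <U+XXXX>; leave everything else alone.
const render = (ch) => {
  const cp = ch.codePointAt(0);
  return isSuspicious(cp)
    ? `<U+${cp.toString(16).toUpperCase().padStart(4, '0')}>`
    : ch;
};

// Sixteen variation selectors inside a template literal that looks empty.
const payload = Array.from({ length: 16 }, (_, i) =>
  String.fromCodePoint(0xFE00 + i)).join('');
const line = 'eval(s(`' + payload + '`));';

console.log(findDenseWindow(line));
console.log([...line].map(render).join(''));
```

A single stray zero-width space never reaches the threshold, which is the point: the window separates copy-paste accidents from something that is carrying data.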

The design is deliberately minimal. Single binary. No daemon. No server. No config file required. It treats every repo it scans as hostile input: no symlink following, no executing anything, no network access. The scan pipeline is deterministic, so the same repo produces the same results every time. That matters a lot if you’re wiring it into CI.
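To illustrate the "hostile input" stance, here is a sketch of a directory walk that skips symbolic links and visits entries in sorted order so results are repeatable. It uses Node’s standard fs and path modules; the walk function itself is hypothetical and is not ghostscan’s Go implementation.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical walker: yields regular files in sorted order and never
// follows symbolic links.
function* walk(dir) {
  const entries = fs
    .readdirSync(dir, { withFileTypes: true })
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isSymbolicLink()) continue; // links in a hostile repo can point anywhere
    if (entry.isDirectory()) yield* walk(full);
    else if (entry.isFile()) yield full;
  }
}

// Usage: for (const file of walk('path/to/repo')) { /* scan file */ }
```

Skipping links avoids both escaping the repository and looping forever on a link that points back at a parent directory.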

Performance-wise, the scanner processes files as streams, one at a time, without loading the whole repository into memory. It targets around 100,000 files in under 10 seconds on typical developer hardware. For most projects, dropping it into a CI step should just work.

What the Output Looks Like in Practice

Running it against a clean repo:

ghostscan
=========

Result: CLEAN

Files scanned: 1,247
Files with findings: 0

Severity:
  none   0

Running it against something infected:

ghostscan
=========

Result: FINDINGS DETECTED

Files scanned: 1,247
Files with findings: 1

Severity:
  HIGH  1

Top concerns:
  1. Hidden Unicode payload with nearby decoder pattern in 1 file

-- src/utils/loader.js --------------------------------
Severity: HIGH
Incidents: 1
Supporting observations: 2

[HIGH] Hidden Unicode payload with nearby decoder pattern

Rule: unicode/correlation
Location: line 23, col 19

Why this matters:
  Invisible or private-use Unicode can hide an encoded payload.
  A decoder or dynamic execution primitive was found 8 lines away.

Evidence:
  payload: <U+FE00><U+FE01><U+FE03><U+FE05>...
  decoder: eval(Buffer.from(s(`...`)).toString('utf-8'))
  distance: 8 lines

Supporting observations:
  [MEDIUM] Invisible Unicode sequence
    start: line 23 col 19
    length: 4

  [MEDIUM] Decoder pattern
    pattern: eval(Buffer.from(s(`...`)).toString('utf-8'))
    line: 31

Exit codes are what you’d expect for CI: 0 for clean, 1 for findings, 2 for a scan error.
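If you drive the scan from a Node-based CI script, those three codes map cleanly onto a small wrapper. The sketch below assumes the binary is on PATH as ghostscan and takes the target path as its only argument; check the tool’s own help output for the real invocation. Only the exit code meanings come from this post.

```javascript
const { spawnSync } = require('node:child_process');

// Exit code meanings are from the post; the result shape is an assumption.
function interpretExit(status) {
  switch (status) {
    case 0: return { ok: true, message: 'clean' };
    case 1: return { ok: false, message: 'findings detected' };
    case 2: return { ok: false, message: 'scan error' };
    default: return { ok: false, message: `unexpected exit status: ${status}` };
  }
}

// Assumed invocation: `ghostscan <path>`. Not called here, so this file can
// be loaded without the binary installed.
function runGhostscan(target) {
  const result = spawnSync('ghostscan', [target], { stdio: 'inherit' });
  return interpretExit(result.status); // status is null if the binary is missing
}

// In a CI script:
//   const { ok, message } = runGhostscan('.');
//   console.log(message);
//   process.exitCode = ok ? 0 : 1;
```

Treating anything other than 0 as a failure, including a missing binary, keeps a broken scan step from silently passing.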

A Genuine Thank You to Aikido

The research from Aikido is what made ghostscan exist. They’ve been tracking this campaign since March 2025, traced the initial npm packages, watched it spread to GitHub and VS Code extensions, and published detailed write-ups at every stage. That’s a year of persistent work on an attack class that most of the industry wasn’t paying attention to.

If you want the full picture of the campaign (the decoded payloads, the Solana delivery mechanism, the full list of affected repositories), read their original write-up, linked at the top of this post. It’s excellent.

Final Thought

There’s something philosophically interesting about this class of attack. It exploits the assumption that source code is text, and text is readable. But Unicode is huge, 140,000-plus characters, many of which are invisible, directional, or specifically designed to affect rendering rather than semantics. We’ve been writing code under the assumption that what we see is what the machine reads. Glassworm breaks that assumption.

ghostscan is one tool for closing that gap. Make the invisible visible. Ship it in CI. Don’t let a zero-width space ruin your week.