Case Study #6
15 gaps caught before writing a single line of code.
We built a self-hosted AI email drafter from scratch with Claude Code. Before coding started, MegaLens reviewed the full build plan. It surfaced 15 gaps, including 2 critical ones that would have caused real damage in production. Then every commit was reviewed before proceeding to the next step.
15
Gaps caught pre-code
2
Critical
13
New to our review
350+
Tests in final build
What we built
An AI Email Drafter. A backend service that polls a Gmail inbox, identifies real business inquiries, and creates draft replies answered only from a user-provided context file. If the answer isn't in the file, no draft is created. The tool never sends email. It can't. The Gmail send permission is never requested.
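To make that concrete, here is the shape of the service as a minimal sketch. Every name in it is illustrative; this mirrors the description above, not the project's actual code.

```python
# Sketch of the service loop; all names are illustrative.
import time

POLL_INTERVAL_SECONDS = 120  # assumption: the real interval is configurable

def run(gmail, classifier, drafter, store):
    while True:
        for msg in gmail.fetch_unprocessed():            # unread mail since the last checkpoint
            if classifier.is_business_inquiry(msg):      # AI decision: real inquiry or not
                reply = drafter.draft_from_context(msg)  # answers only from the context file
                if reply is not None:                    # None = not in the file, so skip
                    gmail.create_draft(msg, reply)       # drafts only; no send scope exists
            store.mark_processed(msg.id)
        time.sleep(POLL_INTERVAL_SECONDS)
```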
The workflow: Claude Code + MegaLens
Claude Code wrote the plan and the code. MegaLens reviewed the plan before implementation, then reviewed each commit during the build. Think of it like this: Claude Code is the builder, MegaLens is the inspector who checks the blueprints and then inspects each floor as it goes up.
Wrote a 19-section build plan with Claude Code
Did our own quick review. Found 2 items.
Ran MegaLens on the full plan. Got 15 findings.
Fixed all 15 in the plan before writing code.
12 implementation steps, each commit reviewed by MegaLens.
Working service. 350+ tests. Adversarial prompt injection suite.
What MegaLens caught
15 findings total, 13 of them beyond the 2 items our own pre-audit had caught. Here are the ones that mattered most.
Critical (2)
The plan had a daily spending cap but no code to parse actual costs from API responses. The cap was a sign on the wall, not a circuit breaker. Fix: added real cost calculation from every API response (sketched below).
On database loss, the tool would reprocess every unread email in the inbox, not just recent ones: dozens of unwanted drafts and a surprise API bill. Fix: added a time-bounded lookback (default: last 24 hours only), also sketched below.
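A minimal sketch of the cost fix in Python. The usage fields are what the Anthropic Messages API returns on every response; the per-token prices, the cap, and the store helper are placeholders, not the project's real values.

```python
# Sketch: enforce the daily cap from real API usage, not from hope.
# Prices and the store helper are assumptions, not the project's actual values.
INPUT_USD_PER_MTOK = 3.00    # placeholder rate
OUTPUT_USD_PER_MTOK = 15.00  # placeholder rate
DAILY_CAP_USD = 5.00         # placeholder cap

def cost_of(response) -> float:
    # Anthropic Messages API responses carry a usage block with token counts.
    u = response.usage
    return (u.input_tokens * INPUT_USD_PER_MTOK
            + u.output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

def record_spend(store, response) -> None:
    spent_today = store.add_cost_today(cost_of(response))  # hypothetical persistence helper
    if spent_today >= DAILY_CAP_USD:
        raise RuntimeError("daily spending cap reached; pausing API calls")
```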
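The lookback fix fits in one Gmail query: the after: search operator accepts epoch seconds, so a fresh start can never reach past the window. The constant matches the default named above; the query shape is our sketch.

```python
# Sketch: time-bounded lookback so a lost database never triggers a mass-draft.
import time

LOOKBACK_SECONDS = 24 * 3600  # default window: last 24 hours only

def bounded_query() -> str:
    # Gmail search accepts epoch seconds in after:, e.g. "is:unread after:1735689600".
    cutoff = int(time.time()) - LOOKBACK_SECONDS
    return f"is:unread after:{cutoff}"

# Used with google-api-python-client:
#   service.users().messages().list(userId="me", q=bounded_query()).execute()
```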
High (7)
Email headers passed raw into the AI prompt (injection vector through crafted Subject lines)
Draft-create race condition: duplicate drafts on crash between decide and save
No MIME parsing spec: how to handle multipart, HTML fallback, charset detection, signature stripping
Reply envelope headers undefined: To, Cc, In-Reply-To, References not specified (see the reply-draft sketch after this list)
History sync treating other event types (e.g., label changes) as new mail
Poison messages retry forever with no quarantine or backoff
systemd Environment= exposes API keys in /proc and systemctl output
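One reasonable answer to the envelope-headers finding, sketched against the Gmail API's drafts.create. Preferring Reply-To and the naive "Re:" prefix are our assumptions, not the project's exact choices.

```python
# Sketch: a correctly threaded reply draft (RFC 5322 threading headers + Gmail threadId).
import base64
from email.message import EmailMessage

def create_reply_draft(service, original, body_text, thread_id):
    msg = EmailMessage()
    msg["To"] = original["Reply-To"] or original["From"]  # assumption: prefer Reply-To
    msg["Subject"] = "Re: " + original["Subject"]         # naive; real code should dedupe "Re:"
    msg["In-Reply-To"] = original["Message-ID"]           # ties the reply to the original
    msg["References"] = (original.get("References", "") + " " + original["Message-ID"]).strip()
    msg.set_content(body_text)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    return service.users().drafts().create(
        userId="me",
        body={"message": {"raw": raw, "threadId": thread_id}},  # keeps the draft in-thread
    ).execute()
```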
Medium (6)
Encryption key naming inconsistent across config and code
Gmail historyId 7-day expiry unhandled (silent data loss after offline period)
Thread and draft checks burning Gmail API quota (per-message instead of per-cycle)
No citation verification (closed-world rule enforced by prompt only, not code)
No precision/recall acceptance criteria for the draft vs. skip decision
SQLite without WAL mode (database-locked errors under concurrent access; two-line fix sketched after this list)
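The WAL fix is two pragmas at connection time (the database path here is illustrative):

```python
import sqlite3

conn = sqlite3.connect("drafter.db")      # path is illustrative
conn.execute("PRAGMA journal_mode=WAL")   # readers stop blocking the writer
conn.execute("PRAGMA busy_timeout=5000")  # wait 5s instead of raising "database is locked"
```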
Safety choices
These aren't features. They're constraints. We deliberately limited what the tool can do.
It can't send email.
Not "it doesn't send." It can't. The Gmail permission it requests (gmail.compose) only allows creating drafts. The send permission (gmail.send) is never requested.
It only answers from your file.
If the answer isn't derivable from your context file, no draft is created. No guessing. No "let me check and get back to you." Just a skip.
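A minimal sketch of that gate, assuming the model returns a structured JSON verdict; the field names are illustrative.

```python
# Sketch: the closed-world gate. Anything that isn't a clear yes becomes a skip.
import json

def decide(model_output: str):
    try:
        verdict = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # malformed output is a skip, never a guess
    if not verdict.get("answerable", False):  # "answerable" is an illustrative field name
        return None  # answer not derivable from the context file: skip
    return verdict.get("draft_body")          # "draft_body" likewise illustrative
```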
Email bodies are untrusted.
The AI is explicitly told to ignore instructions inside email content. Headers are sanitized before they reach the prompt: newlines removed, control characters stripped, values truncated. Structured JSON output limits the attack surface.
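A sketch of that sanitization pass; the length limit is an assumption.

```python
# Sketch: no raw header ever reaches the prompt.
import re

MAX_HEADER_LEN = 256  # assumption: the real limit may differ

def sanitize_header(value: str) -> str:
    value = value.replace("\r", " ").replace("\n", " ")  # kill newline-based injection
    value = re.sub(r"[\x00-\x1f\x7f]", "", value)        # strip remaining control characters
    return value[:MAX_HEADER_LEN]                        # truncate crafted oversized subjects
```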
Old email is ignored on fresh start.
First run only processes the last 24 hours (configurable). No surprise batch of 200 drafts from last month.
Bad messages stop retrying.
After 3 failures, a message is quarantined. The tool moves on instead of crashing in a loop.
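The rule in sketch form; the storage helpers are illustrative:

```python
# Sketch: quarantine after repeated failures instead of crash-looping.
MAX_FAILURES = 3

def handle_safely(store, msg, handler):
    try:
        handler(msg)
        store.mark_processed(msg.id)
    except Exception:
        failures = store.record_failure(msg.id)  # hypothetical persisted counter
        if failures >= MAX_FAILURES:
            store.quarantine(msg.id)             # park it and move on; no automatic retry
```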
The numbers
15
Findings before code
2
Critical gaps
7
High severity
6
Medium severity
13/15
New to our review
12
Implementation steps
13
Commits
350+
Tests in final build
10+
Prompt injection tests
Honest notes
Our pre-audit was intentionally quick (a few items per category). A more careful self-review would have caught more than 2 of the 15.
Some findings would probably have surfaced during implementation even without MegaLens. Others probably wouldn't have until production.
MegaLens produces findings, not proofs. Every finding required human judgment to assess severity and decide on a fix. This is one build on one plan. Results will vary depending on the project, plan quality, and how thorough your own review is.
Review your build plan before writing code.
MegaLens adds a structured multi-engine review layer to your IDE. Your AI stays the final judge. MegaLens gives it blind-spot coverage.
Try MegaLens Free