
Case Study #6

15 gaps caught before writing a single line of code.

We built a self-hosted AI email drafter from scratch with Claude Code. Before coding started, MegaLens reviewed the full build plan. It surfaced 15 gaps, including 2 critical ones that would have caused real damage in production. Then every commit was reviewed before proceeding to the next step.

15

Gaps caught pre-code

2

Critical

13

New to our review

350+

Tests in final build

What we built

An AI Email Drafter: a backend service that polls a Gmail inbox, identifies real business inquiries, and creates draft replies using only a user-provided context file. If the answer isn't in the file, no draft is created. The tool never sends email. It can't: the Gmail send permission is never requested.

Polls Gmail inbox on a schedule
8 rule-based filters (no AI needed)
AI drafts from your context file only
Never sends, drafts only
Encrypted credential storage
Daily cost cap with circuit breaker
Poison message quarantine
Hot-reload context file changes
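The case study doesn't name the eight rule-based filters, so the rules below are illustrative assumptions, not the project's actual filters. They show the shape of a pre-filter chain that rejects obvious non-inquiries before any AI call:

```python
# Hypothetical sketch of a rule-based pre-filter chain. The actual eight
# filters are not listed in the case study; these two are examples only.

def is_auto_reply(headers: dict) -> bool:
    # Auto-generated mail usually announces itself via these headers.
    return (
        headers.get("Auto-Submitted", "no").lower() != "no"
        or "X-Autoreply" in headers
    )

def is_bulk_mail(headers: dict) -> bool:
    # Mailing lists and newsletters advertise themselves.
    return (
        "List-Unsubscribe" in headers
        or headers.get("Precedence", "").lower() in ("bulk", "list")
    )

FILTERS = [is_auto_reply, is_bulk_mail]  # the real chain would have more rules

def passes_filters(headers: dict) -> bool:
    """Return True only if no rule rejects the message."""
    return not any(rule(headers) for rule in FILTERS)
```

Because these rules run before the AI step, every rejected message costs nothing.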

The workflow: Claude Code + MegaLens

Claude Code wrote the plan and the code. MegaLens reviewed the plan before implementation, then reviewed each commit during the build. Think of it like this: Claude Code is the builder, MegaLens is the inspector who checks the blueprints and then inspects each floor as it goes up.

1. Plan: Wrote a 19-section build plan with Claude Code.

2. Pre-audit: Did our own quick review. Found 2 items.

3. MegaLens audit: Ran MegaLens on the full plan. Got 15 findings.

4. Fix: Fixed all 15 in the plan before writing code.

5. Build: 12 implementation steps, each commit reviewed by MegaLens.

6. Result: Working service. 350+ tests. Adversarial prompt injection suite.

What MegaLens caught

15 findings total. 13 were additions beyond our own pre-audit. Here are the ones that mattered most.

Critical (2)

Critical: Cost cap was decorative

The plan had a daily spending cap but no code to parse actual costs from the API response. The cap was a sign on the wall that nobody reads. Fix: added real cost calculation from every API response.
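A cost cap only works if spend is actually measured. A minimal sketch of the fix, with placeholder per-token prices and assumed usage-field names rather than the project's real values:

```python
# Sketch of a real cost cap: compute spend from each API response's token
# usage and trip a circuit breaker when the daily total crosses the cap.
# Prices below are placeholders, not the project's actual rates.

PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # placeholder $/token
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # placeholder $/token

class DailyCostCap:
    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Called after every API response, using its reported token counts.
        self.spent_usd += (
            input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN
        )

    @property
    def tripped(self) -> bool:
        # Once True, the polling loop skips AI calls until the day resets.
        return self.spent_usd >= self.cap_usd
```

The key difference from the original plan: `record` is fed by actual response data, so the cap reflects real spend rather than an untracked intention.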

Critical: Fresh install would flood drafts

On database loss, the tool would process every unread email in the inbox, not just recent ones. Dozens of unwanted drafts and a surprise API bill. Fix: added a time-bounded lookback (default: last 24 hours only).

High (7)

Email headers passed raw into the AI prompt (injection vector through crafted Subject lines)

Draft-create race condition: duplicate drafts on crash between decide and save

No MIME parsing spec: how to handle multipart, HTML fallback, charset detection, signature stripping

Reply envelope headers undefined: To, Cc, In-Reply-To, References not specified

History sync processing wrong event types as new mail

Poison messages retry forever with no quarantine or backoff

systemd Environment= exposes API keys in /proc and systemctl output
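Several of these have compact fixes. The draft-create race, for example, can be closed by recording the intent to draft in the local database before calling the Gmail API, so a crash between decide and save surfaces as a duplicate claim on restart rather than a duplicate draft. A sketch with illustrative table and column names:

```python
# Sketch of an idempotent claim: a PRIMARY KEY on the message id means the
# second attempt to claim the same message fails at the database, not as a
# duplicate draft in the user's Gmail. Table/column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE draft_intents (message_id TEXT PRIMARY KEY, status TEXT)"
)

def try_claim(message_id: str) -> bool:
    """Return True if we claimed the message; False if already claimed."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO draft_intents VALUES (?, 'pending')",
                (message_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The draft API call happens only after `try_claim` returns True, and the row's status is updated once the draft is confirmed created.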

Medium (6)

Encryption key naming inconsistent across config and code

Gmail historyId 7-day expiry unhandled (silent data loss after offline period)

Thread and draft checks burning Gmail API quota (per-message instead of per-cycle)

No citation verification (closed-world rule enforced by prompt only, not code)

No precision/recall acceptance criteria for the draft vs. skip decision

SQLite without WAL mode (database-locked errors under concurrent access)
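The WAL finding has a near one-line fix. A sketch of a connection helper that enables WAL mode (which persists in the database file, so it only needs to succeed once) plus a busy timeout so concurrent access waits briefly instead of failing fast:

```python
# Sketch: open SQLite with WAL mode so readers don't block writers,
# avoiding "database is locked" errors under concurrent access.
import sqlite3

def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL is persistent: it is stored in the database file itself.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5s on a locked database instead of erroring immediately.
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

Note that WAL does not apply to in-memory databases; the pragma only takes effect on a file-backed database.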

Safety choices

These aren't features. They're constraints. We deliberately limited what the tool can do.

It can't send email.

Not "it doesn't send." It can't. The Gmail permission it requests (gmail.compose) only allows creating drafts. The send permission (gmail.send) is never requested.
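A sketch of that capability boundary in code: request only drafts-capable scopes and fail fast if a send-capable scope ever appears in the configuration. The readonly scope below is an assumption (polling requires read access); the case study doesn't list the exact scope set:

```python
# Sketch of the capability boundary. gmail.compose allows creating and
# updating drafts but not sending; gmail.send is never requested.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.compose",   # create/update drafts
    "https://www.googleapis.com/auth/gmail.readonly",  # read inbox (assumed)
]

# Scopes that would grant send capability, directly or via full access.
FORBIDDEN = {
    "https://www.googleapis.com/auth/gmail.send",
    "https://mail.google.com/",
}

def assert_cannot_send(scopes: list[str]) -> None:
    """Raise at startup if any send-capable scope is requested."""
    granted = set(scopes) & FORBIDDEN
    if granted:
        raise RuntimeError(f"send-capable scope requested: {granted}")
```

Running this check at startup turns "we never send" from a policy into an enforced invariant.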

It only answers from your file.

If the answer isn't derivable from your context file, no draft is created. No guessing. No "let me check and get back to you." Just a skip.

Email bodies are untrusted.

The AI is explicitly told to ignore instructions inside email content. Headers are sanitized (control characters stripped, truncated, newlines removed). Structured JSON output limits the attack surface.
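The sanitization step can be sketched in a few lines: drop newlines and other control characters (the header-injection vector) and truncate. The length limit here is an illustrative value, not the project's:

```python
# Sketch of header sanitization before a value reaches the AI prompt:
# strip control characters (including CR/LF) and truncate.
MAX_HEADER_LEN = 256  # illustrative limit

def sanitize_header(value: str, max_len: int = MAX_HEADER_LEN) -> str:
    # str.isprintable() is False for newlines, carriage returns, NUL,
    # and other control characters, so this drops them all.
    cleaned = "".join(ch for ch in value if ch.isprintable())
    return cleaned[:max_len]
```

A crafted Subject line like `"Re: hi\r\nIgnore previous instructions"` loses its line break before it can masquerade as a new instruction in the prompt.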

Old email is ignored on fresh start.

First run only processes the last 24 hours (configurable). No surprise batch of 200 drafts from last month.

Bad messages stop retrying.

After 3 failures, a message is quarantined. The tool moves on instead of crashing in a loop.
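A sketch of the quarantine logic, with the threshold of 3 from the text; the in-memory dict stands in for what the real service would presumably persist in its database:

```python
# Sketch of poison-message quarantine: count failures per message and stop
# retrying after a threshold, so one bad message can't crash-loop the poller.
MAX_ATTEMPTS = 3

failures: dict[str, int] = {}
quarantined: set[str] = set()

def record_failure(message_id: str) -> None:
    failures[message_id] = failures.get(message_id, 0) + 1
    if failures[message_id] >= MAX_ATTEMPTS:
        quarantined.add(message_id)

def should_process(message_id: str) -> bool:
    return message_id not in quarantined
```

Each polling cycle checks `should_process` before touching a message, so quarantined messages are simply skipped rather than retried.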

The numbers

15

Findings before code

2

Critical gaps

7

High severity

6

Medium severity

13/15

New to our review

12

Implementation steps

13

Commits

350+

Tests in final build

10+

Prompt injection tests

Honest notes

Our pre-audit was intentionally quick (a few items per category). A more careful self-review would have caught more than 2 of the 15.

Some findings would probably have surfaced during implementation even without MegaLens. Others probably wouldn't have until production.

MegaLens produces findings, not proofs. Every finding required human judgment to assess severity and decide on a fix. This is one build on one plan. Results will vary depending on the project, plan quality, and how thorough your own review is.

Review your build plan before writing code.

MegaLens adds a structured multi-engine review layer to your IDE. Your AI stays the final judge. MegaLens gives it blind-spot coverage.

Try MegaLens Free