Case Study #2
23 issues found before writing a single line of code.
We had a 7-step UI plan for a complex web app, and on first read it looked solid. Before building, we sent it to two independent reviewers. They found 23 issues across security, architecture, UX, and performance, and the plan changed materially before implementation started.
23
Issues found
3
Security risks
15
Plan modifications
77%
Agreement rate
The specific engines and their skill-specific combinations are proprietary to MegaLens, so reviewers are identified by role only.
Reviewer 1: the architecture gaps
Looked at the plan like a senior frontend architect and found 10 structural gaps.
No data model or state management strategy defined
No loading, error, or retry behaviour specified
No rules for what persists across sessions vs. what's ephemeral
No URL routing strategy for conversation history
No accessibility considerations
No strategy for rendering rich content safely
No mechanism to cancel or stop in-progress generation
No way to retry individual components of a multi-engine response
No empty states or first-use onboarding
No frontend test strategy
The pattern: The plan covered screens and layouts, but not enough of the operating model. It answered “what should this look like” better than “what happens when this gets messy.”
Reviewer 2: security, UX, and production risk
Took a broader brief and came back with 13 issues across four categories.
Security (3)
Client-side credential storage exposed to cross-site scripting
Rich content rendering could execute injected scripts
Credential validation requests at scale could trigger provider rate limits
UX (4)
Requiring credentials before first interaction kills conversion
Simulated typing effects frustrate experienced users
Cost estimates for multi-engine queries can't be accurate — showing false precision erodes trust
Multi-panel comparison views don't work on mobile screens
Performance (3)
Expandable detail views create excessive DOM nodes at scale
Shared application bundle on the marketing page hurts load time and SEO
Credential validation on every paste creates unnecessary API load
Production (3)
Platform execution time limits conflict with multi-engine debate duration
Long-lived streaming connections don't scale without connection management
File-based context (upload, paste) missing entirely — limits usefulness for real work
Cross-examination: where they diverged
Reviewer 2's 13 criticisms were sent to Reviewer 1 for validation.
| Category | Agreed | Disagreed | Partial |
|---|---|---|---|
| Security | 3/3 | 0 | 0 |
| UX | 4/4 | 0 | 0 |
| Performance | 2/3 | 0 | 1 |
| Production | 1/3 | 2 | 0 |
10 of 13 confirmed (77%). The 3 disagreements were about timing, not validity. Reviewer 2 was right that the risks were real. Reviewer 1 was right that they were not launch blockers.
How the plan changed
The original 7-step plan picked up 15 modifications before any code was written.
Security changes
- Credentials stored in runtime memory only, never in browser storage
- Rich content sanitised before rendering, with no raw HTML execution
- Credential validation deferred to first use, not on input
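The runtime-memory approach can be sketched roughly as below. This is purely illustrative, not the MegaLens implementation; `createCredentialStore` and its shape are invented names for this example. The point is that the key lives only in a closure, never in `localStorage` or `sessionStorage`, and validation happens once at first use rather than on every paste.

```typescript
// Illustrative sketch only. The credential is held in a closure for the
// lifetime of the page, so it never touches browser storage and is not
// discoverable by an XSS payload that enumerates localStorage.
function createCredentialStore() {
  let apiKey: string | null = null; // runtime memory only
  let validated = false;

  return {
    set(key: string) {
      apiKey = key;
      validated = false; // defer validation until first real use
    },
    async getValidated(validate: (key: string) => Promise<boolean>): Promise<string> {
      if (apiKey === null) throw new Error("no credential set");
      const key = apiKey;
      if (!validated) {
        // First use: validate exactly once, instead of on every paste or
        // keystroke, so validation traffic can't trip provider rate limits.
        if (!(await validate(key))) throw new Error("invalid credential");
        validated = true;
      }
      return key;
    },
    clear() {
      apiKey = null;
      validated = false;
    },
  };
}
```

Clearing on logout (or page unload) is trivial, and the trade-off is explicit: the user re-enters the key next session, which is the price of keeping it out of persistent storage.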
UX changes
- Credential input moved from a blocking modal to an inline prompt
- Simulated typing removed: real streaming progress only
- Cost display changed from exact estimates to honest ranges
- Mobile layout changed from side-by-side panels to stacked cards
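The "honest ranges" change is simple enough to show in miniature. A hypothetical `formatCostRange` helper (not from the actual codebase) bounds the cost instead of pretending to know it, since multi-engine cost depends on response length that is unknown up front:

```typescript
// Illustrative only: display a bounded range rather than a falsely
// precise point estimate for multi-engine query cost.
function formatCostRange(lowUsd: number, highUsd: number): string {
  if (lowUsd > highUsd) [lowUsd, highUsd] = [highUsd, lowUsd]; // tolerate swapped args
  const fmt = (v: number) => `$${v.toFixed(2)}`;
  // Collapse to a single figure only when the bounds genuinely coincide.
  return lowUsd === highUsd ? fmt(lowUsd) : `${fmt(lowUsd)}-${fmt(highUsd)}`;
}
```

Showing "$0.02-$0.11" instead of "$0.07" trades apparent precision for accuracy, which is the trust argument Reviewer 2 made.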
Architecture additions (not in original plan)
- Typed event schema for streaming responses
- Cancel/stop mechanism for in-progress generation
- Per-component retry capability
- First-use onboarding and empty states
- Partial failure display (when some engines succeed and others don't)
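A typed event schema and a cancel mechanism compose naturally, sketched below under assumptions of our own (the event names and `consume` helper are illustrative, not the shipped design). A discriminated union lets the UI render partial failure and offer per-engine retry without parsing ad-hoc payloads, and the standard `AbortController` gives a stop button for in-progress generation:

```typescript
// Illustrative event schema: each variant is self-describing, so the UI can
// show partial failure (some engines succeed, others error) and expose a
// retry affordance only where retryable is true.
type StreamEvent =
  | { type: "token"; engine: string; text: string }
  | { type: "engine_done"; engine: string }
  | { type: "engine_error"; engine: string; message: string; retryable: boolean }
  | { type: "done" };

// Cancel/stop via the standard AbortSignal: when the user presses stop,
// aborting the controller halts consumption for every engine sharing it.
async function consume(
  events: AsyncIterable<StreamEvent>,
  signal: AbortSignal,
  onEvent: (e: StreamEvent) => void,
): Promise<void> {
  for await (const e of events) {
    if (signal.aborted) break; // user pressed stop
    onEvent(e);
    if (e.type === "done") break;
  }
}
```

The `engine_error` variant is what makes "partial failure display" cheap: one engine's timeout arrives as data, not as an exception that tears down the whole view.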
Deferred intentionally (V2)
- Connection pooling at scale
- File upload and document context
- Full DOM virtualisation
The Numbers
23
Issues found before implementation
3
Security risks
4
UX anti-patterns
10
Architecture gaps
3
Performance risks
3
Production risks
77%
Cross-examination agreement rate
3
Items deferred after disagreement
15
Plan modifications applied
Why this mattered before build
Plan review catches a different class of mistake than code review.
Code review finds implementation bugs. Plan review finds missing capabilities, weak assumptions, and architectural decisions that become expensive to reverse once code exists.
Independent reviewers look at the same plan and see different failures.
Reviewer 1 found structural omissions. Reviewer 2 found security, UX, and production risk. There was very little overlap, which is exactly the point.
Disagreement is useful input, not noise.
The 3 disagreements were prioritisation debates. Both reviewers agreed the risks were real; they disagreed about when to deal with them. That is better decision support than a clean but shallow consensus.
Reviewing before implementation is cheaper than rebuilding after it.
Every one of these 23 issues would have been more expensive to fix after code existed. The credential storage change alone, from browser storage to runtime memory, would have touched every component that handles credentials.
Limitations
This was one UI plan reviewed by two AI reviewers. The issues were real and the plan changes were applied, but we are not treating the ratios here as universal.
Some of the 23 “issues” were omissions, such as no cancel button, rather than defects in existing code. We counted them because they were real gaps that would have shipped, but a stricter definition of “issue” would lower the total.
AI reviewers can over-index on theoretical risk. The connection-scaling and file-upload critiques were technically valid but practically early. Human judgment still decided what mattered for launch and what belonged in V2.
Implementation details, tool names, and architecture specifics are intentionally omitted. We share the process and the numbers, not the blueprint.
Try MegaLens Free