Vibe Coding Mistakes: 12 Ways AI-Generated Apps Break in Production (and How to Fix Each)

Vibe coding produces great first screens and fragile internals. Here are the 12 failure modes AI-generated apps hit in production — leaked keys, unvalidated tool inputs, silent mutations, package bloat — each with a concrete fix.

11 min read

TL;DR. AI coding agents are excellent at the first screen and careless about everything behind it. The same failure modes show up again and again: secrets in client code, unvalidated tool inputs, silent state mutations, missing setup diagnostics, and bloated packages. Below are 12 of them with a concrete fix for each — drawn from building a production-shaped realtime voice starter, where every one of these had to be solved for real.

"Vibe coding" — describing what you want and letting an AI agent write and run the code — is the fastest way to a working prototype and a surprisingly reliable way to ship something fragile. The model optimizes for a demo that runs now, not a system that survives a real user, a flaky network, or someone poking at your endpoints.

We hit every one of these while building the Voice Agent Starter Kit — a realtime voice app on the Vercel AI SDK. Voice makes the stakes obvious: an open microphone, a cost-bearing token endpoint, and an assistant that can take actions is exactly where "impressive demo, dangerous internals" turns into a real bill or a real data problem. Here's each failure mode and how to fix it, whichever stack you're vibe coding in.

For picking the agent itself, see Claude Code vs Codex; for what the whole stack costs, see Vibe Coding Cost Comparison.

1. A demo UI with no real integration path

The mistake. The sample looks great but never shows where real data, actions, and permissions belong. You get a toy chat widget and still have to invent the production architecture yourself.

The fix. Build the demo inside a realistic app shell, and keep the seams visible. Separate tool/business logic from UI. Give every user-facing behavior a named server route you can later swap for a real action, and document which files to copy into an existing app. A demo that shows the shape of production is worth ten that just look pretty.

2. Secrets leaking into client code

The mistake. The agent drops a provider key into a browser component or a NEXT_PUBLIC_* variable. It works locally and teaches a pattern that leaks your key the moment you deploy.

The fix. Provider keys live only in server routes, server actions, and scripts. The browser calls your endpoint, which uses the key server-side and returns a short-lived token or the result — never the provider directly. Add a rule to your agent prompt: "never put API keys in client code or NEXT_PUBLIC_ vars." For any cost-bearing endpoint (minting tokens, calling a paid API), require an authenticated session, rate-limit per user with a durable store, and check the Origin header — otherwise anyone who finds the URL can run up your bill.
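As a rough sketch of the guard on a cost-bearing endpoint, the three checks can run in order before any token is minted. Everything named here is an assumption for illustration: `guardTokenRequest`, the allowed origin, and the limits are hypothetical, and the in-memory `Map` only stands in for the durable store a real deployment needs.

```typescript
// Hypothetical guard for a token-minting route. Names and limits are assumptions.
type GuardInput = {
  userId: string | null; // null when there is no authenticated session
  origin: string | null; // value of the request's Origin header
  now: number;           // current time in milliseconds
};

type GuardResult = { ok: true } | { ok: false; status: 401 | 403 | 429 };

const ALLOWED_ORIGINS = new Set(["https://app.example.com"]); // assumed origin
const WINDOW_MS = 60_000;  // assumed: one-minute window
const MAX_PER_WINDOW = 5;  // assumed: five tokens per user per window

// In-memory stand-in for a durable store. A Map resets on every deploy and is
// not shared across instances, so replace it before relying on the limit.
const hits = new Map<string, { windowStart: number; count: number }>();

function guardTokenRequest(input: GuardInput): GuardResult {
  // 1. Require an authenticated session.
  if (input.userId === null) return { ok: false, status: 401 };

  // 2. Reject requests from origins that are not on the allow-list.
  if (input.origin === null || !ALLOWED_ORIGINS.has(input.origin)) {
    return { ok: false, status: 403 };
  }

  // 3. Fixed-window rate limit per user.
  const entry = hits.get(input.userId);
  if (!entry || input.now - entry.windowStart >= WINDOW_MS) {
    hits.set(input.userId, { windowStart: input.now, count: 1 });
    return { ok: true };
  }
  if (entry.count >= MAX_PER_WINDOW) return { ok: false, status: 429 };
  entry.count += 1;
  return { ok: true };
}
```

A server route would call this first and only read the provider key from the server environment once the result is `{ ok: true }`.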

3. Unvalidated AI tool inputs

The mistake. Tool names, arguments, and routes are trusted because "the model produced them." That's a fine way to let a hallucinated tool call touch real customer data.

The fix. Treat model output as untrusted input. Type your tool names and reject unknown ones. Cap request body size and string lengths before anything reaches a handler. Validate any model-produced route, ID, or destination against an allow-list before you act on it. The model proposes; your server decides.
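One way this could look in practice is a single parsing function that every tool call passes through before a handler sees it. The tool names, routes, and limits below are hypothetical placeholders, not part of any SDK.

```typescript
// Hypothetical tool names and routes; substitute your own.
const TOOL_NAMES = ["lookup_order", "open_page"] as const;
type ToolName = (typeof TOOL_NAMES)[number];

const ALLOWED_ROUTES = new Set(["/billing", "/settings", "/support"]);
const MAX_BODY_LENGTH = 8_192;  // assumed cap, measured in string length here
const MAX_STRING_LENGTH = 500;  // assumed cap per argument

type ToolCall = { name: ToolName; args: Record<string, string> };
type Parsed = { ok: true; call: ToolCall } | { ok: false; reason: string };

function isToolName(value: unknown): value is ToolName {
  return typeof value === "string" && (TOOL_NAMES as readonly string[]).includes(value);
}

function parseToolCall(rawBody: string): Parsed {
  // Cap the body before parsing anything.
  if (rawBody.length > MAX_BODY_LENGTH) return { ok: false, reason: "body too large" };

  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return { ok: false, reason: "invalid JSON" };
  }
  if (typeof data !== "object" || data === null) return { ok: false, reason: "not an object" };

  const { name, args } = data as { name?: unknown; args?: unknown };

  // Reject tool names the server does not know.
  if (!isToolName(name)) return { ok: false, reason: "unknown tool" };
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, reason: "invalid args" };
  }

  // Accept only bounded string arguments.
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(args)) {
    if (typeof value !== "string" || value.length > MAX_STRING_LENGTH) {
      return { ok: false, reason: `invalid argument: ${key}` };
    }
    clean[key] = value;
  }

  // Model-produced routes must be on the allow-list.
  if (name === "open_page" && !ALLOWED_ROUTES.has(clean.route ?? "")) {
    return { ok: false, reason: "route not allowed" };
  }
  return { ok: true, call: { name, args: clean } };
}
```

Returning a reason instead of throwing keeps rejected calls easy to log and easy to show the model as a tool error.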

4. Unsafe mutations hidden behind a friendly UI

The mistake. The assistant changes account state directly because the demo wants to feel magical. Then that pattern gets copied into billing, permissions, and email flows.

The fix. Risky actions are drafts first. Anything that mutates state — creating tickets, changing settings, sending mail — surfaces an explicit approval step the user must confirm, and the server enforces that approval actually happened before the write. Voice or chat can prepare a change; it should never silently apply one. Keep an audit trail of what was approved.
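A minimal sketch of that draft-then-approve flow, assuming a hypothetical in-memory draft table and audit log (a real app would back both with a database): the write callback only runs when the server itself has recorded an approval from the same user.

```typescript
// Hypothetical draft/approval store; names are illustrative only.
type Draft = {
  id: string;
  userId: string;
  action: string;  // e.g. "send_email"
  payload: string;
  status: "pending" | "approved" | "applied";
};

type AuditEntry = {
  draftId: string;
  userId: string;
  event: "drafted" | "approved" | "applied";
  at: number;
};

const drafts = new Map<string, Draft>(); // stand-in for a database table
const auditLog: AuditEntry[] = [];
let nextId = 1;

// The assistant may call this: it prepares a change but writes nothing.
function proposeChange(userId: string, action: string, payload: string): Draft {
  const draft: Draft = { id: `draft-${nextId++}`, userId, action, payload, status: "pending" };
  drafts.set(draft.id, draft);
  auditLog.push({ draftId: draft.id, userId, event: "drafted", at: Date.now() });
  return draft;
}

// Only an explicit user confirmation should reach this.
function approveChange(userId: string, draftId: string): boolean {
  const draft = drafts.get(draftId);
  if (!draft || draft.userId !== userId || draft.status !== "pending") return false;
  draft.status = "approved";
  auditLog.push({ draftId, userId, event: "approved", at: Date.now() });
  return true;
}

// The server refuses the write unless the approval is on record.
function applyChange(userId: string, draftId: string, write: (draft: Draft) => void): boolean {
  const draft = drafts.get(draftId);
  if (!draft || draft.userId !== userId || draft.status !== "approved") return false;
  write(draft);
  draft.status = "applied";
  auditLog.push({ draftId, userId, event: "applied", at: Date.now() });
  return true;
}
```

Because the status moves to `applied` after the write, replaying the same approval cannot apply the change twice.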

5. Missing setup diagnostics

The mistake. The app dies with a vague WebSocket or provider error, and your buyer (or you, in three weeks) burns an hour guessing whether it's an env var, model access, a browser permission, or a Node version.

The fix. Ship a doctor. A tiny page or CLI that checks runtime/Node version, key presence (without printing the secret), selected model, a real round-trip to the provider, and browser permissions turns "it's broken" into "line 3 is red." Make it the first thing you run after install.
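The core of such a doctor might be sketched like this. The environment variable names (`PROVIDER_API_KEY`, `VOICE_MODEL`) and the minimum Node version are assumptions for illustration; the provider round-trip and browser permission checks are left out because they depend on your SDK.

```typescript
// Hypothetical doctor checks; variable names and the Node minimum are assumptions.
type Check = { name: string; ok: boolean; detail: string };
type Env = Record<string, string | undefined>;

const MIN_NODE_MAJOR = 20;          // assumed minimum
const KEY_VAR = "PROVIDER_API_KEY"; // assumed variable name
const MODEL_VAR = "VOICE_MODEL";    // assumed variable name

function runDoctor(env: Env, nodeVersion: string): Check[] {
  // Accepts strings like "v20.11.0"; parseInt stops at the first dot.
  const major = Number.parseInt(nodeVersion.replace(/^v/, ""), 10);
  const key = (env[KEY_VAR] ?? "").trim();
  const model = (env[MODEL_VAR] ?? "").trim();

  return [
    {
      name: "node version",
      ok: major >= MIN_NODE_MAJOR,
      detail: `found ${nodeVersion}, need >= ${MIN_NODE_MAJOR}`,
    },
    {
      name: "provider key",
      ok: key.length > 0,
      // Report presence and length only, never the secret itself.
      detail: key ? `${KEY_VAR} is set (${key.length} chars)` : `${KEY_VAR} is missing`,
    },
    {
      name: "model",
      ok: model.length > 0,
      detail: model ? `using ${model}` : `${MODEL_VAR} is not set`,
    },
  ];
}

// Numbered lines so a failure reads as "line 2 is red".
function formatReport(checks: Check[]): string {
  return checks
    .map((c, i) => `${i + 1}. [${c.ok ? "ok" : "FAIL"}] ${c.name}: ${c.detail}`)
    .join("\n");
}
```

In a Node script this could be run as `console.log(formatReport(runDoctor(process.env, process.version)))`.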

6. Faking the hard part

The mistake. A sample implies the live integration works when it's really a mocked transcript behind a "connected" label. Trust evaporates the first time the real SDK edge case appears.

The fix. Make the hard part real, and be honest about the boundary. If the voice/stream/agent is live, say so and don't hide a fake path behind it. Draw a clear line between live behavior and demo data — "the conversation is real; the account data is sample until you wire yours in" — and document any genuine SDK caveat instead of papering over it. Honest boundaries are a feature; pretend-production is a liability.

7. Poor package hygiene

The mistake. The zip (or repo) accidentally includes node_modules, build output, logs, screenshots, generated files, or — worst — a local .env. This one bites paid products especially hard.

The fix. Have an explicit packaging step with an allow/deny list, and a verifier that fails the build if a forbidden file (.env*, logs, build output) is present or a required file is missing. Run it from a clean working directory before every release. Don't trust yourself to remember; make the script remember. (We learned this twice — a first package accidentally bundled the marketing images; the verifier is what catches that class of mistake.)
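A verifier of this kind can be small. The sketch below assumes you already have the list of paths going into the package; the forbidden patterns and required files are illustrative, not the kit's actual lists.

```typescript
// Hypothetical deny-list and required files; adjust to your package.
const FORBIDDEN: RegExp[] = [
  /(^|\/)\.env[^/]*$/,      // .env, .env.local, and any other .env* file
  /(^|\/)node_modules\//,   // installed dependencies
  /(^|\/)\.next\//,         // Next.js build output
  /\.log$/,                 // log files
];

const REQUIRED = ["README.md", "package.json"];

// Returns a list of problems; an empty list means the package is clean.
function verifyPackage(files: string[]): string[] {
  const problems: string[] = [];
  for (const file of files) {
    if (FORBIDDEN.some((pattern) => pattern.test(file))) {
      problems.push(`forbidden: ${file}`);
    }
  }
  for (const needed of REQUIRED) {
    if (!files.includes(needed)) problems.push(`missing: ${needed}`);
  }
  return problems;
}
```

A release script would exit non-zero whenever the returned list is non-empty. If you intentionally ship a `.env.example`, that exception has to be made explicit, since the pattern above rejects every `.env*` file.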

8. Dependency drift and ignored advisories

The mistake. The starter pins stale packages, skips audit entirely, and ships known-vulnerable transitive deps.

The fix. Commit a lockfile. Make audit a first-class script, not an afterthought. Patch transitive advisories with an override when the direct dependency lags. After any upgrade — especially of fast-moving canary AI SDKs — re-run install, your full verify pipeline, and the audit.

9. Resource leaks and unbounded work

The mistake. The microphone stream stays open after you navigate away. A server script waits forever or buffers unlimited audio. Small demos hide this; real sessions don't.

The fix. Release resources deterministically: stop media tracks on toggle-off and on component unmount. Bound every server operation — cap input length, set a timeout, cap generated output bytes. Anything that can grow without limit will, in production, on someone else's connection.
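These three habits might be sketched as small helpers. The helper names are hypothetical; `getTracks()` and `stop()` are the standard `MediaStream` and `MediaStreamTrack` methods, typed structurally here so the sketch also runs outside a browser.

```typescript
// Structural type: a real browser MediaStream satisfies it.
type StoppableStream = { getTracks(): Array<{ stop(): void }> };

// Stop every track so the microphone is actually released.
function releaseStream(stream: StoppableStream | null): void {
  stream?.getTracks().forEach((track) => track.stop());
}

// Reject if the work has not settled within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Refuse to buffer more than `maxBytes` in total.
function appendBounded(chunks: Uint8Array[], next: Uint8Array, maxBytes: number): void {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  if (total + next.byteLength > maxBytes) {
    throw new Error("buffer limit exceeded");
  }
  chunks.push(next);
}
```

In a React component, `releaseStream` would be called both from the mic toggle-off handler and from the cleanup function returned by `useEffect`, so navigating away releases the microphone as well.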

10. Monolithic, hard-to-modify components

The mistake. The agent puts app data, UI, server calls, and tool logic in one giant file. Nobody can tell what to copy and what to replace.

The fix. One responsibility per file. Sample data in one place, tool definitions and handlers in another, config/allow-lists in a third, UI components separate from all of it. You reason better about code you can hold in your head at once — and so does the next agent you point at it.

11. Inert controls

The mistake. Buttons exist for show. They look interactive but don't update state or exercise a real code path, so the "demo" is a screenshot with hover states.

The fix. Every visible control does something real. A prompt chip sends an actual message. A navigation action actually routes (after validation). An approval card actually gates the result. If a control can't be wired to a real path yet, label it clearly rather than faking it.

12. No AI-coder handoff

The mistake. Your buyer (or teammate) asks Claude Code, Codex, or Copilot to "install this and add X," and the agent makes risky, sprawling edits that quietly delete your guardrails.

The fix. Ship the prompts. Give constrained, copy-paste instructions that tell the agent to inspect before editing, preserve design and approval boundaries, avoid secrets and stray commits, and report what it changed and how it verified. Then review the diff. The handoff is part of the product now.

The ship checklist

Before you call an AI-coded app done, run the boring pass:

  • Grep for provider keys in client code and NEXT_PUBLIC_*. Move any you find server-side and rotate them.
  • Confirm every mutating action has an approval step the server enforces.
  • Cap inputs (body size, string length, session/output limits) and validate model-produced routes/IDs against an allow-list.
  • Run your doctor/diagnostics and your full verify + audit pipeline from a clean checkout.
  • Package with an explicit allow/deny list and a verifier that fails on forbidden files.
  • Release media/streams on unmount; time out and bound every server operation.
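For the first item, a plain text scan is often enough to start. This is a rough sketch under stated assumptions: the two patterns are heuristics (a `NEXT_PUBLIC_` name that looks like a credential, and a literal beginning with `sk-`, a prefix some providers use), and `scanSource` is a hypothetical helper you would run over each client-side file.

```typescript
type Finding = { path: string; line: number; rule: string };

// Assumed heuristics; tune them to the key formats of your providers.
const RULES: Array<{ rule: string; pattern: RegExp }> = [
  {
    rule: "secret-looking NEXT_PUBLIC_ variable",
    pattern: /NEXT_PUBLIC_[A-Z0-9_]*(KEY|SECRET|TOKEN)/,
  },
  {
    rule: "literal that looks like a provider key",
    pattern: /\bsk-[A-Za-z0-9_-]{16,}/,
  },
];

// Reports the rule and line number only, so a finding never echoes the key.
function scanSource(path: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) findings.push({ path, line: index + 1, rule });
    }
  });
  return findings;
}
```

Treat any finding as a prompt to move the value server-side and rotate it. An empty result is not proof that nothing leaked, since the patterns only catch the obvious cases.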

Bottom line

Vibe coding isn't the problem; treating the demo as the deliverable is. The gap between a vibe-coded demo and something you can trust in production is a specific, finite list: server-only secrets, bounded inputs, explicit approvals, real diagnostics, package hygiene, dependency audits, and honest boundaries, among others. Work through the twelve fixes above and you've closed most of it.

We built every one of these fixes into the Voice Agent Starter Kit — a $19 Next.js realtime-voice starter you can read, run, and adapt, with the guardrails wired in and documented. It lives on the AI Builder Kits hub alongside the coding-tool comparisons.

Related: Claude Code vs Codex · Vibe Coding Cost Comparison · Best AI Coding Stack for Solo Founders

Frequently asked questions

What is vibe coding?

Vibe coding is describing what you want in plain language and letting an AI agent — Claude Code, Codex, Cursor, Gemini CLI — write and run the code. It's fast and great for prototypes. The risk is that the generated app looks finished on the first screen while the internals (secrets handling, input validation, error paths) are fragile, because the model optimizes for a working demo, not a production system.

Why do AI-generated apps break in production?

They break in predictable ways: provider API keys end up in client code, tool and form inputs are trusted because 'the model produced them,' risky actions mutate state with no approval step, setup fails with vague errors, and the repo ships with node_modules, logs, or local secrets. None of these show up in a local demo — they surface the first time a real user, a real network, or a real attacker touches the app.

Is it safe to put an API key in a Next.js app?

Only in server-only code. Anything prefixed NEXT_PUBLIC_ is shipped to the browser and is readable by anyone. Provider keys (OpenAI, Anthropic, a gateway key) must live in server routes, server actions, or scripts, and the browser should call your own endpoint that uses the key server-side — never the provider directly. If a key ever reached the client, rotate it.

How do I get an AI coding agent to follow my architecture?

Give it constrained prompts: tell it to inspect the relevant files before editing, preserve existing boundaries (auth, approvals, server-only secrets), avoid broad rewrites, and report what it changed and how it verified. Review the diff before merging. Unbounded 'just make it work' prompts are how agents delete your guardrails.

Reviewed by Alex Lowe · Published July 1, 2026
