The hidden cost of LLM-generated code in enterprise projects — and a deterministic alternative
Three months ago, our team delivered a "production-ready" codebase to a fintech client. It had been generated using a combination of GPT-4 and Copilot — the standard workflow at the time.
The security audit took six weeks and cost more than the original development contract.
This is the story nobody tells about LLM-generated code in enterprise projects. And it's why I now think the real cost isn't in the tokens — it's in what comes after.
The problem isn't capability. It's non-determinism.
Modern LLMs are extraordinarily capable. They can produce a working Flask API, a React dashboard, and a Dockerfile in minutes. I've seen them write code I wouldn't have thought to write myself.
But in enterprise contexts, "working" is only one of several requirements. You also need:
- Reproducibility: the same inputs must produce the same outputs, always
- Auditability: every architectural decision must be traceable to a requirement
- Compliance: GDPR, HIPAA, PCI-DSS, SOC2 must be baked in, not bolted on
- Consistency: the data model in the API must match the data model in the docs
- Testability: the code must be structured for unit and integration testing from day one
LLMs in their current form fall short on all five. Not because they're bad, but because they're generative: each run produces a plausible variation. The same prompt on a different day, or with a slightly different context window, produces different code. That's not a bug; it's the nature of the technology.
For solo projects or prototypes, this is fine. For enterprise codebases that need to pass a SOC2 audit or integrate with a TOGAF-documented architecture, it's a serious problem.
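To make the reproducibility requirement concrete, here is a minimal sketch of how a team might check it: fingerprint everything a generator emits and compare the fingerprints across runs. Both generators below are hypothetical stand-ins written for this illustration. No real LLM is called, and a counter simulates the run-to-run variation.

```python
import hashlib
import itertools


def fingerprint(files: dict[str, str]) -> str:
    """Hash generated files in sorted path order so the digest is stable."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(files[path].encode("utf-8") + b"\0")
    return digest.hexdigest()


def is_reproducible(generate, spec: dict, runs: int = 3) -> bool:
    """True if every run of generate(spec) yields byte-identical files."""
    return len({fingerprint(generate(spec)) for _ in range(runs)}) == 1


def template_generator(spec: dict) -> dict[str, str]:
    # Hypothetical rule-based generator: output depends only on the spec.
    return {"models.py": f"class {spec['entity']}:\n    pass\n"}


_run = itertools.count()


def sampling_generator(spec: dict) -> dict[str, str]:
    # Stand-in for a sampling-based generator: a counter simulates
    # run-to-run variation in otherwise equivalent output.
    wording = ("record", "row", "item")[next(_run) % 3]
    return {"models.py": f"class {spec['entity']}:  # one {wording}\n    pass\n"}


spec = {"entity": "User"}
print(is_reproducible(template_generator, spec))  # True
print(is_reproducible(sampling_generator, spec))  # False
```

Even a one-word difference in a comment changes the fingerprint, which is the point: an auditor comparing two builds cannot tell a cosmetic variation from a substantive one without reading both.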
The audit that changed how I think about this
Back to that fintech project. The audit found:
1. Hardcoded secrets in three environment files that the LLM had generated "as placeholders", which nobody had replaced before deployment to staging
2. Inconsistent data models: the LLM had generated a `user_id` field as UUID in the API layer and as integer in the database migrations
3. Missing GDPR data residency logic: the generated code had no concept of which data could be processed in which jurisdiction
4. No traceability: when the auditor asked "show me where the PRD requirement for multi-factor authentication is implemented in the code," we couldn't answer
Items 1 and 2 were fixable in hours. Items 3 and 4 cost us six weeks.
The problem wasn't that the LLM wrote bad code. The problem was that the LLM had no model of the architecture — it was generating code from context, not from a formal specification.
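The `user_id` mismatch from the audit is the kind of drift that becomes mechanically detectable once each layer's field types are written down in one place. The sketch below assumes the types have already been extracted into plain dictionaries. The layer names, the `email` field, and the `find_type_drift` helper are illustrative, not taken from the audited project.

```python
def find_type_drift(layers: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return every field whose declared type differs between layers."""
    seen: dict[str, dict[str, str]] = {}
    for layer, fields in layers.items():
        for field_name, type_name in fields.items():
            seen.setdefault(field_name, {})[layer] = type_name
    return {
        field_name: by_layer
        for field_name, by_layer in seen.items()
        if len(set(by_layer.values())) > 1
    }


# Hypothetical extract of the two layers that disagreed in the audit.
layers = {
    "api": {"user_id": "uuid", "email": "string"},
    "migrations": {"user_id": "integer", "email": "string"},
}

drift = find_type_drift(layers)
print(drift)  # {'user_id': {'api': 'uuid', 'migrations': 'integer'}}
```

A check like this only reports the disagreement. It cannot say which layer is right, and that is the gap a formal specification fills.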
What "model-driven" actually means
There's a discipline called Model-Driven Architecture (MDA), developed by the Object Management Group in the early 2000s. The core idea: you define your system formally at a high level of abstraction, and then transform that model into platform-specific code using deterministic rules.
The transformation is key. Unlike generation, transformation is:
- Deterministic: the same model always produces the same code
- Traceable: every line of generated code can be traced to a model element, which can be traced to a requirement
- Composable: compliance rules, security patterns, and infrastructure configs are overlays that get applied to the model, not afterthoughts
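The first two properties can be shown with a toy model-to-text transformation. This is a minimal sketch under stated assumptions, not MDA tooling and not any vendor's engine: the `Entity` and `Field` classes, the `DO-001` and `PRD-7` identifiers, and the type mapping are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str


@dataclass(frozen=True)
class Entity:
    element_id: str   # hypothetical model element id
    requirement: str  # hypothetical requirement id
    name: str
    fields: tuple[Field, ...]


# Fixed mapping from model types to SQL column types (illustrative).
SQL_TYPES = {"uuid": "UUID", "string": "TEXT", "integer": "INTEGER"}


def to_sql(entity: Entity) -> str:
    """Transform one model entity into a CREATE TABLE statement."""
    columns = ",\n".join(
        f"    {field.name} {SQL_TYPES[field.type]}" for field in entity.fields
    )
    return (
        f"-- trace: element={entity.element_id} requirement={entity.requirement}\n"
        f"CREATE TABLE {entity.name.lower()} (\n{columns}\n);"
    )


account = Entity(
    element_id="DO-001",
    requirement="PRD-7",
    name="Account",
    fields=(Field("user_id", "uuid"), Field("email", "string")),
)
print(to_sql(account))
```

The function is pure: it reads nothing but the model, so the same `Entity` always yields the same text, and the leading comment carries the model element and requirement into the output. An API schema generated from the same `Entity` would use the same `user_id` type by construction.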
ArchiMate 3.2, the enterprise architecture notation developed by The Open Group, is the best-suited modelling language for this. It covers every layer of an enterprise system — from business process down to physical infrastructure — in a single coherent notation.
What this looks like in practice with Archiet
Archiet is a platform built on exactly these principles. You provide a Product Requirements Document (PRD) and an ArchiMate 3.2 model, and Archiet's M2T (Model-to-Text) engine transforms them into production-ready application code — as a downloadable ZIP.
The same architecture model always produces the same output. Run it today, run it in six months: identical structure, identical compliance overlays, identical documentation.
Here's what gets generated:
12 technology stacks — Flask/Next.js, FastAPI, Django, NestJS, Laravel, Go, Java Spring Boot, Rails, .NET, Salesforce Apex, SAP CAP, Tauri+Rust. You pick the stack; the architecture model drives the generation.
9 compliance overlays — GDPR, HIPAA, PCI-DSS, SOC2, DORA, EU AI Act, ISO 13485, IEC 62443, Building Safety Act 2022. These are applied deterministically to the architecture model, not added by an LLM as an afterthought.
Fortune-500-grade documentation — ArchiMate diagrams, Architecture Decision Records, TOGAF artifacts, and a full traceability matrix. When the auditor asks "show me where MFA is implemented," you show them the model element and the generated code side-by-side.
Quality gates — every generated ZIP must score ≥80/100 before delivery. Any hardcoded secret or placeholder hard-blocks the release. That fintech audit issue? Not possible with Archiet.
E2B boot testing — generated applications are actually booted and tested in an isolated environment before you download them.
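How Archiet implements its quality gate is not described here, so the following is only a hypothetical sketch of the behaviour stated above: a release passes if its score is at least 80 and no file assigns a literal value to a secret-looking key. The regular expression, the `quality_gate` function, and the way the score is passed in are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Crude, illustrative pattern: a secret-looking key assigned a literal value.
# Values injected from the environment (starting with "$") are allowed.
SECRET = re.compile(
    r"(?i)\b\w*(password|secret|api_key|token)\w*\s*=\s*['\"]?(?!\$)[^\s'\"]+"
)


@dataclass
class GateResult:
    passed: bool
    reasons: list[str] = field(default_factory=list)


def quality_gate(files: dict[str, str], score: int, threshold: int = 80) -> GateResult:
    """Block on any literal secret, then apply the score threshold."""
    reasons = []
    for path in sorted(files):
        for number, line in enumerate(files[path].splitlines(), start=1):
            if SECRET.search(line):
                reasons.append(f"{path}:{number}: literal secret or placeholder value")
    if score < threshold:
        reasons.append(f"score {score} is below threshold {threshold}")
    return GateResult(passed=not reasons, reasons=reasons)


blocked = quality_gate({".env.staging": "DB_PASSWORD=changeme\n"}, score=95)
clean = quality_gate({".env.staging": "DB_PASSWORD=${DB_PASSWORD}\n"}, score=95)
print(blocked.passed, blocked.reasons)
print(clean.passed)
```

The design choice worth noting is that the secret check is a hard block rather than a score deduction: a high score elsewhere cannot buy back a placeholder password. A real scanner would need far more patterns than this one and would produce fewer false positives.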
The audit argument for formal specification
The most underrated benefit of model-driven generation is what happens during an audit.
When your codebase is generated from a formal architecture model, every element is traceable. The auditor asks about GDPR data residency: you point to the compliance overlay in the model. They ask about access control: you point to the ArchiMate business layer. They ask about the database schema: it's derived directly from the data entities in the model, so there's no drift.
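The auditor's question then becomes a lookup instead of an investigation. The sketch below assumes the traceability matrix is available as rows of requirement, model element, and generated artifact. Every identifier and file path in it is made up for illustration.

```python
from typing import NamedTuple


class TraceRow(NamedTuple):
    requirement: str  # hypothetical PRD requirement
    element: str      # hypothetical model element
    artifact: str     # hypothetical generated file


MATRIX = [
    TraceRow("PRD-12 multi-factor authentication", "Application Service: MFA Verification", "auth/mfa.py"),
    TraceRow("PRD-19 EU data residency", "Constraint: EU-only processing", "storage/residency.py"),
]


def where_implemented(matrix: list[TraceRow], keyword: str) -> list[TraceRow]:
    """Answer 'show me where X is implemented' by requirement keyword."""
    return [row for row in matrix if keyword.lower() in row.requirement.lower()]


def untraced(matrix: list[TraceRow], generated_files: list[str]) -> list[str]:
    """List generated files that no requirement accounts for."""
    traced = {row.artifact for row in matrix}
    return sorted(set(generated_files) - traced)


for row in where_implemented(MATRIX, "multi-factor"):
    print(f"{row.requirement} -> {row.element} -> {row.artifact}")

print(untraced(MATRIX, ["auth/mfa.py", "storage/residency.py", "utils/misc.py"]))
# ['utils/misc.py']
```

The second function covers the reverse direction, which auditors also care about: code that exists without any requirement behind it.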
This level of traceability is very hard to get from LLM-generated code unless you spend enormous effort retrofitting documentation, which is exactly the kind of cleanup that erases the productivity gains.
Who this is actually for
I want to be clear about the trade-off. Archiet is not for prototypes or weekend projects. If you're building a quick demo to validate a hypothesis, use Bolt or v0. They're excellent tools for that use case.
Archiet is for teams where:
- The architecture has been designed and documented before a line of code is written
- Compliance is a first-class requirement, not a checkbox at the end
- The codebase needs to be auditable and reproducible six, twelve, twenty-four months later
- You're working across multiple technology stacks and need consistency between them
For software development agencies taking on enterprise clients, for system integrators in fintech and healthtech, and for enterprise architects who need their documented architecture to actually become the software — this is the tool that closes the gap.
The bottom line
LLMs are transformative for code assistance. I use them daily for autocomplete, for explaining unfamiliar APIs, for writing tests.
But for enterprise code generation — where reproducibility, traceability, and compliance are non-negotiable — the non-deterministic nature of LLMs creates risks that compound over time. The productivity gains at the start get paid back in audit debt later.
Model-driven generation is deterministic by design. The architecture is the source of truth. The code is a transformation of that truth, not a prediction of it.
If that distinction matters in your context, [Archiet](https://archiet.com) is worth a look. There's a free tier to start, and the pricing page at [archiet.com/pricing](https://archiet.com/pricing) covers the full range from solo developers to enterprise teams.
