Originally published at mnemehq.com
Every major technology shift produces a new stack. The database era gave us a standard layering of application, ORM, query engine, and storage. The cloud era gave us a standard layering of compute, orchestration, networking, and persistence. Each layer had clear responsibilities, clear interfaces, and a growing ecosystem of tooling at each level.
Generative AI is producing a new stack for software engineering. It is less than three years old, is still being argued about, and has several layers that are genuinely unresolved. But the shape is visible, and the teams building in this space need to understand it — both to make good tooling decisions and to identify where the real unsolved problems are.
This article maps the stack as it exists today.
Layer 1: Foundation models
The base layer is the models themselves: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3, Mistral, and a growing set of code-specialized variants. This layer is commoditizing rapidly. The cost of a given capability level is falling fast: a model released eighteen months ago at frontier quality is now matched by models that cost a tenth as much to run. Differentiation at this layer is shifting from raw capability toward specialization, latency, cost, and fine-tuning surface.
Key properties of this layer: probabilistic output, context-window-constrained, stateless across calls. Every layer above Layer 1 exists in part to compensate for one of these three properties.
Layer 2: Developer tooling
The second layer is the interface through which engineers interact with the models: IDEs, editor extensions, chat interfaces, and terminal tools. Examples include Cursor, GitHub Copilot, Codeium, Sourcegraph Cody, and JetBrains AI Assistant. This layer handles the user-facing experience: accepting input, formatting it into model requests, rendering output, and managing the basic interaction loop.
This layer has seen the most visible competition and the most rapid adoption. It is also the layer most teams conflate with “AI coding” as a category, which creates the false impression that choosing a good editor extension is equivalent to having a complete AI engineering strategy. It is not.
Layer 3: Context management
The third layer manages what goes into the context window: which files, which symbols, which documentation, which conversation history. This is a hard problem because the context window is finite, the relevant information is scattered across a large codebase, and the model’s performance degrades as the window fills with irrelevant content.
Solutions at this layer include retrieval-augmented generation (RAG) over code indexes, tree-sitter-based symbol extraction, semantic chunking, conversation summarization, and embedding-based retrieval of relevant past outputs.
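Context management is necessary but not sufficient for architectural consistency. Surfacing relevant code is not the same as enforcing architectural constraints: a retriever can surface a file that follows a deprecated pattern just as readily as one that follows the current standard.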
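To make the budgeting problem concrete, here is a minimal Python sketch of one common strategy: greedily packing the highest-scoring candidates into a fixed token budget. The `Candidate` type, the relevance scores, and `pack_context` are hypothetical; a real tool would count tokens with the model's own tokenizer rather than the whitespace approximation in `approx_tokens`, and would get scores from a retriever such as an embedding index.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A piece of context that could go into the prompt (hypothetical type)."""
    source: str       # e.g. a file path or symbol name
    text: str
    relevance: float  # score from whatever retriever produced it

def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def pack_context(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedily keep the most relevant candidates that still fit in the token budget."""
    chosen: list[Candidate] = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        cost = approx_tokens(cand.text)
        if used + cost <= budget:
            chosen.append(cand)
            used += cost
    return chosen

candidates = [
    Candidate("billing/service.py", "def charge(customer, amount): ...", 0.92),
    Candidate("README.md", "Project overview " * 50, 0.40),
    Candidate("billing/models.py", "class Invoice: ...", 0.85),
]
picked = pack_context(candidates, budget=20)
print([c.source for c in picked])
# → ['billing/service.py', 'billing/models.py']
```

Greedy packing is simple and predictable, but it will skip a large, highly relevant file in favor of several smaller, less relevant ones; that is one reason systems at this layer often chunk or summarize large sources before packing.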
Layer 4: Memory
The fourth layer extends the context across sessions: durable storage of preferences, past decisions, project-specific conventions, and prior outputs. Without this layer, every new conversation starts cold, and the engineer must re-establish context that should persist automatically.
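Memory at this layer is typically implemented with embedding stores and similarity-based retrieval, though some products persist it as static instructions loaded into every session. Claude’s Projects feature, Cursor’s user rules, and the memory modules of various agent frameworks operate at this layer.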
Memory optimizes for recall. Given a query, return what is relevant from the past. This is an important and genuinely hard problem. It is also not governance.
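A toy version of the recall loop shows why recall alone is not governance. In this sketch, `MemoryStore`, `remember`, `recall`, and the hand-written three-dimensional vectors are all hypothetical stand-ins; a real system would produce vectors with an embedding model and keep them in a vector store.

```python
import math
from dataclasses import dataclass, field

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class MemoryStore:
    """Toy cross-session memory: stores (text, vector) pairs and recalls by similarity."""
    items: list[tuple[str, list[float]]] = field(default_factory=list)

    def remember(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def recall(self, query: list[float], k: int = 2) -> list[str]:
        """Return the k stored texts most cosine-similar to the query vector."""
        ranked = sorted(self.items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = MemoryStore()
# Hand-written 3-d vectors stand in for real embeddings (hypothetical values).
memory.remember("We use Postgres for billing data", [0.9, 0.1, 0.0])
memory.remember("Prefer async handlers in the API", [0.1, 0.9, 0.1])
memory.remember("Billing amounts are stored in cents", [0.8, 0.2, 0.1])
print(memory.recall([1.0, 0.0, 0.0], k=2))
# → ['We use Postgres for billing data', 'Billing amounts are stored in cents']
```

Nothing in `recall` says which of the returned memories is binding, whether two of them conflict, or whether the next generated output honored either one. It only ranks by similarity.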
Layer 5: Governance
The fifth layer is the least mature and the most strategically important. Governance is the system that represents architectural decisions as structured objects, resolves conflicts between them deterministically, and enforces the resolved constraints at code generation and review time.
Where memory asks “what have we seen before that is relevant?”, governance asks “what rule applies here, and was the generated output compliant with it?”
The gap between these questions is large. Governance requires:
A structured representation of decisions (not just text)
Scope semantics (this rule applies to this service, not that one)
Precedence resolution (when two rules conflict, which wins)
An enforcement point (a hard boundary, not a suggestion)
An audit surface (what rule was applied, why, to which output)
None of these exist in any meaningful form in the current tooling landscape. Teams cobble together CLAUDE.md files, custom lint rules, and review checklists — all of which are enforcement by convention rather than enforcement by infrastructure.
This is the layer where the most important unsolved problem in AI-assisted engineering sits. The teams that build durable governance infrastructure at Layer 5 will have a structural advantage over the teams that do not, because their AI-generated codebases will remain coherent as they scale. The teams without it will face compounding architectural drift that becomes expensive to fix.
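To make the five requirements above concrete, here is a deliberately small Python sketch of a governance check. Everything in it is hypothetical: `Decision`, `resolve`, and `enforce` are illustrative names, the scope model has only two levels (a named service, or `"*"` for all services), and rules are expressed as forbidden substrings, a crude stand-in for the AST checks or lint rules a real system would use. It is not a description of any particular product's design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    """A hypothetical structured architectural decision (not free text)."""
    id: str
    topic: str                  # what the decision governs, e.g. "http-client"
    scope: str                  # a service name, or "*" for every service
    priority: int               # higher wins when scopes are equally specific
    forbidden: tuple[str, ...]  # substrings generated code must not contain

@dataclass(frozen=True)
class AuditRecord:
    """One audit entry: which decision was checked, where, and the result."""
    decision_id: str
    service: str
    compliant: bool
    detail: str

def applies_to(decision: Decision, service: str) -> bool:
    # Scope semantics: a rule applies to its named service, or to all services if "*".
    return decision.scope in ("*", service)

def precedence(decision: Decision) -> tuple[int, int, str]:
    # Smaller tuples win: specific scope before "*", then higher priority, then lower id.
    return (0 if decision.scope != "*" else 1, -decision.priority, decision.id)

def resolve(decisions: list[Decision], service: str) -> dict[str, Decision]:
    """Deterministically pick exactly one winning decision per topic for a service."""
    applicable = [d for d in decisions if applies_to(d, service)]
    winners: dict[str, Decision] = {}
    for d in sorted(applicable, key=precedence):
        winners.setdefault(d.topic, d)  # first in precedence order wins the topic
    return winners

def enforce(code: str, service: str, decisions: list[Decision]) -> tuple[bool, list[AuditRecord]]:
    """Enforcement point: check code against the resolved rules and record every check."""
    audit: list[AuditRecord] = []
    for _, d in sorted(resolve(decisions, service).items()):
        hits = [p for p in d.forbidden if p in code]
        audit.append(AuditRecord(d.id, service, not hits,
                                 f"forbidden: {hits}" if hits else "ok"))
    return all(r.compliant for r in audit), audit

# Usage: a global rule and a payments-specific override on the same topic.
decisions = [
    Decision("ADR-001", "http-client", "*", 1, ("import httpx",)),
    Decision("ADR-007", "http-client", "payments", 1, ("import requests",)),
    Decision("ADR-003", "orm", "*", 1, ("import peewee",)),
]
ok, audit = enforce("import requests\n", "payments", decisions)
print(ok)  # → False
for record in audit:
    print(record.decision_id, record.compliant, record.detail)
# ADR-007 False forbidden: ['import requests']
# ADR-003 True ok
```

The precedence order here (a specific scope beats a global one, then higher priority, then lowest id) is one arbitrary but deterministic choice. What matters is that the same inputs always produce the same winning rule, and that every check leaves an audit record whether it passed or failed.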
Layer 6: Orchestration
The sixth layer coordinates multi-step, multi-model workflows: autonomous agents that plan, execute, observe results, and iterate. LangChain, LlamaIndex, AutoGen, CrewAI, and the growing set of agent frameworks live here. This layer is responsible for breaking large tasks into subtasks, routing between models and tools, managing execution state, and handling failure modes.
Orchestration amplifies everything below it — both capability and risk. An orchestrator that runs on top of a governance layer can generate and validate large amounts of code while staying within architectural constraints. An orchestrator that runs without a governance layer generates large amounts of code with no constraint enforcement, and the drift compounds across every step in the workflow.
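The difference between the two orchestrators can be sketched as a gate inside the loop. In this minimal Python sketch, `run_workflow`, `Step`, and `Check` are hypothetical names, not the API of LangChain or any other framework; each step stands in for a model call that receives feedback from its previous failed attempt, and `check` stands in for a governance layer that returns a list of violations.

```python
from typing import Callable, Optional

# Hypothetical types: a step turns feedback from its last failed attempt (or None)
# into generated code; a check returns a list of violations (empty means compliant).
Step = Callable[[Optional[str]], str]
Check = Callable[[str], list[str]]

def run_workflow(steps: list[Step], check: Check, max_attempts: int = 2) -> list[str]:
    """Run steps in order, accepting an output only if it passes the governance check."""
    accepted: list[str] = []
    for i, step in enumerate(steps):
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            output = step(feedback)
            violations = check(output)
            if not violations:
                accepted.append(output)
                break
            feedback = "; ".join(violations)  # tell the next attempt what was wrong
        else:
            # No compliant output within the attempt limit: halt instead of passing drift on.
            raise RuntimeError(f"step {i} failed governance check: {feedback}")
    return accepted

def no_requests(code: str) -> list[str]:
    # Stand-in governance check with a single hypothetical rule.
    return ["uses requests"] if "import requests" in code else []

def flaky_step(feedback: Optional[str]) -> str:
    # First attempt violates the rule; the retry uses the feedback and complies.
    return "import httpx" if feedback else "import requests"

print(run_workflow([flaky_step, lambda fb: "x = 1"], no_requests))
# → ['import httpx', 'x = 1']
```

Halting on a step that cannot produce compliant output is the key design choice. In a real workflow, later steps usually build on earlier outputs, so without the gate a violation at step 0 would propagate into everything built on top of it.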
Layer 7: Human oversight
The seventh layer is the human review and decision loop: code review, architecture review, incident response, and the organizational processes that surround them. This layer is not going away. Its role is changing.
The shift is from line-by-line verification (which cannot scale with AI output volume) to policy definition and exception handling. Humans define the architectural decisions at Layer 5. Humans review the audit trail of which decisions were applied and where. Humans handle the cases that fall outside the governance model. The governance layer makes human oversight viable at AI-generation scale by compressing what humans need to check from “every line of code” to “the policy decisions and their exceptions.”
Where teams are actually operating
Most teams building with AI today have strong Layer 1 and Layer 2 investment, reasonable Layer 3 investment, weak Layer 4, almost no Layer 5, and a Layer 6 that is growing fast. The gap between Layer 4 (memory) and Layer 6 (orchestration) — the missing Layer 5 — is where most AI-assisted engineering teams are operating blind.
The result is what you would expect from a machine running without the middle of its control plane: impressive velocity, accumulating drift, and an audit surface that cannot explain what the system actually did.
The next eighteen months of the AI engineering stack will be defined primarily by what gets built at Layer 5. The teams that solve governance will set the standard for what it means to build AI-assisted software at scale.
Read this article in full at mnemehq.com/insights/generative-ai-software-engineering-stack/


