AI-Assisted Dev Adoption Loop
How a team moves from "we tried Copilot" to a reproducible AI-assisted development practice without losing engineering judgment.
TL;DR
- Do not roll out AI assistants to a team by issuing licenses and hoping. Roll out a loop: repo-level instructions, small gates, visible state, and a feedback channel.
- The first thing to standardize is where the instructions live (AGENTS.md / CLAUDE.md), not which assistant people use. Assistants come and go. Instruction hierarchy persists.
- Bound the blast radius: no direct writes to main, hooks enforce scope, tests and lint are non-negotiable gates, and anything destructive goes through a human.
- Track adoption by the kinds of work the team moves to assistants, not by tokens used or PRs opened. The failure mode is opaque productivity theater.
- Give engineers a clear signal for when not to use the assistant. Without that, borrowed competence silently erodes judgment.
When to use this playbook
You are running an engineering team that has tried an AI coding assistant but has not converged. Symptoms:
- Different people use different tools with different results, and nobody owns the gap.
- The assistant works well on throwaway code but produces churn in the parts of the codebase you care about.
- Reviews have gotten harder because PRs look plausible but are subtly wrong.
- Leadership wants a number, and you do not trust the number you would give them.
The goal is not "everyone uses the assistant more." The goal is a repeatable, inspectable practice you can measure, defend, and improve.
Inputs
Before you start, confirm you have:
- A canonical repo or small set of canonical repos the team actually ships from.
- Shared ownership of CI — someone who can add hooks, gates, and checks without needing a ticket.
- At least one senior engineer willing to act as the practice owner for the first 4–6 weeks.
- A commitment from leadership that "the assistant built it" is never an acceptable answer to an incident review.
If any of those are missing, stabilize them first. Rolling out tooling on top of an unstable foundation inherits the instability and blames the tool.
The loop
The adoption loop has four phases. Do not skip ahead.
- Instruction hierarchy — write down what the assistants should know about this repo.
- Gate the work — make CI, hooks, and review invariants enforce what you actually care about.
- Standardize the surface — pick one assistant as the reference, document how to use it, and treat the config as code.
- Measure the shift — track which kinds of work move to the assistant, and which should not.
Each phase has a clear done-state. Do not move to the next phase until the previous one is durable.
Phase 1: Instruction hierarchy
The highest-leverage artifact in AI-assisted development is an AGENTS.md file (or CLAUDE.md, or the equivalent for your assistant). It sits at the root of the repo and tells any assistant that opens it: what the repo is for, what conventions apply, what not to do, and where to find the details.
A minimum useful instruction hierarchy has four layers:
- Repo-level AGENTS.md — architecture, conventions, branch naming, what to ask before doing.
- Subtree-level AGENTS.md — per-service or per-library overrides (testing stack, language quirks, dependency pins).
- Skills / commands — short, named procedures the assistant can invoke (for example, a `commit` or `refactor` skill).
- Personal overrides — a gitignored `AGENTS.local.md` so individual engineers can add preferences without polluting the shared config.
What goes in these files is specific, not motivational. "Our tests run with `make test`; do not use `npm test`." "Kubernetes manifests live in `platform/gitops`; never `kubectl apply` directly." "When fixing a bug, add a regression test or say why you did not."
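A minimal sketch of what a root-level file might look like, using the example rules above — the repo details are hypothetical placeholders, not a template to copy verbatim:

```markdown
# AGENTS.md

## What this repo is
Monorepo for our services. Each service lives under `services/<name>/`.

## Conventions
- Tests run with `make test`. Do not use `npm test`.
- Kubernetes manifests live in `platform/gitops`. Never `kubectl apply` directly.
- When fixing a bug, add a regression test or say why you did not.

## Ask before doing
- Any schema or data model change.
- Deleting files outside the directory you were asked to change.

## Where the details live
Per-service overrides are in `services/<name>/AGENTS.md`.
```

The point is that every line is checkable: an engineer reviewing a session transcript can see whether the rule was followed.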
The done-state for this phase: a new engineer or a fresh assistant session can open the repo, read AGENTS.md, and know the rules. You verify by running an assistant session and asking it to describe the repo. If it describes the repo correctly, the file is doing its job.
Phase 2: Gate the work
The assistant will generate plausible-looking code. That is what assistants do. Your gates decide what makes it into the codebase.
At minimum, enforce:
- No direct commits to main. Every change goes through a PR with review. This is table stakes, but it is easy to let assistant-speed commits slip through.
- Lint + format pre-commit hooks. The assistant should never be the reason formatting drifts.
- Typecheck and test gates in CI. Assistants can happily write code that does not compile. Compilation is the cheapest honest signal.
- A "changes touch what was asked" check in review. Assistants love to helpfully refactor neighboring code. You want those changes deliberate, not opportunistic.
- Secret scanning. Assistants occasionally paste secrets from one context into another.
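The scope gate above can be sketched as a small check that compares the diff against the paths the task actually asked for. This is a minimal illustration, not a production tool; the file paths and glob patterns are hypothetical:

```python
from fnmatch import fnmatch

def out_of_scope(changed_files, allowed_globs):
    """Return the files a diff touches that the task did not ask for.

    `changed_files` would typically come from `git diff --name-only`;
    `allowed_globs` from the task description or PR template.
    """
    return [
        f for f in changed_files
        if not any(fnmatch(f, g) for g in allowed_globs)
    ]

# Example: the task asked for changes under services/billing only,
# but the assistant also "helpfully" touched a shared library.
changed = ["services/billing/api.py", "lib/shared/util.py"]
print(out_of_scope(changed, ["services/billing/*"]))
# → ['lib/shared/util.py']
```

Wired into CI, a non-empty result does not have to fail the build — flagging it as a required review comment is often enough to make opportunistic refactors deliberate.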
Harder but higher-leverage gates:
- Contract tests for anything the assistant touches in integration code. Schema checks turn "looks right" into "is right."
- Invariant tests for critical business logic (patient matching, payment routing, authorization). These exist regardless of who wrote the code, but the assistant makes their absence more dangerous.
- Review checklists that include: "is this change in scope," "are tests updated," and "is the commit message the kind we want."
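What an invariant test looks like in miniature, using payment routing as the example — the router here is a toy stand-in, but the shape of the test is the point: it asserts a property that must hold no matter who (or what) wrote the implementation:

```python
# Hypothetical processor registry; names and shape are illustrative.
PROCESSORS = {
    "primary": {"enabled": True},
    "legacy": {"enabled": False},
}

def route_payment(amount_cents):
    """Toy router: pick the first enabled processor."""
    for name, cfg in PROCESSORS.items():
        if cfg["enabled"]:
            return name
    raise RuntimeError("no enabled processor")

def test_never_routes_to_disabled_processor():
    # The invariant: routing must never select a disabled processor,
    # for any amount. This survives any rewrite of route_payment.
    for amount in (1, 100, 10_000):
        chosen = route_payment(amount)
        assert PROCESSORS[chosen]["enabled"], chosen

test_never_routes_to_disabled_processor()
```

If an assistant rewrites the router and the invariant still passes, you have evidence instead of a plausible-looking diff.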
Done-state for Phase 2: a deliberate attempt to introduce a plausible-but-wrong change via the assistant is caught by gates, not by a human reading carefully.
Phase 3: Standardize the surface
At this point your team has instructions and gates. Now standardize the assistant.
Pick one primary assistant and commit. It does not matter which — Claude, Cursor, Copilot, Codex, Gemini, something else. What matters is that the team converges so you can build shared muscle memory, shared skills, and shared troubleshooting.
If your team legitimately needs multiple assistants (different languages, different IDEs), treat assistant configuration as code. Keep the MCP / tool / skill registry in a single source, generate per-assistant configs from it, and check them all in. One place to change a policy; many places to propagate it. This is the same "one config, many surfaces" pattern that works for integration configs and GitOps overlays.
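The "one config, many surfaces" pattern can be sketched in a few lines. The registry contents and the target config shapes below are simplified stand-ins, not the assistants' real schemas:

```python
import json

# Single-source registry; every name here is illustrative.
REGISTRY = {
    "skills": ["commit", "review", "investigate"],
    "mcp_servers": {"issue-tracker": {"command": "issue-mcp"}},
}

def render(assistant):
    """Emit one assistant's config from the shared registry."""
    if assistant == "claude":
        return json.dumps({"mcpServers": REGISTRY["mcp_servers"]}, indent=2)
    if assistant == "cursor":
        return json.dumps(
            {"mcp": REGISTRY["mcp_servers"], "commands": REGISTRY["skills"]},
            indent=2,
        )
    raise ValueError(f"unknown assistant: {assistant}")

# Change the registry once; regenerate and check in every config.
for name in ("claude", "cursor"):
    print(f"--- {name} ---")
    print(render(name))
```

The generated files are committed, so a policy change shows up as an ordinary reviewable diff across every surface at once.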
The specific artifacts to standardize:
- MCP servers (or the equivalent tool layer) — which external systems the assistant can talk to.
- Skills / slash commands — named procedures for recurring work (commit, review, investigate, test, deploy).
- Hooks — what runs before and after tool calls, so the assistant cannot quietly skip CI or sign off without verification.
- Memory — where decisions and context persist across sessions, so the next session is not starting from zero.
Done-state for Phase 3: any engineer on the team can pair with any other engineer's assistant session and recognize what is happening. The assistant's behavior is not a personal preference; it is a team decision.
Phase 4: Measure the shift
The wrong metrics for AI adoption are easy: lines of code, PRs merged, tokens spent. The right metrics are about which kinds of work are moving, and whether engineers are getting stronger or weaker.
Track:
- Categories of work the team now routes to the assistant by default. Scaffolding new services. Writing tests for existing code. Migrating dependencies. Generating boilerplate. Researching a library. These should grow over time.
- Categories of work the team explicitly keeps human. Data model changes in the critical path. Anything touching PII handling. Incident response beyond triage. Final security reviews. Any architectural decision that will be hard to reverse.
- Time-to-review and revert rate. If either is getting worse, the assistant is creating more churn than it removes.
- Engineer judgment. This is softer, but essential. Ask senior engineers: "could you have done this without the assistant, and would you recognize if it were wrong?" When the answer drifts toward no, you are developing a dependency rather than a tool.
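The first three metrics above can come out of a simple tally over PR records. The record shape and categories here are hypothetical; the point is that the report is about kinds of work, not volume:

```python
from collections import Counter

# Hypothetical PR log: (category, authored_with_assistant, reverted)
prs = [
    ("scaffolding", True, False),
    ("tests", True, False),
    ("data-model", False, False),   # explicitly kept human
    ("boilerplate", True, True),    # assisted change that got reverted
]

def adoption_report(prs):
    """Which categories route to the assistant, and at what revert rate."""
    routed = Counter(cat for cat, assisted, _ in prs if assisted)
    assisted_prs = [p for p in prs if p[1]]
    revert_rate = sum(1 for p in assisted_prs if p[2]) / len(assisted_prs)
    return routed, revert_rate

routed, revert_rate = adoption_report(prs)
print(routed.most_common())   # which kinds of work have moved
print(round(revert_rate, 2))  # churn signal on assisted work
```

A rising revert rate on assisted PRs is the early warning that the assistant is creating more churn than it removes.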
Done-state for Phase 4: you can answer leadership's "how is AI going on your team" question in under two minutes, with specifics, and defend the answer under follow-up.
Gates summary
Each phase has a gate. Do not move on until it passes.
| Phase | Gate | Evidence |
|---|---|---|
| 1. Instructions | A fresh session understands the repo from AGENTS.md | Run it; verify output |
| 2. Gates | Plausible-wrong changes are caught automatically | Red-team test |
| 3. Surface | Team can pair across sessions without confusion | Observed behavior in retro |
| 4. Measure | Can answer "how is AI going" with specifics | The number and the defense |
Anti-patterns
These will kill the loop.
- Tool-first rollout. "Everyone has the license now" is not a strategy. Licenses are the cheap part.
- No instruction file. Every session starts from zero, every engineer has a different experience, nothing compounds.
- Gates added later. Once the team is used to moving fast without gates, adding them feels punitive. Add them first; move fast inside them.
- "Assistant did it" as a commit justification. Every change belongs to a human reviewer. The assistant is not a co-author for blame purposes.
- Ignoring the judgment erosion question. If senior engineers cannot tell when the assistant is wrong, you are six months from an incident that nobody on the team can debug.
What this looks like in practice
In the repos I work in, the adoption loop produced:
- A canonical `AGENTS.md` at the workspace root, with per-service `AGENTS.md` overrides.
- A central MCP registry that generates configs for six assistants, so switching is a diff, not a migration.
- Hooks that enforce session tracking, test runs, and lint pre-commit, with non-skippable CI gates after.
- A set of named skills (commit, review, investigate, plan) that encode the team's actual process.
- A memory system that records decisions and outcomes so the next session inherits the context.
None of that is complicated on its own. The discipline is treating all of it as shared infrastructure, not personal tooling.
Source material
The evidence this playbook draws on is linked at the bottom of the page. The short version: AI-assisted development that holds up in production looks like any other reliable system — an explicit loop, bounded retries, visible state, and enforcement from things that are not opinions.
- Repo Design Patterns for AI-Assisted Dev: Control Loops, Hooks, and Memory
- Loom-Mode MCP for Advanced, Fast AI-Assisted Dev (Go-Native, Proxy+Daemon)
- Loom: One Registry, Many AI Coding Assistants
- Build Your Own Legs Before the Crutches Fail
- Building Practical AI Agents
- The First 90 Days: Introducing AI-Assisted Dev to a New Team
- A One-Page AI Usage Policy That Actually Works