ADR Generate

You need to document an architecture decision with context, alternatives, trade-offs, and consequences in standard ADR format.

Use this when the team has made (or needs to make) an architecture decision and wants to document it in a durable, findable format. ADRs capture the context, alternatives considered, rationale, and consequences so future engineers understand why the system is built the way it is -- not just how.

Related skills: Use after /architecture-discovery to document decisions that emerge from SNAP analysis. Complements /architecture-context-reviewer, which retrieves existing ADRs -- this skill creates new ones. Reference /system-diagram to provide visual context for the decision.

Process

Step 1: Gather decision context

Ask the user to provide:

  1. What decision needs to be recorded? -- a technology choice, an architecture pattern, a boundary definition, a data model change, an integration approach.
  2. What triggered this decision? -- new feature requirement, scaling issue, incident, technology end-of-life, team growth, compliance requirement, performance bottleneck.
  3. Who are the deciders? -- who made or will make this decision? (Names and roles.)
  4. What's the deadline? -- is this already decided, or does the team need to decide by a certain date?
  5. Any prior discussion? -- links to Slack threads, meeting notes, RFCs, design docs, or PR comments where this was debated.

Step 2: Capture current state and constraints

Document what exists today and what constrains the decision:

  • Current architecture -- what does the system look like now in the area this decision affects?
  • Technical constraints -- language, framework, infrastructure, or platform limitations.
  • Business constraints -- timeline, budget, compliance, team capacity.
  • Non-negotiables -- requirements that any option must satisfy (e.g., "must support 10x current traffic," "must be HIPAA-compliant," "must not require downtime").

Step 3: Enumerate alternatives

Document at least 2 alternatives (3 is ideal). For each:

Dimension | What to capture
--- | ---
Description | What is this option, concretely?
Pros | What does it do well?
Cons | What are the downsides?
Effort | How much work to implement? (T-shirt size: S/M/L/XL)
Risk | What could go wrong?
Reversibility | How hard is it to undo this choice later? (Easy / Hard / Irreversible)

If an alternative was already rejected before this ADR, still document it with the reason -- this prevents future engineers from re-proposing the same idea.
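The dimensions above can be held in a small record so that missing or malformed fields are caught before the ADR is written. This is a minimal sketch, not part of the skill itself: the names `Alternative`, `check_alternatives`, and `rejected_reason` are hypothetical, and the allowed values are taken from the Effort and Reversibility rows.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values copied from the Effort and Reversibility dimensions.
EFFORT_SIZES = ("S", "M", "L", "XL")
REVERSIBILITY_LEVELS = ("Easy", "Hard", "Irreversible")


@dataclass
class Alternative:
    """One considered option (hypothetical structure mirroring Step 3)."""
    name: str
    description: str
    pros: List[str]
    cons: List[str]
    effort: str
    risk: str
    reversibility: str
    # Filled in when the option was rejected before the ADR was written.
    rejected_reason: Optional[str] = None

    def __post_init__(self) -> None:
        if self.effort not in EFFORT_SIZES:
            raise ValueError(f"effort must be one of {EFFORT_SIZES}, got {self.effort!r}")
        if self.reversibility not in REVERSIBILITY_LEVELS:
            raise ValueError(
                f"reversibility must be one of {REVERSIBILITY_LEVELS}, got {self.reversibility!r}"
            )


def check_alternatives(alternatives: List[Alternative]) -> List[str]:
    """Return warnings; an empty list means no gaps were found."""
    warnings = []
    if len(alternatives) < 2:
        warnings.append("Document at least 2 alternatives (3 is ideal).")
    for alt in alternatives:
        if not alt.pros or not alt.cons:
            warnings.append(f"{alt.name}: list at least one pro and one con.")
    return warnings
```

Keeping already-rejected options in the same structure, with `rejected_reason` set, is one way to make sure they still appear in the final document.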

Step 4: Document the decision

State the chosen option with rationale that ties directly to the trade-off analysis:

  • Which decision drivers did the chosen option satisfy best?
  • What trade-offs were accepted?
  • What was the deciding factor between the top contenders?

The rationale should be specific enough that someone reading this ADR in 2 years can understand why this option was chosen over the alternatives.

Step 5: Define consequences and follow-up

Categorize the consequences of this decision:

  • Positive -- what improves or becomes possible?
  • Negative -- what trade-offs are accepted? What becomes harder?
  • Neutral -- what changes without being clearly better or worse?

Then define:

  • Follow-up actions -- concrete tasks that need to happen as a result of this decision (with owners and deadlines).
  • Revisit triggers -- conditions that should prompt re-evaluation of this decision (e.g., "if traffic exceeds 10K RPS," "if the team grows beyond 8 engineers," "when the vendor contract renews in Q3 2027").

Step 6: Generate the ADR

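Output using the template below, which extends the widely used Michael Nygard ADR format (context, decision, consequences) with decision drivers, considered alternatives, follow-up actions, and revisit triggers: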


ADR-{{NNN}}: {{decision-title}}

Date: {{date}}
Deciders: {{names-and-roles}}
Context source: {{links-to-prior-discussion}}

Context

{{What is the issue that motivates this decision? What forces are at play -- technical, business, team, timeline? Be specific about the situation, not generic.}}

Decision drivers

  • {{Driver 1 -- e.g., "Must handle 10x current traffic without re-architecting"}}
  • {{Driver 2 -- e.g., "Team has deep experience with PostgreSQL but not Cassandra"}}
  • {{Driver 3 -- e.g., "Compliance requires data residency in EU region"}}

Considered alternatives

Option A: {{name}}

  • Pros: {{specific advantages}}
  • Cons: {{specific disadvantages}}
  • Effort: {{S/M/L/XL}}
  • Risk: {{what could go wrong}}
  • Reversibility: {{Easy / Hard / Irreversible}}

Option B: {{name}}

  • Pros: {{specific advantages}}
  • Cons: {{specific disadvantages}}
  • Effort: {{S/M/L/XL}}
  • Risk: {{what could go wrong}}
  • Reversibility: {{Easy / Hard / Irreversible}}

Option C: {{name}} (if applicable)

  • Pros: / Cons: / Effort: / Risk: / Reversibility:

Decision

We will use {{chosen option}} because {{rationale tied directly to decision drivers and trade-off analysis}}.

Consequences

Positive:

  • {{what improves}}

Negative:

  • {{what trade-offs are accepted}}

Neutral:

  • {{what changes without clear valence}}

Follow-up actions

Action | Owner | Deadline
--- | --- | ---
{{specific task}} | {{person-or-role}} | {{date}}

Revisit triggers

  • {{Condition that should trigger re-evaluation of this decision}}
  • {{e.g., "If latency exceeds 200ms p99 under the new architecture"}}
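If the ADR is produced by a script rather than typed by hand, the template can be filled from structured fields. The sketch below is illustrative only: `render_adr` and its parameters are hypothetical, and for brevity it renders only the header, Context, Decision drivers, Decision, and Revisit triggers sections. The remaining sections would follow the same pattern.

```python
from datetime import date
from typing import List


def render_adr(
    number: int,
    title: str,
    decided_on: date,
    deciders: List[str],
    context: str,
    drivers: List[str],
    decision: str,
    revisit_triggers: List[str],
) -> str:
    """Render a plain-text ADR that follows the section order of the template."""
    lines = [
        f"ADR-{number:03d}: {title}",  # zero-padded to match ADR-{{NNN}}
        "",
        f"Date: {decided_on.isoformat()}",
        f"Deciders: {', '.join(deciders)}",
        "",
        "Context",
        "",
        context,
        "",
        "Decision drivers",
        "",
    ]
    lines += [f"  • {driver}" for driver in drivers]
    lines += ["", "Decision", "", decision, "", "Revisit triggers", ""]
    lines += [f"  • {trigger}" for trigger in revisit_triggers]
    return "\n".join(lines)


# Example call with placeholder values.
print(
    render_adr(
        number=7,
        title="Example decision",
        decided_on=date(2024, 1, 15),
        deciders=["A. Engineer (Staff Engineer)"],
        context="Why the decision is needed.",
        drivers=["Must not require downtime"],
        decision="We will use option A because it meets the no-downtime driver.",
        revisit_triggers=["If latency exceeds 200ms p99 under the new architecture"],
    )
)
```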

Step 7: Discuss

Ask the user:

  • Does the context section capture the full picture?
  • Are there alternatives I should add or remove?
  • Is the rationale clear to someone who wasn't in the room?
  • What ADR number should this be? (Check existing ADR numbering convention.)
  • Where should this ADR be stored? (Repo docs folder, wiki, Notion, etc.)

Output location

Present the ADR as formatted text in the conversation, or write it to a file if the user specifies a path and naming convention.
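When writing to a file, the ADR number and file name have to match whatever the repository already uses. The sketch below assumes a convention of `NNN-slugified-title.md` in a single folder. That convention, and the helper names `next_adr_number`, `adr_filename`, and `write_adr`, are assumptions for illustration; confirm the real convention with the user first.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumed naming convention: three or more digits, a hyphen, a slug, ".md".
ADR_FILENAME = re.compile(r"^(\d{3,})-.*\.md$")


def next_adr_number(existing_filenames: Iterable[str]) -> int:
    """Return one more than the highest ADR number found (1 if none exist)."""
    numbers = []
    for name in existing_filenames:
        match = ADR_FILENAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def adr_filename(number: int, title: str) -> str:
    """Build a file name such as '019-some-title.md' from the number and title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{number:03d}-{slug}.md"


def write_adr(directory: str, number: int, title: str, body: str) -> Path:
    """Write the ADR text to the target folder, refusing to overwrite a file."""
    path = Path(directory) / adr_filename(number, title)
    if path.exists():
        raise FileExistsError(f"{path} already exists; pick a different number")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return path
```

Refusing to overwrite is a deliberate choice here: an accepted ADR is normally superseded by a new one rather than edited in place.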

Example Output

Input

  • Decision to record: Switch from REST polling to WebSockets for real-time order status updates in the customer-facing order tracking page
  • Company / team: Dispatch Tiger, a last-mile delivery SaaS — Platform Engineering team (6 engineers)
  • Trigger: Customer complaints about stale tracking data; current 10-second polling is generating 40% of all API traffic and causing p95 latency spikes during peak lunch/dinner windows
  • Deciders: Priya Nair (Staff Engineer), Marcus Delgado (Engineering Manager), Sofia Reyes (Product Lead)
  • Prior discussion: Slack thread #eng-platform-realtime (Nov 4), RFC doc in Notion titled "Order Tracking Latency RFC v2" (Nov 7)

Output (abbreviated)

ADR-019: Real-Time Order Tracking via WebSockets

Date: 2024-11-12
Deciders: Priya Nair (Staff Engineer), Marcus Delgado (Engineering Manager), Sofia Reyes (Product Lead)
Context source: Slack #eng-platform-realtime (Nov 4, 2024); Order Tracking Latency RFC v2 (Nov 7, 2024)


Context

The order tracking page at Dispatch Tiger currently fetches order status via REST polling on a 10-second interval. As the customer base has grown to ~85K concurrent sessions during peak delivery windows (12–1 pm and 6–8 pm), this polling pattern now accounts for 40% of total inbound API traffic. Load testing shows p95 latency degrading to 1,800 ms during these windows — well above our 400 ms SLA. Customers are filing support tickets about seeing stale statuses ("Out for delivery" when the package has already arrived).

The backend already emits order state change events on an internal Kafka topic (order.state.changed). The gap is in how those events reach the browser.
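As a sanity check on the context figures, the polling load can be estimated from the session count and interval. This is a rough sketch under two assumptions that the ADR does not state: every session issues exactly one poll per interval, and the 40% share applies during the peak window. The function names are hypothetical.

```python
def polling_request_rate(concurrent_sessions: int, poll_interval_s: int) -> float:
    """Requests per second if every session polls once per interval."""
    return concurrent_sessions / poll_interval_s


def implied_total_rate(polling_rps: float, polling_share_pct: int) -> float:
    """Total request rate implied by polling's share of traffic, in percent."""
    return polling_rps * 100 / polling_share_pct


polling_rps = polling_request_rate(85_000, 10)
total_rps = implied_total_rate(polling_rps, 40)
print(f"polling: {polling_rps:,.0f} req/s, implied total: {total_rps:,.0f} req/s")
# prints "polling: 8,500 req/s, implied total: 21,250 req/s"
```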

Decision drivers

  • Must reduce polling-generated API traffic by at least 50% without degrading perceived update freshness
  • Must push status changes to the browser within 2 seconds of the Kafka event
  • Must work within existing Node.js / Express backend and React frontend — no full-stack rewrite
  • Must gracefully degrade for customers on flaky mobile connections (no silent data loss)
  • Must not require a browser extension or native app change (web-only scope)

Considered alternatives

Option A: WebSockets (via Socket.IO)

  • Pros: Persistent bidirectional connection eliminates polling; sub-500ms push latency achievable; Socket.IO handles reconnection and fallback to long-polling automatically; strong team familiarity
  • Cons: Stateful connections increase infrastructure complexity; horizontal scaling requires a shared adapter (Redis pub/sub) for cross-node fan-out, plus sticky sessions while the long-polling fallback is enabled; connection count limits need capacity planning
  • Effort: M
  • Risk: Redis adapter becomes a single point of failure if misconfigured; connection storms on deploy restarts
  • Reversibility: Hard — client code must be refactored back to REST polling if reversed

Option B: Server-Sent Events (SSE)

  • Pros: Unidirectional (server→client), which matches the use case; simpler than WebSockets; HTTP/2 multiplexing means no sticky session requirement; works through most corporate proxies
  • Cons: No native browser reconnect backoff (must implement manually); some older mobile browsers have poor SSE support; limited to text/UTF-8 payloads
  • Effort: S
  • Risk: Proxy and load balancer timeouts silently dropping streams in customer enterprise environments; lower engineering familiarity
  • Reversibility: Hard

Option C: Continue REST polling with adaptive interval

  • Pros: Zero infrastructure change; well-understood failure modes
  • Cons: Does not solve the root traffic problem — adaptive intervals only reduce load ~15% in simulations; latency improvement minimal; rejected as insufficient
  • Effort: S
  • Risk: Traffic problem resurfaces within one product cycle as user growth continues
  • Reversibility: Easy (no change to undo; option was rejected before this ADR)

Decision

We will use Option A: WebSockets via Socket.IO because it satisfies the sub-2-second push latency driver and the graceful-degradation requirement simultaneously. Socket.IO's automatic fallback to long-polling ensures customers on flaky mobile connections do not silently miss updates — a gap that SSE (Option B) would require significant custom code to close. The team's existing Socket.IO experience (used in the driver dispatch console) reduces implementation risk and shortens ramp time. The Redis adapter complexity is accepted as a known, manageable trade-off given that Redis is already in the Dispatch Tiger stack.

SSE was the close second; we will revisit it if Socket.IO's stateful connection model proves operationally burdensome at the next traffic tier.


Consequences

Positive:

  • Polling traffic eliminated for active order tracking sessions; projected 35–40% reduction in total API request volume
  • Customer-visible update latency drops from up to ~10 seconds (polling worst case; ~5 seconds on average) to < 1 second (p95 target)
  • Backend Kafka consumer already exists; WebSocket layer is a thin fan-out bridge

Negative:

  • Infrastructure now requires Redis pub/sub adapter for Socket.IO — adds an operational dependency
  • Load balancer must be configured for sticky sessions (needed while the long-polling fallback is enabled), and the Redis adapter must be validated under multi-node failover
  • Connection count capacity must be modeled and monitored; currently untracked metric

Neutral:

  • REST /orders/:id/status endpoint remains available for non-browser clients (mobile apps, third-party integrations) — no change to those consumers
  • Frontend polling logic removed; net reduction in client-side code complexity
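The projections under Positive can be cross-checked with simple arithmetic. The sketch below assumes status changes arrive uniformly at random within a poll interval and ignores network and processing time; both function names are hypothetical.

```python
from typing import Tuple


def polling_staleness(poll_interval_s: float) -> Tuple[float, float]:
    """Return (mean, worst-case) delay in seconds before a poll sees a change."""
    return poll_interval_s / 2, poll_interval_s


def total_traffic_reduction_pct(polling_share_pct: float, polling_removed_pct: float) -> float:
    """Reduction in total traffic when a given share of polling traffic is removed."""
    return polling_share_pct * polling_removed_pct / 100


mean_s, worst_s = polling_staleness(10)
print(mean_s, worst_s)                         # 5.0 10
print(total_traffic_reduction_pct(40, 100))    # 40.0
print(total_traffic_reduction_pct(40, 87.5))   # 35.0
```

Under these assumptions, a 35–40% drop in total request volume corresponds to removing roughly 87.5–100% of polling traffic, which would clear the driver of at least a 50% reduction in polling-generated traffic.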

Follow-up actions

Action | Owner | Deadline
--- | --- | ---
Provision Redis pub/sub adapter in staging and validate multi-node failover | Priya Nair | Nov 22, 2024
Update load balancer config for WebSocket upgrades + sticky session validation | Marcus Delgado (infra coordination) | Nov 22, 2024
Implement Socket.IO server bridge from order.state.changed Kafka topic | Backend squad (lead: Dario Kim) | Nov 29, 2024
Update React order tracking component to Socket.IO client; remove polling | Frontend squad (lead: Anika Wren) | Nov 29, 2024
Define and instrument connection-count dashboard in Datadog | Priya Nair | Dec 6, 2024
Load test WebSocket layer at 100K simulated concurrent connections | QA (lead: Tomás Vera) | Dec 10, 2024

Revisit triggers

  • If concurrent WebSocket connections exceed 150K (current Redis adapter capacity ceiling) — evaluate Socket.IO cluster sharding or migration to SSE under HTTP/2
  • If Redis adapter availability falls below 99.9% over any 30-day window — evaluate decoupling the fan-out layer
  • If team grows beyond 10 engineers and a dedicated infrastructure team forms — re-evaluate operational ownership of the Redis dependency
  • When AWS ElastiCache contract renews (Q2 2026) — reassess managed vs. self-hosted Redis trade-off
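Revisit triggers are easier to act on when the measurable ones are written as explicit checks. The sketch below encodes the first three triggers above as predicates over a metrics snapshot. The metric keys (`peak_ws_connections`, `redis_availability_30d_pct`, `team_size`, `has_dedicated_infra_team`) and the `RevisitTrigger` type are hypothetical; the date-based ElastiCache trigger is left out because it belongs in a calendar rather than a metrics check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class RevisitTrigger:
    """A condition from the ADR plus a predicate that tests it."""
    description: str
    fired: Callable[[Dict[str, Any]], bool]


# Thresholds copied from the revisit triggers listed in the ADR.
TRIGGERS = [
    RevisitTrigger(
        "Concurrent WebSocket connections exceed 150K",
        lambda m: m["peak_ws_connections"] > 150_000,
    ),
    RevisitTrigger(
        "Redis adapter availability below 99.9% over a 30-day window",
        lambda m: m["redis_availability_30d_pct"] < 99.9,
    ),
    RevisitTrigger(
        "Team beyond 10 engineers with a dedicated infrastructure team",
        lambda m: m["team_size"] > 10 and m["has_dedicated_infra_team"],
    ),
]


def fired_triggers(metrics: Dict[str, Any]) -> List[str]:
    """Return the descriptions of every trigger whose condition currently holds."""
    return [t.description for t in TRIGGERS if t.fired(metrics)]
```

A check like this could run on a schedule against dashboard exports, with any non-empty result prompting a review of the ADR.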