Architecture Discovery - AI Agent Skill

Use this when you need to go from "we don't understand this system" to "we have identified services, APIs, data ownership, and an actionable backlog." This skill set chains three stages: Event Storming (discover events and boundaries), Boris Modeling (model service relationships), and SNAP Documentation (capture detailed architecture per bounded context).

This is a skill set -- it orchestrates /event-storm, /boris-model, and /snap-document in sequence. Each skill also works independently.

Process

Step 1: Gather inputs

Ask the user:

Business domain or system -- what are we discovering? (e.g., "order management," "patient records," "payments platform")
Goal -- what's driving this? (modernization, new team onboarding, monolith decomposition, greenfield design)
Scope -- full system or a specific subsystem/process?
Participants -- who will be involved across all three stages?
Current state -- any existing diagrams, docs, or architecture artifacts?
Session format -- in-person workshops or async/virtual documentation?
Output destinations -- where should final deliverables land? (Miro, Notion, Linear, FigJam, files)
Depth -- how far should discovery go? Options:
- Full discovery (default) -- all three stages, complete SNAP sheets, consolidated backlog
- Events + Boris only -- skip SNAP, stop after service relationships are mapped
- Events only -- just Event Storming, stop after domain events are captured
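
The inputs and depth options above can be captured in a small structure. This is a minimal sketch, not part of the skill itself: the names `Depth`, `DiscoveryInputs`, and `STAGES_BY_DEPTH` are hypothetical, and it assumes each depth option is a prefix of the three-stage chain.

```python
from dataclasses import dataclass, field
from enum import Enum


class Depth(Enum):
    FULL = "full discovery"          # all three stages (documented default)
    EVENTS_AND_BORIS = "events + boris"
    EVENTS_ONLY = "events only"


# Hypothetical lookup: stages run in order, and each depth stops earlier.
STAGES_BY_DEPTH = {
    Depth.FULL: ["/event-storm", "/boris-model", "/snap-document"],
    Depth.EVENTS_AND_BORIS: ["/event-storm", "/boris-model"],
    Depth.EVENTS_ONLY: ["/event-storm"],
}


@dataclass
class DiscoveryInputs:
    # Required answers: the user must supply these in Step 1.
    domain: str
    goal: str
    scope: str
    session_format: str
    # Optional answers: empty until the user provides them.
    participants: list[str] = field(default_factory=list)
    current_state: list[str] = field(default_factory=list)
    output_destinations: list[str] = field(default_factory=list)
    depth: Depth = Depth.FULL

    def stages(self) -> list[str]:
        """Return the skills to run, in order, for the chosen depth."""
        return list(STAGES_BY_DEPTH[self.depth])
```

Keeping depth as an enum rather than free text makes it harder to start a stage the user did not ask for.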

Step 2: Event Storming

Run /event-storm with the inputs gathered in Step 1.

Key outputs to carry forward:

Domain events (timeline)
Bounded contexts (with key events and ubiquitous language)
Thin slices (happy path + alternates)
Hot spots and pain points
Communication patterns (sync vs. async)

Step 3: Bridge -- Event Storming to Boris

Before moving to Boris, confirm with the user:

"We identified (N) bounded contexts: (list). Do these feel right?"
"The happy path thin slice is: (path). Should we start Boris with this flow?"
"There are (N) hot spots. Any that would block Boris modeling?"

Reshape the Event Storming output for Boris:

Each bounded context becomes a service node
Each communication pattern becomes a candidate interaction arrow
Each thin slice becomes a flow to walk through
Hot spots carry forward as open questions
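
The four reshaping rules above amount to a direct mapping. The sketch below illustrates that mapping under assumed shapes: `EventStormOutput`, `BorisInput`, and `reshape_for_boris` are hypothetical names, and the real /event-storm output format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class EventStormOutput:
    bounded_contexts: list[str]
    # Each pattern is (source context, target context, "sync" or "async").
    communication_patterns: list[tuple[str, str, str]]
    thin_slices: list[str]
    hot_spots: list[str]


@dataclass
class BorisInput:
    service_nodes: list[str] = field(default_factory=list)
    candidate_arrows: list[dict] = field(default_factory=list)
    flows: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def reshape_for_boris(es: EventStormOutput) -> BorisInput:
    """Apply the four reshaping rules, one field per rule."""
    return BorisInput(
        # Each bounded context becomes a service node.
        service_nodes=list(es.bounded_contexts),
        # Each communication pattern becomes a candidate arrow, not a final one.
        candidate_arrows=[
            {"from": src, "to": dst, "style": style, "status": "candidate"}
            for src, dst, style in es.communication_patterns
        ],
        # Each thin slice becomes a flow to walk through.
        flows=list(es.thin_slices),
        # Hot spots carry forward as open questions.
        open_questions=list(es.hot_spots),
    )
```

Marking arrows as `candidate` keeps it visible that the team still has to validate them during Boris.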

Step 4: Boris Modeling

Run /boris-model using the reshaped inputs from Step 3.

Key outputs to carry forward:

Services (with responsibilities and event ownership)
Service interactions (sync/async, labeled)
Flows traced through the architecture
API candidates
Patterns identified (orchestration, choreography, shared data)
Open questions

Step 5: Bridge -- Boris to SNAP

Before moving to SNAP, confirm with the user:

"We identified (N) services with (N) interactions. Ready to document details?"
"The API candidates are: (list). Do these cover the key integration points?"
"Open questions from Boris: (list). Any to resolve before SNAP?"

Reshape the Boris output for SNAP:

Each service becomes a SNAP sheet
Each API candidate seeds the APIs category
Each event (produced/consumed) seeds the Pub/Sub category
Each interaction seeds the External Systems category
Open questions become Risks and Stories (investigation spikes)
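
As a rough illustration of the seeding rules above, the sketch below builds one sheet per service. It uses only the SNAP categories named in this step. `Service` and `seed_snap_sheets` are hypothetical names, and attaching each open question to a single owning service is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    api_candidates: list[str] = field(default_factory=list)
    events_produced: list[str] = field(default_factory=list)
    events_consumed: list[str] = field(default_factory=list)


def seed_snap_sheets(services, interactions, open_questions):
    """Seed one SNAP sheet per service.

    interactions: (source, target, style) tuples; style is "sync" or "async".
    open_questions: (service name, question) tuples.
    """
    sheets = {
        svc.name: {
            "APIs": list(svc.api_candidates),
            "Pub/Sub": [f"produces {e}" for e in svc.events_produced]
            + [f"consumes {e}" for e in svc.events_consumed],
            "External Systems": [],
            "Risks": [],
            "Stories": [],
        }
        for svc in services
    }
    for source, target, style in interactions:
        # Record the interaction on each end that has a sheet.
        if source in sheets:
            sheets[source]["External Systems"].append(f"-> {target} ({style})")
        if target in sheets:
            sheets[target]["External Systems"].append(f"<- {source} ({style})")
    for owner, question in open_questions:
        if owner in sheets:
            # An open question is both a risk and an investigation spike.
            sheets[owner]["Risks"].append(question)
            sheets[owner]["Stories"].append(f"Spike: {question}")
    return sheets
```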

Step 6: SNAP Documentation

Run /snap-document using the reshaped inputs from Step 5.

Key final outputs:

Complete SNAP sheets per bounded context
Gap analysis (missing APIs, data ownership conflicts, orphan events)
Consolidated backlog with prioritization
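
The gap analysis can be approximated mechanically once sheets exist. The sketch below shows one possible reading of the three gap types, not the /snap-document implementation. The sheet keys (`apis`, `produces`, `consumes`, `writes`, `sync_calls`) are hypothetical. An orphan event is assumed to be one that is produced but never consumed, or consumed but never produced.

```python
from collections import defaultdict


def analyze_gaps(sheets: dict[str, dict]) -> dict[str, list[str]]:
    """Flag missing APIs, data ownership conflicts, and orphan events."""
    produced, consumed = set(), set()
    writers = defaultdict(set)
    for name, sheet in sheets.items():
        produced.update(sheet.get("produces", []))
        consumed.update(sheet.get("consumes", []))
        for store in sheet.get("writes", []):
            writers[store].add(name)

    # Services that are called synchronously but list no API.
    called = {
        target
        for sheet in sheets.values()
        for target in sheet.get("sync_calls", [])
    }
    missing_apis = sorted(
        name for name in called if not sheets.get(name, {}).get("apis")
    )

    return {
        "missing_apis": missing_apis,
        # A data store with more than one writer has no clear owner.
        "ownership_conflicts": sorted(s for s, w in writers.items() if len(w) > 1),
        # Symmetric difference: produced-only or consumed-only events.
        "orphan_events": sorted(produced ^ consumed),
    }
```

Some events are legitimately terminal, so treat the orphan list as prompts for the team rather than defects.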

Step 7: Synthesize final deliverables

Produce a single architecture discovery summary:

Architecture Discovery: (Domain/System Name)

Date: (date)
Participants: (list)
Goal: (what drove this discovery)

Executive Summary

(2-3 sentences: what we discovered, how many services, key architectural decisions, top risks)

Bounded Contexts → Services

Context	Service	Key Responsibilities	Events Owned
(Context)	(Service)	(Responsibilities)	(Events)

Architecture Overview

(High-level description of how services interact -- orchestration style, key data flows, external integrations)

Technology Landscape

For each bounded context, catalog the technology stack and assess lifecycle status.

Context	Languages/Frameworks	Databases	Infrastructure	Lifecycle Status
(Context)	(e.g., Python 3.11, FastAPI)	(e.g., PostgreSQL 15)	(e.g., AWS ECS, RDS)	Current / Aging / End-of-life

Technology Consolidation Opportunities

Identify where multiple bounded contexts use different technologies for the same purpose. Consolidation reduces cognitive load, simplifies hiring, and lowers maintenance costs.

Capability	Technologies in Use	Contexts	Recommendation
(e.g., Message queue)	(e.g., RabbitMQ, SQS, Kafka)	(which contexts use each)	(consolidate to X / keep separate because Y)
(e.g., API framework)	(e.g., Express, FastAPI, Spring)	(which contexts use each)	(consolidate to X / keep separate because Y)

Feed consolidation opportunities into /technology-roadmap for investment planning and /build-vs-buy when evaluating replacement options.
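
Finding consolidation candidates is a grouping problem: a capability qualifies when more than one technology serves it. The sketch below assumes the Technology Landscape has been flattened into `(context, capability, technology)` tuples. That shape and the function name `consolidation_candidates` are hypothetical.

```python
from collections import defaultdict


def consolidation_candidates(usage):
    """Group usage by capability and keep capabilities with 2+ technologies.

    usage: iterable of (context, capability, technology) tuples.
    Returns {capability: {technology: [contexts using it]}}.
    """
    by_capability = defaultdict(lambda: defaultdict(set))
    for context, capability, technology in usage:
        by_capability[capability][technology].add(context)
    return {
        capability: {tech: sorted(ctxs) for tech, ctxs in techs.items()}
        for capability, techs in by_capability.items()
        if len(techs) > 1  # a single technology means nothing to consolidate
    }
```

The result fills the first three columns of the table. The Recommendation column still needs human judgment.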

Key Decisions Made

Decision	Rationale	Confidence	Revisit When
(Decision)	(Why)	High/Med/Low	(Trigger)

Top Risks

Risk	Impact	Mitigation	Owner
(Risk)	High/Med/Low	(Mitigation)	(Owner)

Backlog Summary

(N) total stories across (N) bounded contexts
Top 5 priorities: (list)
Spikes needed: (list)

Artifacts Produced

Artifact	Location	Format
Event Storming output	(location)	(format)
Boris model	(location)	(format)
SNAP sheets	(location)	(format)
Consolidated backlog	(location)	(format)

Step 8: Review

Ask the user:

Does the architecture feel right for the stated goal?
Are there services or boundaries that need another pass?
Is the backlog ready for team estimation, or are there too many unknowns?
Who needs to see this? (team, leadership, client)
Should we generate a presentation deck? → use /artium-deck

Uncertainty Policy

Topic	Tolerance	Action
Domain scope and boundaries	Low	STOP and ask -- wrong domain scope wastes the entire session
Business process being modeled	Low	STOP and ask -- Event Storming requires a clear process to trace
Stage transition readiness	Low	STOP and ask -- skipping quality checks compounds errors across stages
Bounded context names	Medium	Assume + flag [ASSUMED] -- refined during Boris modeling
Service interaction patterns (sync/async)	Medium	Assume + flag [ASSUMED] -- team validates during Boris
Story priority and sizing	Medium	Assume + flag [ASSUMED] -- backlog is a starting point
Participant roles and expertise	High	Best guess from context

Default: STOP and ask when a topic is not listed above.
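
The policy table and its default can be read as a lookup with a fallback. This is a minimal sketch: `Action`, `POLICY`, `action_for`, and `flag_assumption` are hypothetical names, and matching topics by lowercased row text is an assumption made for the example.

```python
from enum import Enum


class Action(Enum):
    STOP_AND_ASK = "STOP and ask"
    ASSUME_AND_FLAG = "Assume + flag [ASSUMED]"
    BEST_GUESS = "Best guess from context"


# One entry per row of the table, keyed by lowercased topic.
POLICY = {
    "domain scope and boundaries": Action.STOP_AND_ASK,
    "business process being modeled": Action.STOP_AND_ASK,
    "stage transition readiness": Action.STOP_AND_ASK,
    "bounded context names": Action.ASSUME_AND_FLAG,
    "service interaction patterns (sync/async)": Action.ASSUME_AND_FLAG,
    "story priority and sizing": Action.ASSUME_AND_FLAG,
    "participant roles and expertise": Action.BEST_GUESS,
}


def action_for(topic: str) -> Action:
    """Look up the action for a topic; unlisted topics default to STOP and ask."""
    return POLICY.get(topic.strip().lower(), Action.STOP_AND_ASK)


def flag_assumption(value: str) -> str:
    """Mark an assumed value so reviewers can find and validate it later."""
    return f"{value} [ASSUMED]"
```

Defaulting to the strictest action means a new, unclassified topic cannot be assumed away silently.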

Related skills

/event-storm -- Stage 1 (standalone)
/boris-model -- Stage 2 (standalone)
/snap-document -- Stage 3 (standalone)
/artium-deck -- generate a presentation from the discovery output
/story-write -- flesh out individual stories from the SNAP backlog
/prd-draft -- write PRDs for high-priority services

Output locations

Format	Location	Notes
Markdown summary	Conversation or saved file	Always produced.
Event Storming	Per `/event-storm` output settings	Stage 1 artifact.
Boris model	Per `/boris-model` output settings	Stage 2 artifact.
SNAP sheets	Per `/snap-document` output settings	Stage 3 artifact.
Consolidated backlog	Linear, Notion, or file	Per user preference.

Example Output

Input

Business domain: Claims processing platform at Meridian Health Partners, a regional insurance carrier handling 2.1M claims/year
Goal: Decompose a 12-year-old Java monolith ("ClaimCore") ahead of a cloud migration to AWS; engineering leadership needs to understand what's actually in the system before committing to a migration approach
Participants: 2 senior engineers who built ClaimCore, 1 claims operations lead, 1 product manager, 1 solutions architect (facilitating)
Current state: A 2019 data flow diagram (known to be outdated), ClaimCore source code, and tribal knowledge held by two engineers who are retirement-eligible
Output destinations: Miro (workshop artifacts), Confluence (SNAP sheets), Jira (backlog)

Output (abbreviated)

Architecture Discovery: Claims Processing Platform — Meridian Health Partners

Date: 2025-07-14
Participants: A. Nakamura (Sr. Engineer), D. Fontaine (Sr. Engineer), R. Osei (Claims Ops Lead), S. Patel (PM), L. Varga (Solutions Architect, facilitator)
Goal: Decompose ClaimCore monolith to enable phased AWS cloud migration; surface service boundaries, data ownership, and integration risks before migration planning begins

Executive Summary

Discovery revealed 6 bounded contexts buried inside ClaimCore, ranging from well-isolated modules (Adjudication, Provider Directory) to deeply tangled shared-database integrations (Payment Disbursement, Member Eligibility). The system processes claims through a mix of synchronous DB calls and a legacy IBM MQ message bus that is undocumented in any existing artifact. The top architectural risk is a shared CLAIMS_MASTER table written to by 4 distinct subsystems with no clear ownership — this must be resolved before any service extraction begins.

Bounded Contexts → Services

Context	Service	Key Responsibilities	Events Owned
Intake & Triage	claims-intake-svc	Receive EDI 837 submissions, validate format, assign claim ID, route by type	ClaimReceived, ClaimRejected, ClaimRouted
Adjudication	adjudication-svc	Apply benefit rules, calculate allowed amounts, determine COB	ClaimAdjudicated, ClaimPended, ClaimDenied
Member Eligibility	eligibility-svc	Verify coverage at date of service, return eligibility response	EligibilityChecked, CoverageConflictFlagged
Provider Directory	provider-svc	Maintain provider network status, validate NPI, return contract rates	ProviderValidated, ContractRateFetched
Payment Disbursement	payment-svc	Generate EOB, trigger EFT/check to provider, handle reversals	PaymentInitiated, PaymentReversed, EOBGenerated
Appeals & Grievances	appeals-svc	Track appeal submissions, manage deadlines, link to original claim	AppealOpened, AppealResolved, DeadlineBreached

Architecture Overview

Claims enter via EDI batch (nightly) or a thin web portal (real-time). claims-intake-svc performs format validation synchronously, then publishes ClaimRouted onto IBM MQ. adjudication-svc consumes that event and calls eligibility-svc and provider-svc synchronously over internal JDBC today; these calls are prime candidates for REST or gRPC extraction. Once a claim is adjudicated, payment-svc is triggered via MQ. appeals-svc is the most isolated context and communicates exclusively through a shared Oracle schema: it publishes no events today and relies on polling queries.

The current system is orchestration-heavy with a single ClaimProcessorBean acting as a God object coordinating all six contexts. Decomposition will require extracting this orchestration into either a dedicated workflow service (recommended: AWS Step Functions) or distributing it into choreography.

Technology Landscape

Context	Languages/Frameworks	Databases	Infrastructure	Lifecycle Status
Intake & Triage	Java 8, Spring MVC 4	Oracle 19c	On-prem JBoss EAP	Aging
Adjudication	Java 8, EJB 3	Oracle 19c (shared schema)	On-prem JBoss EAP	End-of-life
Member Eligibility	Java 8, EJB 3	Oracle 19c (shared schema)	On-prem JBoss EAP	End-of-life
Provider Directory	Java 11, Spring Boot 2.4	PostgreSQL 13	On-prem Tomcat	Current
Payment Disbursement	Java 8, EJB 3	Oracle 19c + IBM MQ 9	On-prem JBoss EAP	End-of-life
Appeals & Grievances	Java 8, JSF 2	Oracle 19c (shared schema)	On-prem JBoss EAP	End-of-life

Technology Consolidation Opportunities

Capability	Technologies in Use	Contexts	Recommendation
Application runtime	JBoss EAP, Tomcat, Spring Boot 2.4	All	Consolidate to Spring Boot 3.x on AWS ECS; Provider Directory is the reference implementation
Database	Oracle 19c (shared), PostgreSQL 13	Oracle: all except Provider Directory; PostgreSQL: Provider Directory	Migrate each extracted service to Aurora PostgreSQL; avoid lifting Oracle to cloud
Messaging	IBM MQ 9	Intake → Payment path	Replace with Amazon SQS/SNS during extraction; do not migrate IBM MQ to AWS
Java version	Java 8 (5 of 6 contexts), Java 11 (1)	All	Standardize on Java 21 LTS as part of each service extraction sprint

Feed consolidation opportunities into /technology-roadmap to sequence Oracle decommissioning and /build-vs-buy to evaluate rules engine options for Adjudication.

Key Decisions Made

Decision	Rationale	Confidence	Revisit When
Extract Provider Directory first	Already on Spring Boot + Postgres; lowest blast radius; proves extraction pattern	High	If Provider Directory has hidden Oracle dependencies found in code audit
Use AWS Step Functions for claim orchestration	Replaces God object pattern; keeps orchestration explicit and auditable for compliance	Medium	If Step Functions proves too slow to meet the < 2s SLA for real-time portal submissions
Do not lift IBM MQ to AWS	Licensing cost + operational overhead don't justify it; SQS covers the use case	High	Only if a vendor integration requires MQ specifically
Treat `CLAIMS_MASTER` as a migration blocker	4 writers, no ownership model; must be partitioned before any service goes live in AWS	High	After data ownership workshop resolves write boundaries

Top Risks

Risk	Impact	Mitigation	Owner
`CLAIMS_MASTER` shared-write ownership unresolved — 4 services write to overlapping columns with no transaction boundary	High	Spike: column-level ownership mapping; define bounded write contracts before extraction	A. Nakamura
D. Fontaine and A. Nakamura hold undocumented adjudication rule knowledge; both retirement-eligible	High	Pair each extraction sprint with a knowledge transfer session; document rules as executable tests	S. Patel
IBM MQ message schema undocumented — 3 message types found in code, 1 suspected but unconfirmed	Medium	Spike: MQ message audit before Intake extraction sprint; confirm with ops team	L. Varga
Appeals & Grievances has hard-coded CMS regulatory deadlines (45/60/90 day) in stored procedures	Medium	Extract deadline logic into a rules config layer before migrating; regression test against CMS audit logs	R. Osei
Real-time portal claims SLA (< 2s) may not be achievable with Step Functions orchestration latency	Medium	Prototype Step Functions Express Workflows under load; fallback is direct service-to-service choreography	L. Varga

Backlog Summary

47 total stories across 6 bounded contexts
Top 5 priorities:
1. CLAIMS_MASTER column ownership mapping spike (blocks all extraction)
2. Provider Directory
