Architecture Discovery

Runs a full architecture discovery workflow: Event Storming, then Boris Modeling, then SNAP Documentation.

Use this when you need to go from "we don't understand this system" to "we have identified services, APIs, data ownership, and an actionable backlog." This skill set chains three stages: Event Storming (discover events and boundaries), Boris Modeling (model service relationships), and SNAP Documentation (capture detailed architecture per bounded context).

This is a skill set — it orchestrates /event-storm, /boris-model, and /snap-document in sequence. Each skill also works independently.

Process

Step 1: Gather inputs

Ask the user:

  1. Business domain or system — what are we discovering? (e.g., "order management," "patient records," "payments platform")
  2. Goal — what's driving this? (modernization, new team onboarding, monolith decomposition, greenfield design)
  3. Scope — full system or a specific subsystem/process?
  4. Participants — who will be involved across all three stages?
  5. Current state — any existing diagrams, docs, or architecture artifacts?
  6. Session format — in-person workshops or async/virtual documentation?
  7. Output destinations — where should final deliverables land? (Miro, Notion, Linear, FigJam, files)
  8. How deep? Options:
    • Full discovery (default) — all three stages, complete SNAP sheets, consolidated backlog
    • Events + Boris only — skip SNAP, stop after service relationships are mapped
    • Events only — just Event Storming, stop after domain events are captured
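The depth options above amount to a choice of how many stages to run. A minimal Python sketch of that mapping follows; the `Depth` enum, the `STAGES_BY_DEPTH` table, and `stages_for` are hypothetical names introduced for illustration, not part of the skills themselves.

```python
from enum import Enum


class Depth(Enum):
    """Hypothetical labels for the three depth options."""
    FULL = "full"                      # all three stages (the default)
    EVENTS_AND_BORIS = "events+boris"  # skip SNAP
    EVENTS_ONLY = "events"             # stop after Event Storming


# Skills to run, in order, for each depth option.
STAGES_BY_DEPTH = {
    Depth.FULL: ["/event-storm", "/boris-model", "/snap-document"],
    Depth.EVENTS_AND_BORIS: ["/event-storm", "/boris-model"],
    Depth.EVENTS_ONLY: ["/event-storm"],
}


def stages_for(depth: Depth = Depth.FULL) -> list[str]:
    """Return a copy of the ordered skill list for the chosen depth."""
    return list(STAGES_BY_DEPTH[depth])
```

Calling `stages_for()` with no argument yields all three skills, which matches "Full discovery" being the default.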

Step 2: Event Storming

Run /event-storm with the inputs gathered in Step 1.

Key outputs to carry forward:

  • Domain events (timeline)
  • Bounded contexts (with key events and ubiquitous language)
  • Thin slices (happy path + alternates)
  • Hot spots and pain points
  • Communication patterns (sync vs. async)

Step 3: Bridge — Event Storming to Boris

Before moving to Boris, confirm with the user:

  • "We identified (N) bounded contexts: (list). Do these feel right?"
  • "The happy path thin slice is: (path). Should we start Boris with this flow?"
  • "There are (N) hot spots. Any that would block Boris modeling?"

Reshape the Event Storming output for Boris:

  1. Each bounded context becomes a service node
  2. Each communication pattern becomes a candidate interaction arrow
  3. Each thin slice becomes a flow to walk through
  4. Hot spots carry forward as open questions
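The four reshaping rules above are a one-to-one mapping, which can be sketched in Python as follows. The `EventStormOutput` and `BorisInput` shapes are assumptions made for illustration; the actual /event-storm and /boris-model skills may structure their data differently.

```python
from dataclasses import dataclass, field


@dataclass
class EventStormOutput:
    """Assumed shape of the Step 2 outputs that carry forward."""
    bounded_contexts: list[str]
    # (source context, target context, "sync" or "async")
    communication_patterns: list[tuple[str, str, str]]
    thin_slices: list[list[str]]  # each slice is an ordered list of events
    hot_spots: list[str]


@dataclass
class BorisInput:
    """Assumed shape of what Boris Modeling starts from."""
    service_nodes: list[str] = field(default_factory=list)
    candidate_arrows: list[tuple[str, str, str]] = field(default_factory=list)
    flows: list[list[str]] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def reshape_for_boris(es: EventStormOutput) -> BorisInput:
    """Apply the four rules: contexts, patterns, slices, hot spots."""
    return BorisInput(
        service_nodes=list(es.bounded_contexts),           # rule 1
        candidate_arrows=list(es.communication_patterns),  # rule 2
        flows=[list(s) for s in es.thin_slices],           # rule 3
        open_questions=list(es.hot_spots),                 # rule 4
    )
```

The function copies its inputs rather than sharing them, so edits made during Boris Modeling do not alter the Event Storming record.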

Step 4: Boris Modeling

Run /boris-model using the reshaped inputs from Step 3.

Key outputs to carry forward:

  • Services (with responsibilities and event ownership)
  • Service interactions (sync/async, labeled)
  • Flows traced through the architecture
  • API candidates
  • Patterns identified (orchestration, choreography, shared data)
  • Open questions

Step 5: Bridge — Boris to SNAP

Before moving to SNAP, confirm with the user:

  • "We identified (N) services with (N) interactions. Ready to document details?"
  • "The API candidates are: (list). Do these cover the key integration points?"
  • "Open questions from Boris: (list). Any to resolve before SNAP?"

Reshape the Boris output for SNAP:

  1. Each service becomes a SNAP sheet
  2. Each API candidate seeds the APIs category
  3. Each event (produced/consumed) seeds the Pub/Sub category
  4. Each interaction seeds the External Systems category
  5. Open questions become Risks and Stories (investigation spikes)
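As a rough illustration of the seeding rules above, the Python sketch below builds one sheet per service and fills the named categories. The `SnapSheet` fields and the input shapes are assumptions; in particular, keying API candidates and open questions by service, and recording each interaction on both participants' sheets, are choices made for this example rather than anything /snap-document prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class SnapSheet:
    """Assumed subset of SNAP categories touched by the seeding rules."""
    service: str
    apis: list[str] = field(default_factory=list)
    pub_sub: list[str] = field(default_factory=list)
    external_systems: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    stories: list[str] = field(default_factory=list)


def seed_snap_sheets(services, api_candidates, events, interactions, open_questions):
    """
    services:       list of service names
    api_candidates: dict mapping service -> list of API candidates
    events:         list of (service, "produces" | "consumes", event name)
    interactions:   list of (source service, target service, "sync" | "async")
    open_questions: dict mapping service -> list of questions
    Raises KeyError if any input names a service not listed in `services`.
    """
    sheets = {name: SnapSheet(service=name) for name in services}  # rule 1
    for svc, apis in api_candidates.items():                       # rule 2
        sheets[svc].apis.extend(apis)
    for svc, direction, event in events:                           # rule 3
        sheets[svc].pub_sub.append(f"{direction}: {event}")
    for source, target, style in interactions:                     # rule 4
        sheets[source].external_systems.append(f"{target} ({style})")
        sheets[target].external_systems.append(f"{source} ({style})")
    for svc, questions in open_questions.items():                  # rule 5
        for q in questions:
            sheets[svc].risks.append(q)
            sheets[svc].stories.append(f"Spike: {q}")
    return sheets
```

The seeded sheets are only a starting point; the remaining categories and the detail within each are filled in during Step 6.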

Step 6: SNAP Documentation

Run /snap-document using the reshaped inputs from Step 5.

Key final outputs:

  • Complete SNAP sheets per bounded context
  • Gap analysis (missing APIs, data ownership conflicts, orphan events)
  • Consolidated backlog with prioritization
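Two of the gap-analysis checks named above lend themselves to a mechanical first pass. The Python sketch below assumes that an "orphan event" means an event with a producer but no consumer (or the reverse), and that a "data ownership conflict" means a table written by more than one service; both definitions, and the function names, are assumptions for illustration.

```python
from collections import defaultdict


def find_orphan_events(produced, consumed):
    """
    produced / consumed: dict mapping service -> set of event names.
    Returns events that appear on only one side of the pub/sub relationship.
    """
    all_produced = set().union(*produced.values())
    all_consumed = set().union(*consumed.values())
    return {
        "never_consumed": sorted(all_produced - all_consumed),
        "never_produced": sorted(all_consumed - all_produced),
    }


def find_ownership_conflicts(writers):
    """
    writers: iterable of (service, table) pairs, one per observed write path.
    Returns only the tables that have more than one writing service.
    """
    by_table = defaultdict(set)
    for service, table in writers:
        by_table[table].add(service)
    return {t: sorted(s) for t, s in by_table.items() if len(s) > 1}
```

A flagged item is a prompt for discussion, not a verdict: an event with no consumer may be intentional (for example, an audit event), so the results still need review with the team.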

Step 7: Synthesize final deliverables

Produce a single architecture discovery summary:


Architecture Discovery: (Domain/System Name)

Date: (date)
Participants: (list)
Goal: (what drove this discovery)

Executive Summary

(2-3 sentences: what we discovered, how many services, key architectural decisions, top risks)

Bounded Contexts → Services

| Context | Service | Key Responsibilities | Events Owned |
|---|---|---|---|
| (Context) | (Service) | (Responsibilities) | (Events) |

Architecture Overview

(High-level description of how services interact — orchestration style, key data flows, external integrations)

Technology Landscape

For each bounded context, catalog the technology stack and assess lifecycle status.

| Context | Languages/Frameworks | Databases | Infrastructure | Lifecycle Status |
|---|---|---|---|---|
| (Context) | (e.g., Python 3.11, FastAPI) | (e.g., PostgreSQL 15) | (e.g., AWS ECS, RDS) | Current / Aging / End-of-life |

Technology Consolidation Opportunities

Identify where multiple bounded contexts use different technologies for the same purpose. Consolidation reduces cognitive load, simplifies hiring, and lowers maintenance costs.

| Capability | Technologies in Use | Contexts | Recommendation |
|---|---|---|---|
| (e.g., Message queue) | (e.g., RabbitMQ, SQS, Kafka) | (which contexts use each) | (consolidate to X / keep separate because Y) |
| (e.g., API framework) | (e.g., Express, FastAPI, Spring) | (which contexts use each) | (consolidate to X / keep separate because Y) |

Feed consolidation opportunities into /technology-roadmap for investment planning and /build-vs-buy when evaluating replacement options.

Key Decisions Made

| Decision | Rationale | Confidence | Revisit When |
|---|---|---|---|
| (Decision) | (Why) | High/Med/Low | (Trigger) |

Top Risks

| Risk | Impact | Mitigation | Owner |
|---|---|---|---|
| (Risk) | High/Med/Low | (Mitigation) | (Owner) |

Backlog Summary

  • (N) total stories across (N) bounded contexts
  • Top 5 priorities: (list)
  • Spikes needed: (list)

Artifacts Produced

| Artifact | Location | Format |
|---|---|---|
| Event Storming output | (location) | (format) |
| Boris model | (location) | (format) |
| SNAP sheets | (location) | (format) |
| Consolidated backlog | (location) | (format) |

Step 8: Review

Ask the user:

  • Does the architecture feel right for the stated goal?
  • Are there services or boundaries that need another pass?
  • Is the backlog ready for team estimation, or are there too many unknowns?
  • Who needs to see this? (team, leadership, client)
  • Should we generate a presentation deck? → use /artium-deck

Uncertainty Policy

| Topic | Tolerance | Action |
|---|---|---|
| Domain scope and boundaries | Low | STOP and ask: wrong domain scope wastes the entire session |
| Business process being modeled | Low | STOP and ask: Event Storming requires a clear process to trace |
| Stage transition readiness | Low | STOP and ask: skipping quality checks compounds errors across stages |
| Bounded context names | Medium | Assume + flag [ASSUMED]; refined during Boris modeling |
| Service interaction patterns (sync/async) | Medium | Assume + flag [ASSUMED]; team validates during Boris |
| Story priority and sizing | Medium | Assume + flag [ASSUMED]; backlog is a starting point |
| Participant roles and expertise | High | Best guess from context |

Default: STOP and ask when a topic is not listed above.
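The policy above behaves like a lookup with a conservative fallback. A minimal Python sketch follows; the `Action` enum, the `POLICY` dictionary, and `action_for` are hypothetical names, and matching topics by lowercased string is a simplification chosen for this example.

```python
from enum import Enum


class Action(Enum):
    STOP_AND_ASK = "STOP and ask"
    ASSUME_AND_FLAG = "Assume + flag [ASSUMED]"
    BEST_GUESS = "Best guess from context"


# Topics from the policy table, keyed in lowercase for matching.
POLICY = {
    "domain scope and boundaries": Action.STOP_AND_ASK,
    "business process being modeled": Action.STOP_AND_ASK,
    "stage transition readiness": Action.STOP_AND_ASK,
    "bounded context names": Action.ASSUME_AND_FLAG,
    "service interaction patterns (sync/async)": Action.ASSUME_AND_FLAG,
    "story priority and sizing": Action.ASSUME_AND_FLAG,
    "participant roles and expertise": Action.BEST_GUESS,
}


def action_for(topic: str) -> Action:
    """Look up a topic; anything not listed falls back to STOP and ask."""
    return POLICY.get(topic.strip().lower(), Action.STOP_AND_ASK)
```

For instance, `action_for("Bounded context names")` returns `Action.ASSUME_AND_FLAG`, while an unlisted topic such as `"Deployment topology"` returns `Action.STOP_AND_ASK`.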

Related skills

  • /event-storm — Stage 1 (standalone)
  • /boris-model — Stage 2 (standalone)
  • /snap-document — Stage 3 (standalone)
  • /artium-deck — generate a presentation from the discovery output
  • /story-write — flesh out individual stories from the SNAP backlog
  • /prd-draft — write PRDs for high-priority services

Output locations

| Format | Location | Notes |
|---|---|---|
| Markdown summary | Conversation or saved file | Always produced. |
| Event Storming | Per /event-storm output settings | Stage 1 artifact. |
| Boris model | Per /boris-model output settings | Stage 2 artifact. |
| SNAP sheets | Per /snap-document output settings | Stage 3 artifact. |
| Consolidated backlog | Linear, Notion, or file | Per user preference. |

Example Output

Input

  • Business domain: Claims processing platform at Meridian Health Partners, a regional insurance carrier handling 2.1M claims/year
  • Goal: Decompose a 12-year-old Java monolith ("ClaimCore") ahead of a cloud migration to AWS; engineering leadership needs to understand what's actually in the system before committing to a migration approach
  • Participants: 2 senior engineers who built ClaimCore, 1 claims operations lead, 1 product manager, 1 solutions architect (facilitating)
  • Current state: A 2019 data flow diagram (known to be outdated), ClaimCore source code, and tribal knowledge held by two engineers who are retirement-eligible
  • Output destinations: Miro (workshop artifacts), Confluence (SNAP sheets), Jira (backlog)

Output (abbreviated)

Architecture Discovery: Claims Processing Platform — Meridian Health Partners

Date: 2025-07-14
Participants: A. Nakamura (Sr. Engineer), D. Fontaine (Sr. Engineer), R. Osei (Claims Ops Lead), S. Patel (PM), L. Varga (Solutions Architect, facilitator)
Goal: Decompose ClaimCore monolith to enable phased AWS cloud migration; surface service boundaries, data ownership, and integration risks before migration planning begins


Executive Summary

Discovery revealed 6 bounded contexts buried inside ClaimCore, ranging from well-isolated modules (Adjudication, Provider Directory) to deeply tangled shared-database integrations (Payment Disbursement, Member Eligibility). The system processes claims through a mix of synchronous DB calls and a legacy IBM MQ message bus that is undocumented in any existing artifact. The top architectural risk is a shared CLAIMS_MASTER table written to by 4 distinct subsystems with no clear ownership — this must be resolved before any service extraction begins.


Bounded Contexts → Services

| Context | Service | Key Responsibilities | Events Owned |
|---|---|---|---|
| Intake & Triage | claims-intake-svc | Receive EDI 837 submissions, validate format, assign claim ID, route by type | ClaimReceived, ClaimRejected, ClaimRouted |
| Adjudication | adjudication-svc | Apply benefit rules, calculate allowed amounts, determine COB | ClaimAdjudicated, ClaimPended, ClaimDenied |
| Member Eligibility | eligibility-svc | Verify coverage at date of service, return eligibility response | EligibilityChecked, CoverageConflictFlagged |
| Provider Directory | provider-svc | Maintain provider network status, validate NPI, return contract rates | ProviderValidated, ContractRateFetched |
| Payment Disbursement | payment-svc | Generate EOB, trigger EFT/check to provider, handle reversals | PaymentInitiated, PaymentReversed, EOBGenerated |
| Appeals & Grievances | appeals-svc | Track appeal submissions, manage deadlines, link to original claim | AppealOpened, AppealResolved, DeadlineBreached |

Architecture Overview

Claims enter via EDI batch (nightly) or a thin web portal (real-time). claims-intake-svc performs format validation synchronously, then publishes ClaimRouted onto IBM MQ. adjudication-svc consumes that event and today calls eligibility-svc and provider-svc synchronously over internal JDBC; these calls are prime candidates for REST or gRPC extraction. Once a claim is adjudicated, payment-svc is triggered via MQ. appeals-svc is the most isolated context: it communicates exclusively through a shared Oracle schema and has no published events today, only polling queries.

The current system is orchestration-heavy with a single ClaimProcessorBean acting as a God object coordinating all six contexts. Decomposition will require extracting this orchestration into either a dedicated workflow service (recommended: AWS Step Functions) or distributing it into choreography.


Technology Landscape

| Context | Languages/Frameworks | Databases | Infrastructure | Lifecycle Status |
|---|---|---|---|---|
| Intake & Triage | Java 8, Spring MVC 4 | Oracle 19c | On-prem JBoss EAP | Aging |
| Adjudication | Java 8, EJB 3 | Oracle 19c (shared schema) | On-prem JBoss EAP | End-of-life |
| Member Eligibility | Java 8, EJB 3 | Oracle 19c (shared schema) | On-prem JBoss EAP | End-of-life |
| Provider Directory | Java 11, Spring Boot 2.4 | PostgreSQL 13 | On-prem Tomcat | Current |
| Payment Disbursement | Java 8, EJB 3 | Oracle 19c + IBM MQ 9 | On-prem JBoss EAP | End-of-life |
| Appeals & Grievances | Java 8, JSF 2 | Oracle 19c (shared schema) | On-prem JBoss EAP | End-of-life |

Technology Consolidation Opportunities

| Capability | Technologies in Use | Contexts | Recommendation |
|---|---|---|---|
| Application runtime | JBoss EAP, Tomcat, Spring Boot 2.4 | All | Consolidate to Spring Boot 3.x on AWS ECS; Provider Directory is the reference implementation |
| Database | Oracle 19c (shared), PostgreSQL 13 | Oracle: all except Provider Directory; PostgreSQL: Provider Directory | Migrate each extracted service to Aurora PostgreSQL; avoid lifting Oracle to cloud |
| Messaging | IBM MQ 9 | Intake → Payment path | Replace with Amazon SQS/SNS during extraction; do not migrate IBM MQ to AWS |
| Java version | Java 8 (5 of 6 contexts), Java 11 (1) | All | Standardize on Java 21 LTS as part of each service extraction sprint |

Feed consolidation opportunities into /technology-roadmap to sequence Oracle decommissioning and /build-vs-buy to evaluate rules engine options for Adjudication.


Key Decisions Made

| Decision | Rationale | Confidence | Revisit When |
|---|---|---|---|
| Extract Provider Directory first | Already on Spring Boot + Postgres; lowest blast radius; proves extraction pattern | High | If Provider Directory has hidden Oracle dependencies found in code audit |
| Use AWS Step Functions for claim orchestration | Replaces God object pattern; keeps orchestration explicit and auditable for compliance | Medium | If latency requirements for real-time portal submissions make Step Functions too slow (< 2s SLA) |
| Do not lift IBM MQ to AWS | Licensing cost + operational overhead don't justify it; SQS covers the use case | High | Never, unless a vendor integration requires MQ specifically |
| Treat CLAIMS_MASTER as a migration blocker | 4 writers, no ownership model; must be partitioned before any service goes live in AWS | High | After data ownership workshop resolves write boundaries |

Top Risks

| Risk | Impact | Mitigation | Owner |
|---|---|---|---|
| CLAIMS_MASTER shared-write ownership unresolved: 4 services write to overlapping columns with no transaction boundary | High | Spike: column-level ownership mapping; define bounded write contracts before extraction | A. Nakamura |
| D. Fontaine and A. Nakamura hold undocumented adjudication rule knowledge; both retirement-eligible | High | Pair each extraction sprint with a knowledge transfer session; document rules as executable tests | S. Patel |
| IBM MQ message schema undocumented: 3 message types found in code, 1 suspected but unconfirmed | Medium | Spike: MQ message audit before Intake extraction sprint; confirm with ops team | L. Varga |
| Appeals & Grievances has hard-coded CMS regulatory deadlines (45/60/90 day) in stored procedures | Medium | Extract deadline logic into a rules config layer before migrating; regression test against CMS audit logs | R. Osei |
| Real-time portal claims SLA (< 2s) may not be achievable with Step Functions cold starts | Medium | Prototype Step Functions Express Workflows under load; fallback is direct service-to-service choreography | L. Varga |

Backlog Summary

  • 47 total stories across 6 bounded contexts
  • Top 5 priorities:
    1. CLAIMS_MASTER column ownership mapping spike (blocks all extraction)
    2. Provider Directory