The Bridge Pattern

Separating interpretation from decision in AI systems

Ask a large language model the same question twice. You may get two different answers. This is not a bug. It is a property of how probabilistic models work.

In most applications, this inconsistency is fine. A chat interface is improved by variation. A creative writing tool thrives on it. A search result with slight phrasing differences is harmless.

In consequential decisions, the same property is disqualifying. A prior authorization that evaluates to approved on Tuesday and denied on Wednesday cannot be defended to a payer, a regulator, or a court. A clinical trial eligibility determination that varies by run cannot be used to screen patients. An insurance underwriting decision that produces different risk scores on identical applications is a compliance failure.

The industry response to this has been largely unsatisfying. Some teams deploy LLMs for these decisions anyway and attach disclaimers, which transfers risk to the person on the receiving end without solving the underlying problem. Other teams avoid LLMs entirely and stay with overburdened human operators interpreting 40-page PDFs under time pressure, which has its own well-documented failure modes. Still others layer explainability tooling (SHAP values, attention visualizations) on top of probabilistic outputs and call it auditable, which is confusing at best and misleading at worst.

None of these approaches actually solves the problem. The problem is that consequential decisions require consistency and probabilistic models do not provide it. That is a structural mismatch, not an engineering gap that can be closed with a better prompt.

This post describes an architectural pattern, which we call the bridge pattern, that does solve the problem. It is the pattern we built our platform around, and I believe it is the pattern that any serious AI-in-consequential-decisions system eventually converges on. The specifics below describe our implementation. The pattern itself is general.

The problem in plainer terms

Every consequential decision pipeline has three stages: read, evaluate, and act. A prior authorization coordinator reads a clinical note, evaluates it against the payer's policy criteria, and generates a submission packet. A logistics dispatcher reads an exception alert, evaluates it against the customer's SLA and the carrier's options, and acts by approving a reroute or escalating. An insurance underwriter reads an application, evaluates it against underwriting guidelines, and acts by issuing or declining coverage.

Before LLMs, the read stage was the hard part. Most of the input data was unstructured or semi-structured text that required human interpretation to make sense of. Extracting the relevant facts from a ten-page operative note was, until recently, a task only a trained human could do reliably. This is why so much consequential decision-making has historically been done by people: the interpretation bottleneck was binding.

Language models dissolve the interpretation bottleneck. An LLM can read a clinical note, a shipment event log, an insurance application, or a contract clause, and produce a structured representation of the content with useful accuracy in seconds. This is a genuine capability advance. It is the reason AI is being pushed into so many decision workflows right now.

What LLMs do not dissolve, and cannot dissolve, is the evaluation bottleneck. Evaluating extracted facts against encoded rules must be consistent across runs, defensible to third parties, and reproducible for audit. Those properties are impossible to guarantee in a system that samples probabilistically, regardless of how much the industry tries to dress up probabilistic reasoning as auditable.

The bridge pattern is a recognition that the two stages have fundamentally different reliability requirements, and that the right response is not to use a single tool for both but to engineer a clear architectural boundary between them.

The pattern

The bridge pattern separates a decision pipeline into two stages connected by a contract:

Stage one: interpretation

An LLM (or another interpretation mechanism) reads messy, unstructured, or semi-structured input and produces a structured intermediate representation. We call this a CaseIntent, but the name matters less than the property: it is a typed, validated data structure that downstream code can consume without further natural-language processing.

This stage tolerates probabilistic reasoning. Ambiguity is allowed. Multiple valid interpretations exist for most real-world inputs, and the system should represent that ambiguity rather than suppress it. Confidence scores are attached to each extracted field. Low-confidence extractions are flagged for human review.

Stage two: decision

A deterministic engine takes the structured intermediate representation and evaluates it against encoded rules. Same input produces same output, byte-identical, every run. The engine implements Boolean logic, threshold comparisons, temporal reasoning, and ontology matching, but all of this is pure computation with no probabilistic components in the decision path.

Each decision is traceable to the specific rule applied and the specific input field consulted. The output includes not just the decision but the entire rule trace, making every determination fully auditable.
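
To make that concrete, here is one plausible shape for a single trace entry. This is a sketch; the field names and the example comparison are illustrative, not our exact schema.

from dataclasses import dataclass

@dataclass(frozen=True)
class RuleTraceEntry:
    rule_id: str                       # the specific rule that was applied
    fields_consulted: tuple[str, ...]  # the input fields the rule read
    comparison: str                    # e.g. "conservative_care_weeks >= 6" (illustrative)
    outcome: str                       # "satisfied", "not_satisfied", or "insufficient_evidence"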

The bridge between them

Between interpretation and decision sits a human operator checkpoint. The structured interpretation is presented to the operator for confirmation. Ambiguities are surfaced. Low-confidence extractions are flagged. The operator confirms or corrects the interpretation before the decision stage runs.

This is the key architectural move. The probabilistic stage is allowed to be uncertain, because uncertainty is handled by a human before it propagates. The deterministic stage is allowed to be rigid, because rigidity is appropriate when the input has been confirmed.

The structured intermediate representation is the contract between the two stages. It is the thing the LLM produces and the thing the deterministic engine consumes. Everything upstream of the contract can evolve, improve, switch models, or be replaced without touching the decision engine. Everything downstream of the contract can be reasoned about, tested, and audited without depending on the interpretation mechanism.
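
In code, the pattern collapses to three calls joined by the contract. A minimal sketch, with hypothetical names (interpret, operator_confirm, decide) standing in for the three stages:

def run_case(documents, rules):
    draft_intent = interpret(documents)                # stage one: probabilistic interpretation
    confirmed_intent = operator_confirm(draft_intent)  # the bridge: human checkpoint
    return decide(confirmed_intent, rules)             # stage two: deterministic evaluation

Everything interesting about the architecture lives in the type of the value passed between those calls.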

A concrete example

Consider a prior authorization workflow for spine surgery. The input is a collection of clinical documents: an MRI report, a surgeon's progress note, physical therapy records, medication history. The output is a prior authorization packet formatted for submission to the patient's insurance payer.

In a naive LLM-only implementation, the full pipeline runs as a single prompt. The model reads the documents, applies the payer's policy criteria, and generates the submission packet. This works, sort of, in demos. In production it fails for reasons that are already well understood. The model's interpretation of the policy criteria is inconsistent. The same patient record generates different conclusions across runs. The output cannot be defended to a payer who asks why a specific criterion was judged satisfied.

In a naive deterministic-only implementation, a developer writes rules that evaluate structured fields against thresholds. The fields have to come from somewhere. Without an LLM, they come from a human reading the documents and typing the values into a form. The system is auditable but slow and labor-intensive, which is why it does not scale and why this approach has not replaced the existing workflow.

In a bridge pattern implementation, the two halves cooperate. The LLM reads the documents and produces a structured intent:

from __future__ import annotations

from dataclasses import dataclass

# Treatment and Finding are structured sub-objects defined elsewhere
# in the intent schema.

@dataclass
class CaseIntent:
    region: str                  # "lumbar"
    procedure_types: list[str]   # ["fusion", "decompression"]
    approach: str                # "posterior"
    levels: list[str]            # ["L4-L5", "L5-S1"]
    levels_count: int            # 2
    prior_treatments: list[Treatment]
    imaging_findings: list[Finding]
    confidence: dict[str, float] # per-field confidence scores

This structure is produced probabilistically. The model reads the surgeon's note and the imaging report and produces its best interpretation of the fields. Each field has an associated confidence score. The system surfaces low-confidence fields to the coordinator for review.

The coordinator sees a confirmation screen showing the extracted intent, the source evidence for each field, and flags on anything the model was uncertain about. The coordinator confirms or corrects. The confirmed intent is passed to the decision engine.

The decision engine is a pure function. Given a CaseIntent and a PolicyRuleSet (the payer's encoded criteria), it produces a CodeAssistResult containing suggested CPT codes, NCCI compliance flags, and evidence gaps. No sampling. No randomness. Given the same intent and the same rule set, the output is byte-identical across runs.

def suggest_codes(intent: CaseIntent, rules: PolicyRuleSet) -> CodeAssistResult:
    # Pure deterministic logic.
    # No LLM calls. No sampling. No randomness.
    # Same input, same output, every time.
    codes = []
    for rule in rules.applicable(intent):
        result = rule.evaluate(intent)
        if result.status == SATISFIED:
            codes.extend(result.suggested_codes)
    return CodeAssistResult(
        codes=codes,
        compliance_flags=check_ncci(codes, intent),
        evidence_gaps=find_gaps(intent, rules),
        audit_trace=build_trace(intent, rules, codes)
    )

Every element of the result traces back to specific rules applied to specific fields in the intent, with specific evidence references to the source documents. A reviewer can ask why any code was suggested and get an exact answer. A re-run on the same confirmed intent produces the same codes every time.

Why determinism in the decision stage is the whole point

It is tempting, when implementing the bridge pattern, to smuggle probabilistic reasoning into the decision stage to handle edge cases the rule set does not cover well. This is the single biggest trap in this architecture. Resist it.

The value of the bridge pattern comes entirely from the decision stage being deterministic. The moment a probabilistic component enters the decision path, every property that made the architecture valuable collapses.

Determinism enables three things that matter:

Reproducibility. A decision made today can be re-run tomorrow, next month, or during an audit three years from now, and produce the same answer. This is non-negotiable for regulated decisions.

Testability. A rule engine can be tested exhaustively. Given a test case input, we can assert the expected output and verify it across 10,000 runs, knowing every run must produce the same result (a test sketch follows below). Compare this to testing an LLM, where even at temperature zero the same input can produce subtly different outputs across serving environments and model versions, and can never be verified against a single expected output.

Defensibility. Every decision comes with a full rule trace. The evaluator sees exactly which rule was applied, which field was consulted, and which threshold was compared against. If the decision is challenged, the trace is the defense.
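
Here is a minimal sketch of that exhaustive verification, assuming the suggest_codes function from earlier and hypothetical test fixtures:

def test_engine_produces_one_expected_output():
    intent = make_test_intent()              # hypothetical fixture
    rules = load_test_rule_pack()            # hypothetical fixture
    expected = suggest_codes(intent, rules)  # pin the expected output once
    for _ in range(10_000):
        # Every run must match the single expected output,
        # a property no probabilistic engine can offer.
        assert suggest_codes(intent, rules) == expected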

A probabilistic engine cannot provide any of these properties, regardless of how carefully it is prompted or how sophisticated its explainability tooling is. The industry's insistence on treating these as engineering problems that will be solved by more prompt engineering, better models, or better post-hoc explanation tools is a category error. These are not engineering problems. They are properties of the computation model. A decision engine that samples probabilistically cannot be deterministic, by definition.

The data structure that makes the boundary work

The bridge pattern only works if the structured intermediate representation is designed carefully. Get this wrong and the boundary leaks: either the interpretation stage makes decisions it shouldn't, or the decision stage has to do interpretation it shouldn't.

A well-designed intent structure has three properties:

It is typed

Every field has a specific type. Not "text," not "JSON," but specific enumerations, specific numeric ranges, specific structured sub-objects. Typing the intent forces the interpretation stage to produce values that are valid by construction, and enables the decision stage to evaluate without having to parse or interpret.
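
For example, the region and approach fields from the CaseIntent above are better expressed as enumerations than as free strings. A sketch, with illustrative value sets that are not meant to be complete:

from enum import Enum

class Region(str, Enum):
    CERVICAL = "cervical"
    THORACIC = "thoracic"
    LUMBAR = "lumbar"

class Approach(str, Enum):
    ANTERIOR = "anterior"
    POSTERIOR = "posterior"
    LATERAL = "lateral"

An interpretation stage that must emit Region.LUMBAR rather than arbitrary text cannot hand the decision stage a value it does not know how to evaluate.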

It carries confidence per field

The interpretation stage produces not just values but confidence scores for each value. This is the essential link between the probabilistic upstream and the deterministic downstream: confidence scores let the system surface ambiguity to the human operator without hiding it. The decision stage never sees confidence scores (or if it does, they flow through as metadata rather than as inputs to the decision logic). The human operator is the consumer of confidence.
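
A sketch of the consumption side, assuming the CaseIntent from earlier and a hypothetical review threshold:

REVIEW_THRESHOLD = 0.85  # hypothetical cutoff; in practice tuned per field

def fields_needing_review(intent: CaseIntent) -> list[str]:
    # Confidence is consumed here, at the human checkpoint.
    # It never flows into the decision logic itself.
    return [field for field, score in intent.confidence.items()
            if score < REVIEW_THRESHOLD]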

It is complete

The intent captures everything needed to make the decision. Nothing further needs to be interpreted in the decision stage. If the decision engine needs a field that isn't in the intent, the boundary is broken and the intent structure needs to be extended.

In practice, intent design is iterative. The first version of the intent structure is always incomplete. New rule packs reveal missing fields. New input document types reveal interpretation challenges that the original structure could not express. The intent structure evolves, and the interpretation prompts and decision rules evolve along with it.

The key discipline is that when new fields are added, they are added to the intent structure first, implemented in the interpretation stage second, and consumed by the decision stage third. Never reverse this order. Fields that the decision stage reaches for but the interpretation stage does not provide create brittle dependencies that break at exactly the wrong time.

Why the confirmation step matters more than it seems

The human operator confirmation checkpoint is the feature that separates this architecture from a pure pipeline. It is also the feature that most engineering teams try to skip, because it seems to slow the workflow down.

Skipping it is a mistake. The confirmation step does three things that nothing else can do.

It resolves interpretation ambiguity using expert judgment. An LLM reading an operative note produces one interpretation, but the note may support several. A human coordinator who knows the surgeon's practices, the payer's quirks, and the patient's history is often the only available oracle for which interpretation is correct. The confirmation step surfaces the ambiguity and lets expert judgment resolve it, exactly once, with an audit record.

It creates an accountability boundary. Everything upstream of the confirmation is the AI's interpretation. Everything downstream is the human's confirmed interpretation. If the decision turns out to be wrong, the trace shows exactly where the error entered. Was it an interpretation failure (the AI produced the wrong intent) or a rule evaluation failure (the rules produced the wrong decision for a correct intent)? The confirmation boundary makes this distinction crisp, which is essential for improving both stages over time.

It preserves human agency in the decision. The operator is not rubber-stamping an AI recommendation. They are the decision maker. The AI prepares the input. The rules evaluate the confirmed input. But the judgment call at the boundary is made by a person, and that person's role is not to review a finished decision but to confirm the interpretation that the decision will be based on. This is a more respectful architecture than "AI decides, human reviews" and also a more useful one, because it puts human judgment at the stage where human judgment actually helps.

What this pattern is not

The bridge pattern gets confused with several adjacent approaches that are superficially similar but actually different. It is worth being explicit about what it is not.

It is not "human in the loop"

Most human-in-the-loop systems put the human at the end of the pipeline, reviewing a completed AI recommendation before accepting or rejecting it. The bridge pattern puts the human in the middle, at the boundary between interpretation and decision. These are very different architectures with different failure modes. Human review at the end typically decays into rubber-stamping because the reviewer has no way to efficiently re-run the decision with a modified input. Human confirmation at the boundary is fundamentally constructive: the human modifies the input that feeds the decision, and the decision is computed from their modified input.

It is not "explainable AI"

Explainable AI typically means post-hoc explanation of a probabilistic model's output: SHAP values, attention visualizations, counterfactual analysis. These are interesting research tools but they do not actually produce reproducible, auditable decisions. A SHAP value explains what the model did but does not guarantee the model will do the same thing on the same input next time. The bridge pattern does not need explainability tooling because the decision stage is fully introspectable by construction. Every rule application is traceable, and the trace is the explanation.

It is not "guardrails"

Guardrails are filters on LLM outputs: does this output contain PII, does it violate policy, does it match a safety taxonomy. Guardrails are useful for what they do, but they operate on outputs that have already been produced probabilistically, and they do not make the underlying decision deterministic. A guardrail-protected LLM is still probabilistic. The bridge pattern differs in that the decision itself is produced deterministically, not filtered after the fact.

It is not "constitutional AI"

Constitutional AI trains a model to follow a set of principles through reinforcement learning. The resulting model behaves more consistently with those principles but is still probabilistic at inference time. The bridge pattern does not rely on model training for consistency; it achieves consistency by removing the model from the decision path entirely.

Where the pattern fits and where it does not

The bridge pattern is not a universal answer. It is the right architecture for a specific class of problems and the wrong architecture for others. Being clear about the boundary helps.

Use the bridge pattern when:

  • The decision must be reproducible across runs
  • The decision must be auditable to a third party
  • The rules governing the decision can be encoded explicitly
  • The input is unstructured or semi-structured natural language
  • Wrong decisions carry meaningful consequences for identifiable people or organizations

Do not use the bridge pattern when:

  • The decision is low-stakes and reversibility is cheap (casual search, content recommendation, chat)
  • The rules are genuinely probabilistic at their core (pattern recognition in natural images, for example, where the ground truth itself is fuzzy)
  • The volume is high enough that human confirmation of every case is infeasible, and the consequences of any individual wrong decision are low

Most applied AI today lives in the "do not use" category. That is fine. Not every AI system needs to produce reproducible decisions. But a meaningful and growing subset, the subset that includes healthcare authorization, insurance adjudication, regulatory compliance, government benefits determination, and increasingly autonomous agent actions, lives squarely in the "use" category. For that subset, the bridge pattern is not a nice-to-have. It is the architecture that distinguishes a system that can be deployed responsibly from one that cannot.

Implementation notes from our system

A few specifics from how we built this, for engineers who want to see where the architectural ideas land in code.

Our interpretation stage is currently implemented with a structured prompt to a language model, followed by schema validation of the returned JSON against a Pydantic model. Fields that fail validation are flagged for human review rather than silently dropped or coerced. The confidence score per field is returned by the model directly, as part of its structured output, and is separately validated for reasonableness.
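
A simplified sketch of that validation step, using Pydantic v2, an abridged model, and a hypothetical review hook:

from pydantic import BaseModel, ValidationError

class CaseIntentModel(BaseModel):   # abridged version of the full intent schema
    region: str
    levels: list[str]
    levels_count: int
    confidence: dict[str, float]

def parse_interpretation(raw: dict) -> CaseIntentModel | None:
    try:
        return CaseIntentModel.model_validate(raw)
    except ValidationError as exc:
        # Failed fields are routed to human review, never silently
        # dropped or coerced.
        queue_for_review(raw, exc.errors())  # hypothetical review hook
        return None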

Our decision stage is a pure Python module with no network calls, no model inference, and no sampling. It operates on the validated intent and a versioned rule pack, and produces a result object that includes the decision, the rule trace, and the evidence references. Every function in the decision path is deterministic by inspection and unit-tested to enforce this.

Determinism is enforced as an invariant, not as a hope. We have regression tests that run the same intent through the decision engine 100 times and assert byte-identical output. We have encountered exactly one bug in which this invariant was violated (a particular rule path produced a dict with non-deterministic key ordering), and we treat violations of this invariant as the single highest-priority category of defect in our system. The reason determinism must be defended this aggressively is that it is the foundation everything else rests on. A decision engine that is deterministic 99.9% of the time is not deterministic. It is probabilistic with a small sample size.
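
A sketch of that regression test, assuming CodeAssistResult is a dataclass and using hypothetical fixtures:

import json
from dataclasses import asdict

def serialize(result) -> bytes:
    # No sort_keys: the invariant is byte-identical output, so even
    # non-deterministic dict key ordering (the one violation we have
    # seen) must fail the comparison.
    return json.dumps(asdict(result), default=str).encode()

def test_decision_output_is_byte_identical():
    intent = make_test_intent()      # hypothetical fixture
    rules = load_test_rule_pack()    # hypothetical fixture
    baseline = serialize(suggest_codes(intent, rules))
    for _ in range(100):
        assert serialize(suggest_codes(intent, rules)) == baseline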

Our confirmation interface shows the operator three panels: the source documents (with highlights on extraction spans), the generated intent structure (with confidence indicators), and the draft decision (with rule trace). The operator can edit the intent directly, which triggers a re-evaluation of the decision against the modified intent. The operator cannot edit the decision directly; edits must flow through the intent. This is a deliberate design choice that prevents the system from producing decisions that cannot be reproduced from an intent, which would break auditability.
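
A sketch of that constraint in code, assuming the types from earlier and a hypothetical rule pack accessor:

from dataclasses import replace

def apply_operator_correction(intent: CaseIntent, field: str, value) -> CodeAssistResult:
    # The operator edits the intent, never the decision. The corrected
    # intent is re-evaluated, so every decision remains reproducible
    # from an intent.
    corrected = replace(intent, **{field: value})
    return suggest_codes(corrected, active_rule_pack())  # hypothetical accessor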

What this pattern makes possible

The most interesting property of the bridge pattern is that it generalizes. We built our first implementation for spine surgery prior authorization. The interpretation stage reads clinical documents. The decision stage evaluates CPT coding rules and payer criteria. Every component of the decision engine is specific to healthcare.

But the architecture itself is not. We built a second implementation for logistics exception handling. The interpretation stage now reads carrier emails and event logs. The decision stage evaluates SLA terms and carrier rules. The engine code is different, but the shape is identical. Interpretation produces a structured intent. Human operator confirms. Deterministic engine evaluates. Audit trail emerges. Same architecture, different rule packs, different input types.
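
To illustrate, a hypothetical logistics counterpart to CaseIntent; the fields are illustrative, not our production schema:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ExceptionIntent:
    shipment_id: str
    exception_type: str           # e.g. "missed_pickup", "customs_hold"
    sla_deadline: datetime        # from the customer's contract terms
    carrier_options: list[str]    # reroute candidates extracted from carrier messages
    confidence: dict[str, float]  # per-field confidence, same role as in healthcare

The decision engine that consumes it is different code, but the bridge around it is identical.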

This generalization property is what makes the bridge pattern worth naming. It is not a trick for a specific domain. It is a pattern for any decision pipeline where consistent evaluation of interpretable inputs against explicit rules is required. Healthcare authorization is one instance. Logistics exceptions are another. Insurance underwriting, KYC review, regulatory compliance, government benefits, autonomous agent action gating, clinical trial screening. The list is long, and the architectures for these domains will, I believe, converge on this pattern whether they start with it or stumble into it.

If you are building any system where AI is entering decisions that matter, the question is not whether your architecture will eventually need a bridge pattern. The question is whether you will design it deliberately from the beginning or rediscover it painfully after shipping a probabilistic pipeline that cannot be defended in production.

Closing

The bridge pattern is not a breakthrough. It is a recognition. It recognizes that different stages of a decision pipeline have different reliability requirements, and that trying to meet those requirements with a single tool is the error underlying most of the confusion in the current AI industry around auditability, reliability, and responsible deployment.

The engineering implication is that the most important architectural decision in a consequential-decision AI system is where to draw the boundary between the probabilistic stage and the deterministic stage. Draw the boundary well, and the system becomes reliable, auditable, and improvable over time. Draw it poorly or not at all, and the system becomes a demo that does not survive contact with production.

Our team has been operating this architecture in production for about a year at the time of this writing. It is the thing that has made our system work. It is also, I believe, the thing that most teams building in this space will need to converge toward if they have not already. The sooner the pattern is named and discussed openly in engineering conversation, the sooner the field moves past the current false choice between "trust the model" and "do it by hand."

There is a third way. It is the bridge pattern. Build it deliberately.

Ryan Kamykowski is the CEO and co-founder of Avectic Corporation, which builds deterministic decision infrastructure for AI-assisted workflows. For questions or discussion, email info@avectic.com.