BoundaryEnforcer demo | Tractatus AI Safety Framework

What this demo shows

BoundaryEnforcer is one component of the Tractatus framework. It is a runtime intercept that wraps the agent — not a behavioural constraint trained into the agent. Before any action belonging to a specified category is executed, the request passes through a checkpoint expressed in terms a human operator can evaluate. The checkpoint produces a three-state verdict, and the third state is load-bearing.

Below: the four action categories the primitive recognises, the three-state verdict, the shape of an enforcement call, and worked examples per category. For the architectural framing this primitive sits inside, see §3.3 The Tractatus Response and §3.5 Boundary-category fallibility of the Architectural Alignment paper, and §0(i) of the Aotearoa NZ Agentic AI Framework v1.2.

The four action categories

The router classifies a proposed action against these categories. Categories are fallible: the router is sometimes wrong, and the architecture's posture toward that wrongness is recordability + reversibility + appeal, not "we got the categories right." The set is negotiated by the community, not derived from some essential property of the actions.

Irreversible

Actions whose effect cannot be undone after execution — sent communications, financial transfers, public deletions, record finalisation, signed commitments.

Typical verdict: ESCALATE for novel cases, ALLOW for human-authorised routine, DENY if outside scope.

Values-laden

Actions whose correctness depends on contested value judgements rather than facts: content moderation, tone calibration, deciding which content is acceptable to which audience, prioritisation of competing interests.

Typical verdict: ESCALATE — values choices belong to humans, not to autonomous systems.

Cultural-context-dependent

Actions where correctness varies by community, language, tikanga, or jurisdiction — translation choices, sacred-content handling, kinship-respectful messaging, te reo Māori macron placement.

Typical verdict: ESCALATE to a community-knowledgeable reviewer when context is unclear.

Unprecedented

Actions outside the agent's training distribution or operator's prior decisions — novel request patterns, untested code paths, unknown counter-party shapes, situations the operator has not yet ruled on.

Typical verdict: ESCALATE — operator's first ruling sets precedent for future similar cases.
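The "typical verdict" notes above can be read as a small default table. The following is a minimal sketch under stated assumptions: the function name `typicalVerdict` and the flags `outOfScope` and `humanAuthorisedRoutine` are hypothetical and not part of the framework's published interface, and a real router would also weigh operator scope and prior rulings.

```javascript
// Hypothetical helper: maps a router classification to the "typical verdict"
// described for each category. Illustration only, not the framework's router.
function typicalVerdict(category, flags = {}) {
  // Both flags are assumptions introduced for this sketch.
  const { outOfScope = false, humanAuthorisedRoutine = false } = flags;

  switch (category) {
    case 'irreversible':
      // DENY if outside scope, ALLOW for human-authorised routine, else ESCALATE.
      if (outOfScope) return 'DENY';
      return humanAuthorisedRoutine ? 'ALLOW' : 'ESCALATE';
    case 'values-laden':
    case 'cultural-context-dependent':
    case 'unprecedented':
      // These categories typically go to a human.
      return 'ESCALATE';
    default:
      // Not one of the four categories: no typical boundary verdict applies.
      return null;
  }
}
```

Returning `null` for an unrecognised category is a design choice of this sketch: it keeps "not boundary-class" distinct from any of the three verdict states.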

The three-state verdict

The output is not binary. The third state — ESCALATE — is the architecturally load-bearing acknowledgement that a substantial fraction of significant decisions are not decidable at the time the agent encounters them.

ALLOW — the action falls inside operator-authorised scope (e.g., the operator has ruled previously on this category for this context; the precedent stands).
DENY — the action is outside scope (e.g., the operator has explicitly forbidden this category, or it crosses a HARD rule with no override).
ESCALATE — the action is values-laden, novel, or context-dependent in a way the agent should not resolve alone. Routed to a human for ruling. The ruling is recorded; subsequent similar cases inherit the precedent.
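One way to picture how the three states relate is a resolver that checks HARD rules first, then recorded precedent, and escalates when neither decides the case. This is a sketch only: `resolveVerdict`, the `hardRules` predicate list, and the shape of a precedent entry are assumptions made for illustration, not the framework's actual interface.

```javascript
// Hypothetical resolver. Data shapes are assumptions for this sketch:
//   hardRules:  array of predicates (proposal) => boolean; true means "violated".
//   precedents: array of { action, category, ruling }, ruling is 'ALLOW' or 'DENY'.
function resolveVerdict(proposal, hardRules, precedents) {
  // 1. HARD rules have no override, so they are checked before anything else.
  if (hardRules.some((violates) => violates(proposal))) {
    return { state: 'DENY', reason: 'hard_rule' };
  }

  // 2. A recorded operator ruling on the same action and category stands.
  const precedent = precedents.find(
    (p) => p.action === proposal.action && p.category === proposal.category
  );
  if (precedent) {
    return { state: precedent.ruling, reason: 'precedent' };
  }

  // 3. Nothing decides the case: route it to a human instead of guessing.
  return { state: 'ESCALATE', reason: 'no_precedent' };
}
```

The ordering is the point of the sketch: ESCALATE is what remains when neither a rule nor a prior ruling settles the question, which is why it cannot be folded into ALLOW or DENY.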

Rachel Garden's trinary-logic move ({True, False, Undecided}) is the formal parallel — three states are not a fallback shape but the right output shape for decisions that aren't binary at the time of the decision. Probabilistic logic at the far end is the right epistemic frame for the claim being made.

Shape of an enforcement call

The primitive's interface, in pseudocode. The agent never executes a boundary-class action directly; it submits the action proposal and waits on the verdict.

// Build the proposal once so the object that is evaluated is the one executed.
const proposal = {
  action:   'send_email',              // what the agent proposes
  category: 'irreversible',            // router classification
  payload:  { to, subject, body },     // what would be done
  context: {
    tenantId:      tenant._id,
    actor:         agent.id,
    invokedBy:     'agentic_triage',   // call site
    prior_rulings: await getPrecedents(tenant._id, 'send_email'),
  },
};

const verdict = await BoundaryEnforcer.evaluate(proposal);

switch (verdict.state) {
  case 'ALLOW':    return execute(proposal.action, proposal.payload);
  case 'DENY':     return reject(verdict.reason);
  case 'ESCALATE': return queueForHuman(verdict.escalation_id);
  default:         throw new Error(`Unknown verdict state: ${verdict.state}`); // never execute on an unrecognised state
}

Every verdict is written to a tenant-scoped audit record before the agent's process continues. The record carries the action, the classification, the verdict, the reasoning, and the human ruling if escalated. Records are append-only; appeal mechanisms operate against the record, not against the agent's memory.
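The append-only, tenant-scoped record described above might look like the following in-memory sketch. The class name `AuditLog`, its methods, and the choice to store a human ruling as a new record that refers back to the escalation are all assumptions for illustration; a real deployment would use durable storage.

```javascript
// Hypothetical in-memory audit log. Names and record shape are assumptions.
class AuditLog {
  #records = [];

  // Append one record for a tenant and return it, shallow-frozen so that
  // its top-level fields cannot be reassigned afterwards.
  append(tenantId, entry) {
    const record = Object.freeze({
      seq: this.#records.length, // position in the log, assigned on append
      tenantId,
      ...entry,
    });
    this.#records.push(record);
    return record;
  }

  // A human ruling is written as a new record that refers to the escalation;
  // the original escalation record is never edited.
  appendRuling(tenantId, escalationSeq, ruling) {
    return this.append(tenantId, {
      type: 'human_ruling',
      refersTo: escalationSeq,
      ruling,
    });
  }

  // Tenant-scoped read: only the given tenant's records are returned.
  forTenant(tenantId) {
    return this.#records.filter((r) => r.tenantId === tenantId);
  }
}
```

Because the log exposes no update or delete method, an appeal can only add further records, which matches the idea that appeals operate against the record and not against the agent's memory.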

Worked examples

Send an email on behalf of a member

Category: irreversible · Verdict: ESCALATE

External sends are not auto-dispatched by the agent. The proposed message is queued; the operator reviews and authorises the send. Subsequent identical-shape sends still escalate — each send is a separate decision because the recipient + content are new.

Apply a content-moderation decision to a community post

Category: values-laden · Verdict: ESCALATE

"Should this post be removed" is a values question, not a facts question. The agent surfaces the post + a classification (e.g., suspected-hate-speech) to the community moderator. The moderator rules. The rule becomes precedent for the community's future similar cases — not for other communities (cultural-context-dependent).
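The scoping of precedent to a single community can be sketched as a store keyed by both community and classification. Everything named here (`PrecedentStore`, `precedentKey`, the identifiers and ruling strings in the usage note) is hypothetical and stands in for whatever the framework actually uses.

```javascript
// Hypothetical key builder: JSON encoding avoids collisions between ids.
function precedentKey(communityId, classification) {
  return JSON.stringify([communityId, classification]);
}

// Hypothetical store of moderator rulings, scoped per community.
class PrecedentStore {
  #rulings = new Map();

  // Record the moderator's ruling for this community only.
  record(communityId, classification, ruling) {
    this.#rulings.set(precedentKey(communityId, classification), ruling);
  }

  // Look up a ruling for this community; rulings made in other communities
  // are never consulted. Returns null when no precedent exists.
  lookup(communityId, classification) {
    const ruling = this.#rulings.get(precedentKey(communityId, classification));
    return ruling === undefined ? null : ruling;
  }
}
```

In use, recording a ruling with `store.record('community-a', 'suspected-hate-speech', 'remove')` leaves `store.lookup('community-b', 'suspected-hate-speech')` returning `null`, so the same classification in a different community would still escalate.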

Translate a phrase containing a te reo Māori macron

Category: cultural-context-dependent · Verdict: ESCALATE

Translation decisions affecting tikanga, macron usage, or sacred content are routed to a community-knowledgeable reviewer when the agent's confidence falls below a threshold. Routine, low-stakes translation may be ALLOWED inside an operator-approved scope.
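The threshold rule could be sketched as below. The source does not state a threshold value, so `CONFIDENCE_THRESHOLD = 0.9` is a placeholder, and the function name, field names, and reviewer labels are likewise assumptions. The final fallback branch (escalate when the work is outside approved scope) is this sketch's own choice, not something the text specifies.

```javascript
// Placeholder value: the actual threshold is not given in the source.
const CONFIDENCE_THRESHOLD = 0.9;

// Hypothetical routing for a translation request.
function routeTranslation({ confidence, touchesSensitiveContent, inApprovedScope }) {
  // Tikanga, macron usage, or sacred content with low confidence goes to a
  // community-knowledgeable reviewer.
  if (touchesSensitiveContent && confidence < CONFIDENCE_THRESHOLD) {
    return { state: 'ESCALATE', reviewer: 'community-knowledgeable' };
  }
  // Routine work may proceed inside an operator-approved scope.
  if (inApprovedScope) {
    return { state: 'ALLOW' };
  }
  // Assumed fallback: outside approved scope, ask the operator.
  return { state: 'ESCALATE', reviewer: 'operator' };
}
```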

Process a request from a counter-party with a shape the agent has not seen before

Category: unprecedented · Verdict: ESCALATE

First-of-class requests escalate by default. The operator's ruling on the first instance establishes whether the class becomes ALLOW or DENY for routine subsequent handling. The escalation cost on first-encounter is the architecture buying precedent at human-attention prices.

Run a database read query within the tenant's scope

Category: not boundary-class · Verdict: ALLOW (no boundary engagement)

Routine read operations within tenant scope are not boundary-class actions. The router classifies the action as outside the four categories above; no escalation is triggered. The audit record still captures the query (separate audit primitive), but BoundaryEnforcer does not gate it.

Attempt to write outside the tenant's scope

Category: structural HARD rule · Verdict: DENY

Tenant isolation is a HARD rule, not a category subject to escalation. Cross-tenant writes are DENIED structurally; there is no ESCALATE path to override. The audit record captures the attempt; downstream investigation is operator-class.
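A structural check of this kind might run before any category-based evaluation, as in the sketch below. The function name `checkTenantIsolation`, the `targetTenantId` field, and the `overridable` flag are assumptions for illustration.

```javascript
// Hypothetical HARD-rule check, intended to run before category evaluation.
// Returns a DENY verdict for a cross-tenant write, or null if the rule holds.
function checkTenantIsolation(actorTenantId, write) {
  if (write.targetTenantId !== actorTenantId) {
    // No escalation id is issued: there is nothing for a human to override.
    return { state: 'DENY', reason: 'cross_tenant_write', overridable: false };
  }
  return null; // rule not violated; normal evaluation can continue
}
```

The check only ever returns DENY or nothing, so no code path exists that could turn a cross-tenant write into an ESCALATE.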

What the primitive does NOT do

It does not constrain the model's internal reasoning. The agent can think anything; the gate is between thought and action.
It does not claim provable safety. The boundary categories are fallible; the router is wrong sometimes; the architecture's response to that wrongness is recordability + reversibility + appeal, not a proof.
It does not work alone. The primitive is one component in a stack: cryptographically-signed sovereign records, bilateral federation with permissionless withdrawal, member-driven portability. Defence-in-depth across mechanisms that don't depend on agent compliance.
It does not survive a sufficiently capable agent that can synthesise consent across enough endpoints to defeat distributed verification. That is the open frontier; §7.5 of the Architectural Alignment paper names it as such.
